ML 530: Deep Learning

Welcome to the auxiliary page for the Deep Learning course!

Textbook #1: Deep Learning with Python (2nd edition), by François Chollet. Like Chollet's Keras library, this book aims to help democratize deep learning through hands-on learning.

Textbook #2: Deep Learning Illustrated, by Jon Krohn, Grant Beyleveld, and Aglaé Bassens. This book provides a broad (not necessarily deep) overview of a range of topics, including recent developments such as the Mask R-CNN model and the transformer architecture.

Textbook #3: The Science of Deep Learning, by Iddo Drori. This will serve as an occasional reference for us.

The current course materials include ...

Introduction:

Fundamentals:

ConvNets (Part 1):

ConvNets (Part 2):

Embeddings, Recurrent Neural Networks, and Sequences (Part 1):

Embeddings, Recurrent Neural Networks, and Sequences (Part 2):

Generative Models:

Reinforcement Learning:

Slides
Variations on the DQN ...
- https://github.com/mimoralea/gdrl/blob/master/notebooks/chapter_10/chapter-10.ipynb
- class FCDuelingQ(nn.Module): # fully connected: the state value and action advantage are dueling
  - q = v + a - a.mean(1, keepdim=True).expand_as(a)
- class DuelingDDQN():
  - argmax_a_q_sp = self.online_model(next_states).max(1)[1] # online model selects action
  - q_sp = self.target_model(next_states).detach() # target model estimates action value
  - mixed_weights = target_ratio + online_ratio # target model update: weighted mix of target and online weights
- class PrioritizedReplayBuffer():
  - self.memory[idxs, self.td_error_index] = np.abs(td_errors)
  - sorted_arg = self.memory[:self.n_entries, self.td_error_index].argsort()[::-1] # sorted by magnitude of TD error
  - if self.rank_based:
    - priorities = 1/(np.arange(self.n_entries) + 1)
  - else: # proportional
    - priorities = entries[:, self.td_error_index] + EPS
  - scaled_priorities = priorities**self.alpha
  - probs = np.array(scaled_priorities/np.sum(scaled_priorities), dtype=np.float64)
  - weights = (self.n_entries * probs)**-self.beta
  - normalized_weights = weights/weights.max()
- What role do the importance-sampling weights play when they are applied to the TD errors?
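
To explore the question above, it can help to compute the quantities directly. The following is a minimal NumPy sketch of the priority, probability, and weight calculations excerpted from PrioritizedReplayBuffer; it is not the notebook's class. The function name per_probs_and_weights, the default alpha and beta values, and the value of EPS are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-6  # assumed small constant so that no transition gets zero priority


def per_probs_and_weights(td_errors, alpha=0.6, beta=0.1, rank_based=False):
    """Return (order, probs, normalized_weights) for a batch of TD errors.

    order: indices that sort the transitions by |TD error|, largest first.
    probs: sampling probability of each transition, in that sorted order.
    normalized_weights: importance-sampling weights, scaled so the max is 1.
    """
    abs_errors = np.abs(np.asarray(td_errors, dtype=np.float64))
    order = abs_errors.argsort()[::-1]  # largest |TD error| first
    n_entries = len(abs_errors)

    if rank_based:
        priorities = 1.0 / (np.arange(n_entries) + 1)  # 1 / rank
    else:  # proportional
        priorities = abs_errors[order] + EPS

    # alpha = 0 gives uniform sampling; alpha = 1 gives full prioritization
    scaled_priorities = priorities ** alpha
    probs = scaled_priorities / np.sum(scaled_priorities)

    # Transitions that are sampled more often than uniform get smaller weights
    weights = (n_entries * probs) ** -beta
    normalized_weights = weights / weights.max()
    return order, probs, normalized_weights


order, probs, weights = per_probs_and_weights([0.1, 2.0, 0.5])
```

In this sketch the transition with the largest TD error is sampled most often but receives the smallest weight, so its contribution to the loss is scaled down. Dividing by the maximum weight keeps every weight at or below 1, so the weighting only ever shrinks an update.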
Vanilla Policy Gradient (aka REINFORCE with baseline) ...
- https://github.com/mimoralea/gdrl/blob/master/notebooks/chapter_11/chapter-11.ipynb
- class FCDAP(nn.Module): # fully connected discrete-action policy
  - dist = torch.distributions.Categorical(logits=logits)
  - action = dist.sample()
- class VPG():
  - value_error = returns - self.values
  - policy_loss = -(discounts * value_error.detach() * self.logpas).mean()
  - entropy_loss = -self.entropies.mean()
  - loss = policy_loss + self.entropy_loss_weight * entropy_loss # updates the policy model parameters
  - value_loss = value_error.pow(2).mul(0.5).mean() # updates the state value model parameters
- How does the return (the discounted sum of trajectory rewards) affect the parameter updates for the policy?
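
As a starting point for the question above, the loss terms can be evaluated on a tiny trajectory. The following is a minimal NumPy sketch, assuming a single finished episode; the function name vpg_losses, its arguments, and the default gamma and entropy_loss_weight are hypothetical and not taken from the notebook. NumPy has no autograd, so the detach() in the excerpt corresponds here to simply treating value_error as a constant in the policy loss.

```python
import numpy as np


def vpg_losses(rewards, values, logpas, entropies,
               gamma=0.99, entropy_loss_weight=0.001):
    """Return (returns, loss, value_loss) for one episode of length T."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    logpas = np.asarray(logpas, dtype=np.float64)
    entropies = np.asarray(entropies, dtype=np.float64)

    T = len(rewards)
    discounts = gamma ** np.arange(T)  # 1, gamma, gamma^2, ...

    # Return from step t: sum over k >= t of gamma^(k - t) * r_k
    returns = np.array(
        [np.sum(discounts[:T - t] * rewards[t:]) for t in range(T)]
    )

    # How much better the trajectory was than the value estimate predicted
    value_error = returns - values

    # A positive value_error raises the log-probability of the action taken;
    # a negative one lowers it
    policy_loss = -(discounts * value_error * logpas).mean()
    entropy_loss = -entropies.mean()
    loss = policy_loss + entropy_loss_weight * entropy_loss

    value_loss = 0.5 * (value_error ** 2).mean()
    return returns, loss, value_loss


returns, loss, value_loss = vpg_losses(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    logpas=[-1.0, -1.0, -1.0],
    entropies=[0.0, 0.0, 0.0],
    gamma=0.5,
    entropy_loss_weight=0.0,
)
```

With these inputs the returns are 1.75, 1.5, and 1.0. Setting values equal to the returns makes value_error zero, and the policy loss then vanishes, which is one way to see the baseline at work.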

Misc:

Homework Questions:

Deep Learning Word Cloud

about me