Chapter 6: Deep Feedforward Networks

Introduction

From linear models to neural networks

Trying to learn XOR
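XOR is the classic function a linear model cannot represent, but a one-hidden-layer network with ReLU units can compute it exactly. A minimal numpy sketch of one well-known hand-constructed solution (the weight values here are an illustrative fixed solution, not learned):

```python
import numpy as np

# The four XOR inputs and a fixed two-unit ReLU network that solves them
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W = np.array([[1, 1], [1, 1]])   # input -> hidden weights
c = np.array([0, -1])            # hidden biases
w = np.array([1, -2])            # hidden -> output weights
b = 0                            # output bias

h = np.maximum(0, X @ W + c)     # ReLU hidden layer
y = h @ w + b                    # linear output layer
# y reproduces XOR: [0, 1, 1, 0]
```

The hidden layer maps the four inputs into a space where the classes become linearly separable, which is exactly what the purely linear model cannot do.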

ReLU
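The rectified linear unit is just the elementwise map $g(z) = \max(0, z)$. A one-line numpy sketch:

```python
import numpy as np

def relu(x):
    # ReLU: elementwise max(0, x); negative inputs are zeroed out
    return np.maximum(0, x)

relu(np.array([-2.0, 0.0, 3.0]))   # -> array([0., 0., 3.])
```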

Gradient-Based Learning

Formulating Cost functions

Output Units

In most cases, we use the cross-entropy (CE) loss between the data distribution and the model distribution; consequently, the output units we pick determine the form of the CE loss.

Assume $h = f(x, \theta)$.
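To make the pairing concrete, here is a hedged sketch (values are illustrative, not from the source) of the two most common cases: a sigmoid output unit paired with binary cross-entropy for Bernoulli targets, and a softmax output unit paired with categorical cross-entropy for multinoulli targets. In both cases the loss is the negative log-likelihood, so its form follows directly from the output unit chosen.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())      # shift for numerical stability
    return e / e.sum()

# Bernoulli output: sigmoid unit + binary cross-entropy
a = 0.7                          # scalar logit computed from h
y = 1
p = sigmoid(a)
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Multinoulli output: softmax unit + categorical cross-entropy
logits = np.array([2.0, 0.5, -1.0])   # one logit per class, from h
target = 0
probs = softmax(logits)
cce = -np.log(probs[target])
```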

Output activations

Other Output Types

Hidden Units

The Humble ReLU, and its several variations


Architecture Design

Alternatives to the fully connected architecture

Backpropagation
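Backpropagation is repeated application of the chain rule, layer by layer, from the loss back to the parameters. A minimal sketch (my own toy setup, not from the source): a one-hidden-layer ReLU network with squared error, with the analytic gradient checked against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                 # single input example
y = 1.0                                # scalar target
W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)
w2 = rng.normal(size=4);      b2 = 0.0

def forward(W1, b1, w2, b2):
    z = W1 @ x + b1
    h = np.maximum(0, z)               # ReLU hidden layer
    yhat = w2 @ h + b2
    loss = 0.5 * (yhat - y) ** 2
    return z, h, yhat, loss

z, h, yhat, loss = forward(W1, b1, w2, b2)

# Backward pass: chain rule applied layer by layer
dyhat = yhat - y                       # dL/dyhat
dw2 = dyhat * h
db2 = dyhat
dh = dyhat * w2
dz = dh * (z > 0)                      # ReLU derivative is 0/1 indicator
dW1 = np.outer(dz, x)
db1 = dz

# Finite-difference check on one weight of W1
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (forward(W1p, b1, w2, b2)[3] - loss) / eps
```

The finite-difference estimate `num` should agree with the analytic `dW1[0, 0]` to several decimal places, which is the standard sanity check for a hand-written backward pass.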