Ch8: Optimization for Deep Learning

Introduction

Empirical Risk Minimization

Batch/Minibatch Learning

Issue w/Optimization: Ill-Conditioning

Issue w/Optimization: Saddle Points/Local Minima

Issue w/Optimization: Cliffs/Exploding Gradients

Issue w/Optimization: Long-Term Dependencies

Issue w/Optimization: Poor Correspondence Between Local & Global Structure

Basic Algorithms

Parameter Initialization Strategies

Adaptive Learning Rate Algorithms

Second-Order Methods

Conjugate Gradients Method