What is batch normalization?

Q: What is batch normalization?

Batch Normalization (BatchNorm) normalizes the activations of each layer to have zero mean and unit variance, computed over the mini-batch, then scales and shifts with learnable parameters γ and β. Benefits: accelerates training by allowing higher learning rates, reduces sensitivity to weight initialization, acts as a regularizer (reduces need for dropout), and mitigates the internal covariate shift problem (distribution of layer inputs changing during training). It is placed after the linear

Answer

Batch Normalization (BatchNorm) normalizes the activations of each layer to have zero mean and unit variance, computed over the mini-batch, then scales and shifts with learnable parameters γ and β. Benefits: accelerates training by allowing higher learning rates, reduces sensitivity to weight initialization, acts as a regularizer (reduces need for dropout), and mitigates the internal covariate shift problem (distribution of layer inputs changing during training). It is placed after the linear transformation and before the activation function in practice.

What is dropout regularization?

What is gradient vanishing and exploding?

More Machine Learning / AI Questions

View all →

Intermediate What is a convolutional neural network (CNN)?
Intermediate What is a Recurrent Neural Network (RNN)?
Intermediate What is an LSTM and how does it solve the vanishing gradient problem?
Intermediate What is the attention mechanism in neural networks?
Intermediate What is the Transformer architecture?

All Machine Learning / AI Questions Browse All Topics