What is batch normalization?

Why Interviewers Ask This

This tests whether you can apply Machine Learning / AI knowledge to real-world scenarios. Interviewers are looking for clarity of thought and evidence that you've encountered this in production code.

Answer

Batch Normalization (BatchNorm) normalizes the activations of each layer to have zero mean and unit variance, computed over the mini-batch, then scales and shifts with learnable parameters γ and β. Benefits: accelerates training by allowing higher learning rates, reduces sensitivity to weight initialization, acts as a regularizer (reduces need for dropout), and mitigates the internal covariate shift problem (distribution of layer inputs changing during training). It is placed after the linear transformation and before the activation function in practice.

Common Mistake

Don't just define the term — demonstrate that you understand when to use it and when not to. Showing awareness of trade-offs is what separates average from strong Machine Learning / AI candidates.