What are vanishing and exploding gradients?

Answer

During backpropagation, the gradient that reaches a given layer is a product of per-layer terms (roughly, each layer's weights times the derivative of its activation), one for every layer between it and the loss. In a deep network, if these terms typically have magnitude below 1, the product shrinks exponentially with depth (vanishing gradients), so early layers learn very slowly. If they typically have magnitude above 1, the product grows exponentially (exploding gradients), and training becomes unstable. Common remedies: non-saturating activation functions (ReLU instead of Sigmoid/Tanh), careful weight initialization (He init, Xavier init), batch normalization, residual connections (the skip connections used in ResNet), and gradient clipping, which specifically targets exploding gradients.
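To make the exponential effect concrete, here is a minimal sketch in plain Python. It assumes a toy network in which every layer scales the backward gradient by the same scalar factor, which real networks do not do exactly. The names `backprop_scale` and `clip_by_norm` are hypothetical helpers written for this illustration, not functions from any library. The factor 0.25 is the maximum value of the sigmoid derivative, and 1.5 is an arbitrary factor above 1.

```python
import math

def backprop_scale(per_layer_factor: float, depth: int) -> float:
    """Gradient scale at the first layer when every layer multiplies
    the backward gradient by the same scalar (a simplifying assumption)."""
    return per_layer_factor ** depth

def clip_by_norm(grad: list[float], max_norm: float) -> list[float]:
    """Rescale grad so its L2 norm is at most max_norm; direction is kept."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

# 20 layers, each shrinking the gradient by 0.25 (sigmoid's largest slope).
vanishing = backprop_scale(0.25, 20)

# 20 layers, each amplifying the gradient by 1.5.
exploding = backprop_scale(1.5, 20)

# Clipping a gradient of norm 5 down to norm 1.
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)

print(f"vanishing: {vanishing:.3e}")
print(f"exploding: {exploding:.3e}")
print(f"clipped:   {clipped}")
```

After 20 layers the first factor leaves a gradient below 1e-12 while the second grows it past 3000, which is why depth makes both problems worse. Clipping only caps the size of the update; it does nothing for vanishing gradients, which is why the other remedies target the per-layer terms themselves.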