🤖 Machine Learning / AI
Intermediate
What is gradient vanishing and exploding?
Answer
During backpropagation, gradients are multiplied together as they flow back through layers. In deep networks, if weights have values < 1, gradients become exponentially smaller (vanishing gradients) — early layers learn very slowly. If weights > 1, gradients explode (exploding gradients) — training becomes unstable. Solutions: better activation functions (ReLU instead of Sigmoid/Tanh), careful weight initialization (He init, Xavier init), batch normalization, residual connections (skip connections in ResNet), and gradient clipping for exploding gradients.
More Machine Learning / AI Questions
View all →- Intermediate What is a convolutional neural network (CNN)?
- Intermediate What is a Recurrent Neural Network (RNN)?
- Intermediate What is an LSTM and how does it solve the vanishing gradient problem?
- Intermediate What is the attention mechanism in neural networks?
- Intermediate What is the Transformer architecture?