Intermediate
Artificial Intelligence & Machine Learning
Q74 / 100
Why might you use the Adam optimizer instead of plain stochastic gradient descent (SGD) when training a neural network?
B
Correct Answer
Adam adapts the learning rate for each parameter using estimates of first and second moments of the gradients, often leading to faster convergence with less manual tuning than plain SGD
Explanation
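Adam combines momentum (an exponential moving average of past gradients, the first moment) with per-parameter adaptive learning rates (scaled by an exponential moving average of squared gradients, the second moment). Because each parameter's step is normalized by the recent magnitude of its own gradients, training is often more stable and converges faster than with vanilla SGD, especially when gradients are noisy or sparse.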
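As a rough illustration of the explanation above, here is a minimal pure-Python sketch of one Adam step next to one plain SGD step. The function names `adam_step` and `sgd_step`, the list-based parameter layout, and the toy gradient values are assumptions made for this example; the default hyperparameters (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly used ones. It is not how any particular library implements the optimizer.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to lists of floats.

    m and v hold the running first and second moment estimates.
    t is the 1-based step count, used for bias correction.
    Returns the updated (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # First moment: moving average of gradients (the momentum part).
        m_i = beta1 * m_i + (1 - beta1) * g
        # Second moment: moving average of squared gradients.
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction, since both averages start at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step: the gradient scale is divided out.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def sgd_step(params, grads, lr=0.001):
    """Apply one plain SGD update: the same learning rate for every parameter."""
    return [p - lr * g for p, g in zip(params, grads)]


# Toy example (hypothetical values): two parameters whose gradients
# differ in scale by a factor of 10,000.
params = [1.0, 1.0]
grads = [100.0, 0.01]

adam_params, m, v = adam_step(params, grads, m=[0.0, 0.0], v=[0.0, 0.0], t=1)
sgd_params = sgd_step(params, grads)

print("Adam:", adam_params)
print("SGD: ", sgd_params)
```

On the first step, Adam moves both parameters by roughly the learning rate, regardless of gradient scale, while SGD moves the first parameter 10,000 times further than the second. That per-parameter normalization is why a single learning rate tends to need less manual tuning with Adam.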