Intermediate Artificial Intelligence & Machine Learning
Q74 / 100

Why might you use the Adam optimizer instead of plain stochastic gradient descent (SGD) when training a neural network?

The correct answer is B) Adam adapts the learning rate for each parameter using estimates of the first and second moments of the gradients, often leading to faster convergence with less manual tuning than plain SGD.

Explanation

Adam combines momentum (an exponential moving average of past gradients) with per-parameter adaptive learning rates (each step is divided by the square root of an exponential moving average of squared gradients). This often makes training more stable and faster to converge than vanilla SGD, especially when gradients are noisy or sparse.
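
To make the difference concrete, here is a minimal sketch of a single update step for each optimizer in plain Python. The function names `sgd_step` and `adam_step` are illustrative, not taken from any library, and the defaults (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly cited Adam defaults, assumed here rather than stated in the question.

```python
import math

def sgd_step(params, grads, lr=0.01):
    """Plain SGD: every parameter shares the same learning rate."""
    return [p - lr * g for p, g in zip(params, grads)]

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are the running moment estimates; t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # First moment: exponential moving average of gradients (the momentum part).
        m_i = beta1 * m_i + (1 - beta1) * g
        # Second moment: exponential moving average of squared gradients.
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction, because both averages start at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step: large-gradient parameters are scaled down, small ones up.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Two parameters whose gradients differ by four orders of magnitude.
params, grads = [1.0, 1.0], [100.0, 0.01]
sgd_params = sgd_step(params, grads, lr=0.01)
adam_params, m, v = adam_step(params, grads, [0.0, 0.0], [0.0, 0.0], t=1)
```

In this example SGD moves the first parameter 10,000 times further than the second, while Adam's first step moves both by roughly `lr`, since each update is normalized by that parameter's own gradient magnitude. That normalization is why Adam usually needs less learning-rate tuning.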
