Question

Why might you use the Adam optimizer instead of plain stochastic gradient descent (SGD) when training a neural network?

Accepted Answer

B) Adam adapts the learning rate for each parameter using running estimates of the first and second moments of the gradients, often leading to faster convergence with less manual tuning than plain SGD
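
To make the per-parameter adaptation concrete, here is a minimal sketch of one Adam step next to one plain SGD step, written in pure Python over lists of floats. The function names (adam_step, sgd_step) and the toy gradient values are illustrative assumptions, not any library's API; the default hyperparameters follow the commonly cited values (beta1=0.9, beta2=0.999, eps=1e-8).

```python
import math


def sgd_step(params, grads, lr=0.001):
    """Plain SGD: every parameter shares the same learning rate."""
    return [p - lr * g for p, g in zip(params, grads)]


def adam_step(params, grads, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (illustrative sketch).

    m, v are the running first- and second-moment estimates,
    t is the 1-based step count. Returns (new_params, new_m, new_v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # First moment: exponential moving average of the gradient.
        m_i = beta1 * m_i + (1 - beta1) * g
        # Second moment: exponential moving average of the squared gradient.
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction, since m and v start at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Each parameter's step is scaled by its own gradient magnitude.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy example (assumed values): two parameters whose gradients
# differ in scale by a factor of 10,000.
params = [1.0, 1.0]
grads = [100.0, 0.01]

sgd_params = sgd_step(params, grads, lr=0.1)
adam_params, m, v = adam_step(params, grads, m=[0.0, 0.0], v=[0.0, 0.0],
                              t=1, lr=0.1)
print("SGD: ", sgd_params)
print("Adam:", adam_params)
```

With these toy gradients, the SGD step overshoots badly on the first parameter and barely moves the second, because both share one learning rate. The first Adam step moves both parameters by roughly lr, since the update divides each gradient by an estimate of its own magnitude. This is only a sketch of the update rule; it says nothing about generalization, and it does not imply convergence to a global minimum.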

Answer

A) Adam guarantees the model will reach the global minimum

Answer

C) Adam removes the need for a validation set entirely

Answer

D) Adam only works for convolutional neural networks