Advanced Artificial Intelligence & Machine Learning
Q99 / 100

Mathematically, why does L2 regularization (Ridge) tend to shrink weights smoothly toward zero rather than setting them exactly to zero, unlike L1 (Lasso)?

The correct answer is B) Because the L2 penalty's gradient is proportional to the weight itself, so its pull weakens as a weight approaches zero, whereas the L1 penalty's gradient has constant magnitude and can drive small weights all the way to exactly zero, producing sparsity.

Explanation

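The derivative of the L2 penalty term (λw²) with respect to w is 2λw, which is proportional to w, so the shrinkage force fades as the weight nears zero and the weight typically ends up small but nonzero. The derivative of the L1 penalty (λ|w|) is ±λ for w ≠ 0, a constant magnitude that does not fade as the weight shrinks. At w = 0 the L1 penalty is not differentiable, and at the optimum any weight whose loss gradient has magnitude at most λ is held at exactly zero. This produces sparse solutions useful for feature selection.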
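
The contrast can be checked on a one-dimensional toy problem. The sketch below assumes a loss of 0.5 * (w - a)² plus the penalty, where a is the unregularized optimum; this toy loss and the function names ridge_solution, lasso_solution and shrinkage_table are illustrative assumptions, not part of the quiz. Under that assumption both minimizers have closed forms: setting (w - a) + 2λw = 0 gives the ridge solution, and the lasso solution is the standard soft-thresholding operator.

```python
def ridge_solution(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*w**2 (assumed toy loss).

    Setting the gradient (w - a) + 2*lam*w to zero gives a / (1 + 2*lam):
    a multiplicative shrink that is nonzero whenever a is nonzero.
    """
    return a / (1.0 + 2.0 * lam)


def lasso_solution(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*abs(w) (assumed toy loss).

    This is soft-thresholding: subtract lam from the magnitude and
    clamp at zero, so any |a| <= lam maps to exactly 0.0.
    """
    magnitude = max(abs(a) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if a > 0 else -magnitude


def shrinkage_table(values, lam):
    """Return (a, ridge, lasso) triples for each unregularized optimum a."""
    return [(a, ridge_solution(a, lam), lasso_solution(a, lam)) for a in values]


if __name__ == "__main__":
    for a, ridge_w, lasso_w in shrinkage_table([2.0, 0.5, 0.1, -0.05], lam=0.25):
        print(f"a={a:+.2f}  ridge={ridge_w:+.4f}  lasso={lasso_w:+.4f}")
```

With λ = 0.25, the inputs 0.1 and -0.05 fall inside the threshold, so the lasso solution is exactly zero for them, while the ridge solution only scales them by 1 / 1.5 and stays nonzero.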
