Mathematically, why does L2 regularization (Ridge) tend to shrink weights smoothly toward zero rather than setting them exactly to zero, unlike L1 (Lasso)?

The correct answer is B) Because the L2 penalty's gradient is proportional to the weight itself, so its pull weakens as a weight approaches zero, whereas the L1 penalty's gradient has constant magnitude and can drive small weights all the way to exactly zero, producing sparsity.

Explanation

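The derivative of the L2 penalty term λw² with respect to w is 2λw, which is proportional to w, so the shrinkage force fades as w approaches zero and never pushes the weight exactly onto zero. The derivative of the L1 penalty λ|w| is λ·sign(w) for w ≠ 0, so its magnitude stays at λ no matter how small w becomes. That constant pull keeps pushing small weights toward zero until they reach it exactly. At w = 0 the subgradient can take any value in [−λ, λ], so the weight stays at zero whenever the data-fit gradient is no larger than λ in magnitude. The result is a sparse solution, which is useful for feature selection.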
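
The difference can be seen in a one-weight toy problem. The sketch below is illustrative only: it assumes a squared-error fit term 0.5·(w − a)², where a is the unpenalized solution, and the function names ridge_1d and lasso_1d are hypothetical helpers, not part of any library. Under that assumption both penalized minimizers have closed forms, so no optimizer is needed.

```python
def ridge_1d(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*w**2 (assumed toy objective).

    Setting the derivative (w - a) + 2*lam*w to zero gives a / (1 + 2*lam):
    a rescaling of a, which is zero only if a itself is zero.
    """
    return a / (1.0 + 2.0 * lam)


def lasso_1d(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*abs(w) (assumed toy objective).

    This is soft-thresholding: the magnitude of a is reduced by lam and
    clipped at zero, so any a with |a| <= lam maps to exactly zero.
    """
    magnitude = max(abs(a) - lam, 0.0)
    return magnitude if a >= 0 else -magnitude


if __name__ == "__main__":
    lam = 0.5
    for a in [2.0, 0.4, -0.1]:
        # Ridge stays nonzero for every nonzero a; Lasso zeroes the small ones.
        print(f"a={a:+.2f}  ridge={ridge_1d(a, lam):+.4f}  lasso={lasso_1d(a, lam):+.4f}")
```

With lam = 0.5, the inputs 0.4 and −0.1 fall inside the threshold, so the L1 solution is exactly zero for both, while the L2 solution only halves them.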
