What are model interpretability and explainability?

Answer

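Interpretability is the degree to which a human can understand how a model works internally, as with a linear model or a shallow decision tree. Explainability usually refers to post-hoc methods that explain the predictions of an already-trained, often opaque, model. Key techniques: SHAP (SHapley Additive exPlanations) assigns each feature a Shapley value representing its contribution to a prediction relative to a baseline; it is grounded in cooperative game theory and has both model-agnostic (KernelSHAP) and model-specific (TreeSHAP) variants. LIME (Local Interpretable Model-agnostic Explanations) fits a simple surrogate, typically a sparse linear model, to the model's outputs on perturbed samples around a single prediction. Grad-CAM uses the gradients of a class score with respect to a convolutional layer's feature maps to produce a coarse heatmap of the image regions that drove a CNN prediction. Attention visualization displays attention weights in Transformers, although attention weights are not guaranteed to be faithful explanations of a model's output. Interpretability is critical in healthcare, finance, and other regulated industries, where decisions must be justified and audited.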
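
To make the Shapley value idea behind SHAP concrete, here is a minimal sketch that computes exact Shapley values by enumerating every feature coalition. It is an illustration under stated assumptions, not the SHAP library's implementation: the names shapley_values and toy_model, the toy model itself, and the all-zeros baseline are hypothetical, and "absent" features are simply replaced by baseline values. Exact enumeration is exponential in the number of features, which is why practical tools approximate it.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction (illustrative sketch).

    Features outside a coalition are set to their baseline value, which is
    an assumption of this sketch about how to model a "missing" feature.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: two linear terms plus an interaction between features 0 and 2.
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[2]


x = [1.0, 2.0, 1.0]          # instance to explain
baseline = [0.0, 0.0, 0.0]   # assumed reference input
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the interaction term is split evenly between features 0 and 2, and the attributions sum to the difference between the prediction for x and the prediction for the baseline, which is the additivity property the "Additive" in SHAP refers to.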