What is data leakage in machine learning?

Question

Accepted Answer

Data leakage occurs when information from outside the training dataset (specifically from the test set or from the future) is used to build the model, resulting in overly optimistic performance estimates that do not generalize. Common examples: fitting a scaler on the full dataset before splitting, including future information in features (look-ahead bias), or having test samples appear in the training set. Preventing leakage requires performing all preprocessing steps (scaling, imputation, enco
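
To illustrate the first example above (fitting a scaler on the full dataset before splitting), here is a minimal sketch using only the Python standard library. The data, the 80/20 split, and the helper names `fit_scaler` and `transform` are hypothetical and chosen purely for illustration; they are not taken from any particular library.

```python
import random
import statistics


def fit_scaler(values):
    """Estimate standardization statistics (mean, std) from the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values using previously fitted statistics."""
    return [(v - mean) / std for v in values]


random.seed(0)

# Hypothetical 1-D feature: the last 20 points act as the held-out test set
# and are deliberately shifted so the effect of leakage is visible.
data = [random.gauss(0, 1) for _ in range(80)] + [random.gauss(5, 1) for _ in range(20)]
train, test = data[:80], data[80:]

# Leaky: the statistics are computed on all rows, so they depend on the test set.
leaky_mean, leaky_std = fit_scaler(data)

# Leak-free: split first, fit on the training rows only,
# then apply the same fitted statistics to both splits.
train_mean, train_std = fit_scaler(train)
train_scaled = transform(train, train_mean, train_std)
test_scaled = transform(test, train_mean, train_std)

print(f"mean fitted on full data:  {leaky_mean:.3f}")
print(f"mean fitted on train only: {train_mean:.3f}")
```

The two means differ because the leaky version has already seen the test rows. In scikit-learn, the same pattern is usually expressed by placing the preprocessing step inside a `Pipeline`, so that it is refit on the training portion of each cross-validation fold.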
