What is data leakage in a machine learning workflow?
The correct answer is B) When information from outside the training dataset — often from the validation or test set, or from the future — improperly influences model training, producing unrealistically optimistic performance estimates
Data leakage happens when features or preprocessing statistics are derived from information that would not be available at prediction time, for example by fitting a scaler on the full dataset before splitting it, so that the scaler's statistics include the test rows. The result is a model that looks great in evaluation but performs worse in production than those estimates suggest.
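The scaler case mentioned in the explanation can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy values, the train/test split point, and the helper names `fit_scaler` and `transform` are all hypothetical and not part of the quiz material.

```python
import statistics

def fit_scaler(values):
    """Return (mean, std) computed from the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)

def transform(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]

# Hypothetical toy feature; the last two rows are held out as the test set.
data = [1.0, 2.0, 3.0, 4.0, 100.0, 200.0]
train, test = data[:4], data[4:]

# Leaky: statistics are computed on the full dataset, so the held-out
# test values influence how the training rows are scaled.
leaky_mean, leaky_std = fit_scaler(data)

# Leak-free: statistics come from the training split only and are then
# reused unchanged to transform the test split.
clean_mean, clean_std = fit_scaler(train)
train_scaled = transform(train, clean_mean, clean_std)
test_scaled = transform(test, clean_mean, clean_std)

print("mean with leakage:   ", leaky_mean)
print("mean without leakage:", clean_mean)
```

The two means differ because the leaky version has already "seen" the test rows. With scikit-learn, the equivalent habit is to call `fit` on the training split only, or to wrap preprocessing and model in a `Pipeline` so that each cross-validation fold refits the scaler on its own training portion.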