Intermediate Artificial Intelligence & Machine Learning
Q75 / 100

What is data leakage in a machine learning workflow?

Correct Answer

B) When information from outside the training dataset (often from the validation or test set, or from the future) improperly influences model training, producing unrealistically optimistic performance estimates.

Explanation

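Data leakage happens when features or preprocessing statistics are derived from information that would not be available at prediction time. A common example is fitting a scaler on the full dataset before splitting: the learned mean and standard deviation then partly reflect the test rows, so the "unseen" test set is no longer truly unseen. The result is a model whose evaluation scores are inflated. It looks strong during validation but performs noticeably worse in production on genuinely new data.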
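The scaler example can be made concrete with a small sketch. It uses only the Python standard library and hypothetical helper names (`fit_scaler`, `transform`, `leaky_preprocess`, `safe_preprocess`) chosen for illustration. It also assumes a simple ordered split where the first `split` rows are training data; real workflows often shuffle or use time-based splits instead.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn standardization parameters (mean, std) from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 if all values are identical
    return mu, sigma

def transform(values, params):
    """Apply previously learned standardization parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

def leaky_preprocess(data, split):
    # LEAKY: statistics are computed on ALL rows, including the test rows,
    # so test-set information flows into the training features.
    params = fit_scaler(data)
    scaled = transform(data, params)
    return scaled[:split], scaled[split:], params

def safe_preprocess(data, split):
    # SAFE: split first, fit on training rows only, then apply the same
    # training-derived parameters to both train and test.
    train, test = data[:split], data[split:]
    params = fit_scaler(train)
    return transform(train, params), transform(test, params), params

# Hypothetical data: the last two rows (the test set) contain large values.
data = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
_, _, leaky_params = leaky_preprocess(data, split=4)
_, _, safe_params = safe_preprocess(data, split=4)
print("leaky mean:", leaky_params[0])  # pulled far upward by the test rows
print("safe mean:", safe_params[0])    # 2.5, computed from training rows only
```

In the leaky version the scaler's mean (about 38.3) is dominated by two test rows the model should never have seen. In the safe version it comes from training rows alone. In scikit-learn the same discipline is usually enforced by wrapping the scaler and the model in a `Pipeline`, so that during cross-validation the scaler is refit on each training fold only.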
