What is a pipeline in machine learning?

Answer

A machine learning (ML) pipeline is a sequence of data processing and modeling steps chained together so that the end-to-end workflow runs automatically and reproducibly. A typical pipeline includes: data ingestion → preprocessing (imputation, scaling, encoding) → feature engineering → model training → evaluation → deployment. Because each transformation is fitted on the training data and then applied unchanged to test and inference data, a pipeline keeps preprocessing consistent across stages and helps prevent data leakage, for example a scaler that computes its mean and variance from test rows. Scikit-learn's Pipeline class is a standard way to bundle preprocessing steps and a final estimator into a single object.
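
As a rough illustration of the preprocessing and training stages, here is a minimal sketch using scikit-learn's Pipeline. The synthetic dataset, the step names ("impute", "scale", "model"), and the choice of median imputation with logistic regression are assumptions made for the example, not a recommended setup.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; the last step is the model.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # zero mean, unit variance
    ("model", LogisticRegression(max_iter=1000)),
])

# fit() learns the imputation and scaling statistics from X_train only,
# then trains the model on the transformed training data.
pipe.fit(X_train, y_train)

# score() and predict() reuse the statistics learned during fit();
# nothing is re-fitted on the test data.
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Because the whole pipeline is a single estimator, it can also be passed to utilities such as cross_val_score, which then re-fit the preprocessing steps inside each training fold rather than on the full dataset.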