What does a CI/CD pipeline for ML models look like (model versioning, data validation, shadow deployment)?
Answer
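CI/CD for ML (often called MLOps or CD4ML, short for Continuous Delivery for Machine Learning) extends software CI/CD with concerns unique to machine learning: besides code, the pipeline has to test and version data and models. A typical pipeline has these stages:
- Data validation: before training, the dataset is checked for schema drift, missing values, and shifts in its statistical distributions, using tools such as Great Expectations or TensorFlow Data Validation (TFDV, used by TFX). A failed check stops the pipeline before any compute is spent on training.
- Model training: new data or code changes trigger a training run, and the experiments, hyperparameters, and resulting metrics are tracked in MLflow or Weights & Biases.
- Model evaluation: the trained model is automatically scored on a held-out test set, and its metrics (accuracy, F1, AUC) are compared with those of the current production model. It is promoted only if it does better, usually by a minimum margin so that evaluation noise is not mistaken for an improvement.
- Model versioning: model artifacts are stored in a model registry (MLflow Model Registry, Vertex AI Model Registry) together with metadata, provenance (the code commit, data version, and training run that produced them), and an approval workflow, so every deployed model can be traced and an earlier version restored.
- Shadow deployment: the new model runs alongside the production model and receives a mirrored copy of live traffic. Its predictions are logged but never returned to users, so its outputs and latency can be compared with production under real conditions before any traffic is switched to it.
- Continuous monitoring: after full rollout, the pipeline tracks the prediction distribution, data drift (input feature statistics diverging from the training distribution), and model performance degradation (measurable once ground-truth labels arrive), and triggers automatic retraining or rollback when these metrics cross predefined thresholds.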
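The data-validation step can be sketched in plain Python to show what such a gate checks. This is a minimal illustration, not the Great Expectations or TFDV API: the expected schema, the reference means, and both thresholds are hypothetical values, and the mean-shift test is a deliberately crude stand-in for the richer statistics those tools compute.

```python
from statistics import mean

# Hypothetical expected schema for one training table: column -> Python type.
EXPECTED_SCHEMA = {"age": int, "income": float, "country": str}

# Hypothetical reference means, e.g. computed from the last accepted training set.
REFERENCE_MEANS = {"age": 41.0, "income": 52_000.0}

MAX_MISSING_FRACTION = 0.01  # assumed limit on missing values per column
MAX_MEAN_SHIFT = 0.25        # assumed limit on relative change of a column mean


def validate(rows: list[dict]) -> list[str]:
    """Return human-readable validation errors; an empty list means the batch passes."""
    if not rows:
        return ["dataset is empty"]
    errors = []
    n = len(rows)

    # Schema drift: columns that appear but are not in the expected schema.
    unexpected = {key for row in rows for key in row} - set(EXPECTED_SCHEMA)
    if unexpected:
        errors.append(f"unexpected columns: {sorted(unexpected)}")

    for col, expected_type in EXPECTED_SCHEMA.items():
        values = [row.get(col) for row in rows]  # an absent column counts as missing

        # Missing values.
        missing = sum(v is None for v in values)
        if missing / n > MAX_MISSING_FRACTION:
            errors.append(f"{col}: {missing}/{n} values missing")

        # Type check on the values that are present.
        present = [v for v in values if v is not None]
        wrong_type = [v for v in present if not isinstance(v, expected_type)]
        if wrong_type:
            errors.append(f"{col}: {len(wrong_type)} values are not {expected_type.__name__}")
            continue

        # Crude distribution-shift check: relative change of the column mean.
        if col in REFERENCE_MEANS and present:
            ref = REFERENCE_MEANS[col]
            shift = abs(mean(present) - ref) / abs(ref)
            if shift > MAX_MEAN_SHIFT:
                errors.append(f"{col}: mean shifted {shift:.0%} from reference")
    return errors
```

In a CI job, a non-empty result would fail the step (for example, by printing the errors and exiting with a non-zero status), so training never starts on a batch that breaks the contract.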
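The "promote only if better" rule from the evaluation step can be written as a small gate function. This is a sketch under stated assumptions: the choice of AUC as the primary metric, the 0.005 minimum gain, and the 0.01 allowed regression on secondary metrics are illustrative values, and a real pipeline would load both metric sets from its experiment tracker or registry rather than construct them by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    f1: float
    auc: float


def should_promote(candidate: Metrics, production: Optional[Metrics],
                   primary: str = "auc", min_gain: float = 0.005,
                   max_regression: float = 0.01) -> bool:
    """Promotion gate: the primary metric must improve by at least `min_gain`,
    and no tracked metric may drop by more than `max_regression`."""
    if production is None:
        return True  # assumption: the first model has nothing to beat
    gain = getattr(candidate, primary) - getattr(production, primary)
    if gain < min_gain:
        return False
    for name in ("accuracy", "f1", "auc"):
        if getattr(production, name) - getattr(candidate, name) > max_regression:
            return False
    return True
```

The minimum margin is the important design choice: on a modest test set, a tiny AUC difference can easily be noise, and the regression check stops a model that wins on one metric while quietly losing on another.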
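To make the registry concepts concrete, here is a toy in-memory model registry. It is not the MLflow or Vertex AI API; `ToyRegistry`, `ModelVersion`, and their fields are hypothetical names. It illustrates the properties the versioning step relies on: append-only numbered versions, a content hash of the artifact, provenance metadata, an approval step before a version may serve, and rollback by repointing the production pointer.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ModelVersion:
    version: int
    sha256: str        # content hash of the serialized artifact
    git_commit: str    # provenance: code revision that trained it
    data_version: str  # provenance: dataset snapshot it was trained on
    metrics: dict


@dataclass
class ToyRegistry:
    """Hypothetical in-memory model registry. Versions are append-only;
    'production' is a movable pointer, so rollback is just repointing it."""
    versions: list = field(default_factory=list)
    approved: set = field(default_factory=set)
    production: Optional[int] = None
    history: list = field(default_factory=list)  # earlier production versions

    def register(self, artifact: bytes, git_commit: str, data_version: str,
                 metrics: dict) -> ModelVersion:
        mv = ModelVersion(
            version=len(self.versions) + 1,
            sha256=hashlib.sha256(artifact).hexdigest(),
            git_commit=git_commit,
            data_version=data_version,
            metrics=dict(metrics),
        )
        self.versions.append(mv)
        return mv

    def approve(self, version: int) -> None:
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"unknown version {version}")
        self.approved.add(version)  # e.g. after human review or an automated gate

    def promote(self, version: int) -> None:
        if version not in self.approved:
            raise PermissionError(f"version {version} is not approved")
        if self.production is not None:
            self.history.append(self.production)
        self.production = version

    def rollback(self) -> int:
        if not self.history:
            raise RuntimeError("no earlier production version")
        self.production = self.history.pop()
        return self.production
```

Because versions are never overwritten and each records its commit, data snapshot, and artifact hash, any prediction can be traced back to exactly what produced it, and a rollback is a pointer change rather than a rebuild.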
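Shadow deployment is usually implemented at the proxy or service-mesh layer (for example, with Istio's traffic mirroring), but the idea can be sketched in-process. In this hypothetical `ShadowRouter`, models are plain callables: the production model answers the caller, the same input is mirrored to the shadow model on a background thread, and only the comparison (agreement rate and latencies) is recorded.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


class ShadowRouter:
    """Serve predictions from `production`; mirror each request to `shadow`
    on a background thread and record both outputs and latencies."""

    def __init__(self, production, shadow, workers: int = 4):
        self.production = production
        self.shadow = shadow
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.lock = threading.Lock()
        self.records = []  # (prod_pred, shadow_pred, prod_ms, shadow_ms)
        self.shadow_errors = 0

    @staticmethod
    def _timed(model, x):
        start = time.perf_counter()
        y = model(x)
        return y, (time.perf_counter() - start) * 1000.0

    def predict(self, x):
        y_prod, prod_ms = self._timed(self.production, x)
        # Mirror the request; the caller never waits for or sees the shadow result.
        self.pool.submit(self._run_shadow, x, y_prod, prod_ms)
        return y_prod

    def _run_shadow(self, x, y_prod, prod_ms):
        try:
            y_shadow, shadow_ms = self._timed(self.shadow, x)
        except Exception:
            with self.lock:
                self.shadow_errors += 1  # shadow failures are counted, never surfaced
            return
        with self.lock:
            self.records.append((y_prod, y_shadow, prod_ms, shadow_ms))

    def close_and_report(self) -> dict:
        self.pool.shutdown(wait=True)  # let pending shadow calls finish
        n = len(self.records)
        if n == 0:
            return {"compared": 0, "shadow_errors": self.shadow_errors}
        return {
            "compared": n,
            "shadow_errors": self.shadow_errors,
            "agreement": sum(p == s for p, s, _, _ in self.records) / n,
            "median_prod_ms": median(r[2] for r in self.records),
            "median_shadow_ms": median(r[3] for r in self.records),
        }
```

For example, wrapping a production threshold of 0.5 and a shadow threshold of 0.6 and sending inputs 0.1, 0.55, 0.7, and 0.9 yields an agreement of 0.75, since only the 0.55 input is classified differently. Isolating shadow errors matters as much as the comparison: a crashing candidate should show up in the report, not in user-facing responses.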
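For the data-drift part of monitoring, one widely used metric is the Population Stability Index (PSI); it is used here as an example, since the answer does not prescribe a specific drift measure. The sketch bins a feature at the training data's quantiles and compares bin fractions in recent production inputs against them. The 0.2 alert threshold is a common rule of thumb, not a standard, and should be treated as an assumption to tune per feature.

```python
import math
from bisect import bisect_right


def quantile_edges(reference: list[float], bins: int = 10) -> list[float]:
    """Interior bin edges placed at the reference (training) data's quantiles."""
    s = sorted(reference)
    return [s[len(s) * i // bins] for i in range(1, bins)]


def bin_fractions(values: list[float], edges: list[float], eps: float = 1e-6) -> list[float]:
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    # Floor at eps so an empty bin does not cause log(0) or division by zero.
    return [max(c / len(values), eps) for c in counts]


def psi(reference: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index: sum over bins of (cur - ref) * ln(cur / ref)."""
    edges = quantile_edges(reference, bins)
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


PSI_ALERT = 0.2  # assumed threshold; rule of thumb, tune per feature


def feature_drifted(reference: list[float], current: list[float]) -> bool:
    return psi(reference, current) > PSI_ALERT
```

A monitoring job would run this per feature over a sliding window of recent inputs; a `True` result would raise an alert or trigger the retraining pipeline. Drift alone does not prove the model got worse, which is why it is tracked alongside label-based performance metrics.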