How do you optimize cost in CI/CD infrastructure using spot instances and right-sized runners?

Answer

CI/CD infrastructure can be one of the largest cloud costs in an engineering organization: runner machines must be provisioned for peak demand, yet sit idle most of the day. Key optimization strategies:

Spot/preemptible instances: AWS Spot Instances and GCP Spot VMs (the newer version of Preemptible VMs) typically cost 60-90% less than on-demand, in exchange for the provider being able to reclaim them on short notice. CI jobs are naturally retryable, which makes them good candidates: if a spot instance is reclaimed, the CI system reruns the job on a new instance, provided retries are configured and the job is safe to run twice. GitHub Actions runner scale sets on Kubernetes can run on spot node pools, but this is not automatic; the runner pods have to be scheduled onto those nodes, for example with node selectors and tolerations.

Auto-scaling runner pools: use tools like actions-runner-controller (ARC) to scale from 0 to N runners based on queue depth and back to 0 during off-hours, so you pay only for what you use.

Right-sizing: profile actual CPU and memory usage per job type. Many jobs are I/O bound and do not need large instances, so 2-4 vCPU runners are often enough instead of defaulting to 8+.

Pipeline caching: cache dependencies and build outputs so jobs skip repeated downloads and rebuilds. Every cache hit saves billable runner minutes.

Conditional pipeline execution: skip expensive jobs (E2E tests, security scans) on draft PRs or non-critical branches.

Artifact TTL policies: expire old build artifacts aggressively, for example with storage lifecycle rules, to reduce S3/GCS costs.
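
The right-sizing, scale-to-zero, and spot-retry ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the size catalogue, the prices, the 25% headroom, and all function names are hypothetical, and the spot estimate pessimistically assumes that an interrupted attempt is billed in full and restarted from scratch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerSize:
    name: str          # hypothetical label, not a real instance type
    vcpus: int
    memory_gib: float
    hourly_usd: float  # illustrative price, not a quoted rate


# Hypothetical catalogue, ordered from cheapest to most expensive.
SIZES = [
    RunnerSize("small", 2, 4.0, 0.04),
    RunnerSize("medium", 4, 8.0, 0.08),
    RunnerSize("large", 8, 16.0, 0.16),
]


def pick_runner_size(peak_vcpus: float, peak_memory_gib: float,
                     headroom: float = 1.25) -> RunnerSize:
    """Return the cheapest size covering the profiled peak plus headroom."""
    need_cpu = peak_vcpus * headroom
    need_mem = peak_memory_gib * headroom
    for size in SIZES:
        if size.vcpus >= need_cpu and size.memory_gib >= need_mem:
            return size
    return SIZES[-1]  # nothing fits: fall back to the largest size


def desired_runners(queued_jobs: int, max_runners: int) -> int:
    """One runner per queued job, capped at max_runners, zero when idle."""
    return max(0, min(queued_jobs, max_runners))


def expected_spot_cost(on_demand_cost: float, discount: float,
                       interruption_rate: float) -> float:
    """Expected cost of one job on spot capacity with retries.

    Assumes each attempt is interrupted independently with probability
    interruption_rate, so the expected number of attempts is
    1 / (1 - interruption_rate).
    """
    if not 0.0 <= interruption_rate < 1.0:
        raise ValueError("interruption_rate must be in [0, 1)")
    spot_cost_per_attempt = on_demand_cost * (1.0 - discount)
    return spot_cost_per_attempt / (1.0 - interruption_rate)


if __name__ == "__main__":
    # A job profiled at 1.2 vCPUs and 2.5 GiB peak fits the smallest size.
    print(pick_runner_size(1.2, 2.5).name)
    # 25 queued jobs with a cap of 10 runners; an empty queue scales to zero.
    print(desired_runners(25, 10), desired_runners(0, 10))
    # A $1.00 on-demand job at a 70% discount with 10% interruptions.
    print(round(expected_spot_cost(1.00, 0.70, 0.10), 3))
```

Even under the pessimistic retry assumption, the spot estimate stays well below the on-demand price for moderate interruption rates, which is why retryable CI jobs suit spot capacity. In a real setup the scaling decision would be made by the autoscaler (for example ARC) rather than by hand-written code like this.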