What is Kubernetes cluster autoscaling?
Why Interviewers Ask This
Candidates at the intermediate level are expected to not only know this concept but explain the trade-offs involved. Interviewers use this question to see if you can reason about design decisions, not just recall facts.
Answer
Kubernetes autoscaling works at several layers: HPA and VPA scale pods, Cluster Autoscaler and Karpenter scale nodes, and KEDA adds event-driven pod scaling.

HPA (Horizontal Pod Autoscaler): scales the number of pod replicas in a Deployment or StatefulSet based on metrics (CPU, memory, or custom metrics).

VPA (Vertical Pod Autoscaler): adjusts container resource requests and limits automatically based on historical usage. Three commonly used update modes: Off (recommendations only), Initial (apply only at pod creation), and Auto (apply and evict running pods so they restart with the new values). Running VPA and HPA on the same metric causes conflicts, because each reacts to the other's changes; use VPA for CPU/memory and HPA for custom metrics.

Cluster Autoscaler (CA): scales the number of nodes in the cluster. It adds nodes when pods are stuck in Pending because no node has enough free resources, and removes nodes that stay underutilized (default threshold: below 50% utilization for 10 minutes), provided their pods can be rescheduled elsewhere. It works with cloud provider node groups (AWS ASG, GCP MIG, Azure VMSS). Example configuration on EKS with Helm (note that the dots inside the annotation key must be escaped for --set):

    helm repo add autoscaler https://kubernetes.github.io/autoscaler
    helm install cluster-autoscaler autoscaler/cluster-autoscaler \
      --set autoDiscovery.clusterName=my-cluster \
      --set awsRegion=us-east-1 \
      --set "rbac.serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::123:role/CA"

Karpenter (AWS): a replacement for Cluster Autoscaler on EKS. It is typically faster (nodes in seconds rather than minutes), smarter (selects an instance type that fits each workload's requirements), and more cost-effective (chooses between Spot and On-Demand capacity and picks the right instance size). A NodePool CRD (called Provisioner in older releases) defines the allowed instance types, zones, and capacity type. Consolidation: Karpenter actively replaces underutilized nodes with cheaper alternatives.

KEDA (Kubernetes Event-Driven Autoscaling): scales pods to zero when there is no work and scales up based on queue depth, Kafka lag, and similar event sources. It is configured through its own CRD (ScaledObject) and acts as an external metrics provider for the HPA. More than 60 scalers are available.
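The HPA and Cluster Autoscaler decisions described above can be sketched in a few lines of Python. This is an illustrative model, not the controllers' actual code. The function names are hypothetical. The replica formula and the 10% tolerance follow the documented HPA defaults. The scale-down check assumes, as Cluster Autoscaler does by default, that node utilization is measured from pod resource requests rather than live usage.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         tolerance=0.1):
    """Model of the HPA rule: ceil(currentReplicas * current / target).

    If the ratio is within the tolerance band around 1.0 (assumed 10%,
    the documented default), the replica count is left unchanged.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def node_utilization(requested_cpu_m, allocatable_cpu_m,
                     requested_mem_mi, allocatable_mem_mi):
    """Utilization as the larger of the CPU and memory request ratios."""
    return max(requested_cpu_m / allocatable_cpu_m,
               requested_mem_mi / allocatable_mem_mi)


def is_scale_down_candidate(utilization, unneeded_minutes,
                            threshold=0.5, unneeded_time=10):
    """A node qualifies for removal once it has stayed below the
    threshold (default 50%) for the unneeded time (default 10 minutes).
    The real autoscaler also checks that its pods can move elsewhere.
    """
    return utilization < threshold and unneeded_minutes >= unneeded_time


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target.
    print(hpa_desired_replicas(4, 90, 60))
    # Node with 1000m of 4000m CPU and 2048Mi of 16384Mi memory requested.
    util = node_utilization(1000, 4000, 2048, 16384)
    print(util, is_scale_down_candidate(util, unneeded_minutes=12))
```

In an interview, walking through one concrete calculation like this (4 replicas at 90% CPU against a 60% target scale to 6) is usually more convincing than reciting the component names.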
Pro Tip
Demonstrate both theoretical understanding and practical experience. Say what it is, then give an example of how you actually used it in a real Kubernetes (K8s) environment.