What is Kubernetes Horizontal Pod Autoscaler (HPA)?

Q: What is Kubernetes Horizontal Pod Autoscaler (HPA)?

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed metrics (CPU, memory, custom metrics). HPA spec: apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: my-app-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: my-app minReplicas: 2 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 70 - type: Resource resour

Answer

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed metrics (CPU, memory, custom metrics). HPA spec: apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: my-app-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: my-app minReplicas: 2 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 70 - type: Resource resource: name: memory target: type: AverageValue averageValue: 200Mi. How it works: every 15 seconds, HPA queries the Metrics Server (or custom metrics API) for current resource usage → computes desired replica count = ceil(currentReplicas × currentMetric / targetMetric) → scales up or down within min/max bounds. Scale-up cooldown: 3 minutes (default) before scaling up again — prevents flapping; Scale-down cooldown: 5 minutes (default) before scaling down. Custom and external metrics: scale on any metric via Prometheus Adapter (HTTP requests/second, queue depth, latency). Example: type: External external: metric: name: sqs_queue_depth selector: matchLabels: queue: orders target: type: AverageValue averageValue: "5". Metrics Server: must be installed in the cluster (not default): kubectl apply -f metrics-server.yaml. KEDA (Kubernetes Event-Driven Autoscaling): extends HPA with 60+ event sources (Kafka, SQS, Redis, Azure Service Bus, Prometheus). Scale to zero when no events. VPA: scale pod resource requests vertically (more CPU/memory) — can't run simultaneously with HPA on same metric.

Answer

More Kubernetes (K8s) Questions