What is high availability (HA)?

Answer

High availability (HA) is the ability of a system to remain operational and accessible for a very high percentage of the time, minimizing downtime. It is usually measured as a percentage of uptime: 99.9% = "three nines" (~8.76h downtime/year); 99.99% = "four nines" (~52min/year); 99.999% = "five nines" (~5min/year).

HA principles:
(1) Eliminate single points of failure (SPOFs): run redundant components at every layer (web, app, database, network).
(2) Automated failover: the system detects failures and reroutes traffic without human intervention, using health checks, heartbeats, and mechanisms such as Kubernetes pod rescheduling.
(3) Graceful degradation: when a component fails, serve a reduced-functionality response rather than failing completely. The "circuit breaker" pattern supports this by cutting off calls to a failing dependency so that a fallback can be returned instead.
(4) Geographic distribution: multi-AZ (availability zone) deployments within a region; multi-region deployments for disaster recovery.
(5) Zero-downtime deployments: rolling updates, blue-green deployments, and canary releases ship new code without taking the system down.
(6) Health checks and self-healing: Kubernetes restarts crashed containers and replaces failed pods; load balancers stop routing traffic to unhealthy servers.

Active-Passive HA: one active server plus one standby. It is simple, but the standby's capacity sits idle until a failover happens.
Active-Active HA: all servers are active behind a load balancer, so no capacity sits idle. The remaining servers still need enough headroom to absorb the load of a failed one.

HA vs DR (Disaster Recovery): HA prevents downtime from component failures; DR recovers from catastrophic failures (such as datacenter loss) within an acceptable RTO (Recovery Time Objective: how long recovery may take) and RPO (Recovery Point Objective: how much recent data may be lost).
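The "nines" figures above follow from simple arithmetic, and the benefit of redundancy can be estimated the same way. The Python sketch below is illustrative only: the function names are hypothetical, a year is taken as 365 days, and the redundancy estimate assumes replicas fail independently, which real systems only approximate because replicas often share power, network, or software faults.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the downtime budget in hours per year for an uptime percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


def redundant_availability(single_percent: float, replicas: int) -> float:
    """Estimate the availability (in percent) of several redundant replicas.

    Assumption: failures are independent, and the system counts as "up"
    while at least one replica is up.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Probability that every replica is down at the same time.
    all_down = (1.0 - single_percent / 100.0) ** replicas
    return (1.0 - all_down) * 100.0


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime: about {hours:.2f} h ({hours * 60:.1f} min) of downtime per year")
    # Two independent 99% replicas, as an illustration of removing a SPOF.
    print(f"2 x 99% replicas: about {redundant_availability(99.0, 2):.2f}% available")
```

Under these assumptions, two replicas that are each 99% available come out at roughly 99.99% combined, which is why removing single points of failure is listed first among the principles.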

