What is etcd in Kubernetes, and how do you back it up?

Answer

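etcd is the distributed key-value store that holds all Kubernetes cluster state: every pod spec, Deployment, Secret, ConfigMap, RBAC rule, and service account. Losing etcd data means losing the cluster state.

etcd uses the Raft consensus algorithm, which requires a quorum (a majority of members) to accept writes. A cluster of 3 members tolerates 1 failure; 5 members tolerate 2. Always deploy an odd number: an even-sized cluster needs a larger quorum but tolerates no more failures than the next smaller odd size (4 members still tolerate only 1 failure).

Data layout: Kubernetes stores everything under the /registry/ prefix.

    Pod specs: /registry/pods/<namespace>/<podname>
    Secrets:   /registry/secrets/<namespace>/<secretname> (should be encrypted at rest)

Backup with etcdctl (the certificate paths shown are the kubeadm defaults):

    ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key
    etcdctl snapshot status snapshot.db

Restore: write the snapshot into a fresh data directory, stop etcd, swap the directories, and start etcd again.

    etcdctl snapshot restore snapshot.db \
      --data-dir=/var/lib/etcd-restored
    systemctl stop etcd
    mv /var/lib/etcd /var/lib/etcd-backup
    mv /var/lib/etcd-restored /var/lib/etcd
    systemctl start etcd

The systemctl commands assume etcd runs as a systemd service. On kubeadm clusters, where etcd runs as a static pod, stop it by moving its manifest out of /etc/kubernetes/manifests instead. In etcd 3.5 and later, the snapshot status and snapshot restore subcommands of etcdctl are deprecated in favor of etcdutl.

Backup strategy: take a snapshot before every cluster upgrade, and run regular automated backups (hourly for production) to off-cluster storage such as S3 or GCS. On managed Kubernetes (EKS, GKE, AKS) the provider operates etcd and handles its backups automatically.

etcd performance: etcd is sensitive to CPU and disk I/O. Use SSDs, and give etcd dedicated nodes in large clusters. Monitor etcd_disk_wal_fsync_duration_seconds (the 99th percentile should stay below 10 ms).

Velero: an application-level backup tool that backs up Kubernetes resources and persistent volumes, and can restore them to the same cluster or a different one. It supports S3, GCS, and Azure Blob Storage.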
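The regular automated backups described in the backup strategy could be scripted roughly as follows. This is a minimal sketch, not a hardened tool: the function names (snapshot_path, take_snapshot, prune_snapshots), the BACKUP_DIR location, and the idea of pruning by a fixed count are all assumptions made for illustration. The etcdctl flags are the same ones used in the manual backup command.

```shell
#!/bin/sh
# Sketch only: BACKUP_DIR and the retention policy are assumptions,
# not values taken from the answer above.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/etcd}"    # hypothetical location
PKI_DIR="${PKI_DIR:-/etc/kubernetes/pki/etcd}"   # cert directory from the manual command

# Print a timestamped snapshot path inside directory $1.
# UTC timestamps in this format sort chronologically by name.
snapshot_path() {
    printf '%s/etcd-%s.db\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Save one snapshot with the same etcdctl flags as the manual backup,
# then print the path of the file that was written.
take_snapshot() {
    target="$(snapshot_path "$BACKUP_DIR")"
    mkdir -p "$BACKUP_DIR" &&
    ETCDCTL_API=3 etcdctl snapshot save "$target" \
        --endpoints=https://127.0.0.1:2379 \
        --cacert="$PKI_DIR/ca.crt" \
        --cert="$PKI_DIR/server.crt" \
        --key="$PKI_DIR/server.key" &&
    printf '%s\n' "$target"
}

# Keep only the newest $2 snapshots in directory $1 and delete the rest.
# Relies on the timestamped file names sorting in chronological order.
prune_snapshots() {
    dir="$1"
    keep="$2"
    ls -1 "$dir"/etcd-*.db 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}
```

Sourcing this file and calling take_snapshot followed by prune_snapshots "$BACKUP_DIR" 24 from an hourly cron job or systemd timer would keep roughly one day of hourly snapshots on local disk. Uploading each snapshot to S3 or GCS is deliberately left out and would be a separate step.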