What is etcd in Kubernetes and how to back it up?
Answer
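etcd is the distributed key-value store that holds all Kubernetes cluster state: every Pod spec, Deployment, Secret, ConfigMap, RBAC rule and ServiceAccount. Losing the etcd data means losing the cluster state.

etcd uses the Raft consensus algorithm, which needs a quorum (a majority of members, floor(n/2) + 1) to accept writes. A 3-member cluster tolerates 1 failure; a 5-member cluster tolerates 2. Always deploy an odd number of members: a 4-member cluster needs 3 members for quorum, so it still tolerates only 1 failure while adding one more machine that can fail.

Data layout in Kubernetes: all data lives under the /registry/ prefix. Pod specs are stored at /registry/pods/<namespace>/<pod-name> and Secrets at /registry/secrets/<namespace>/<secret-name>. Secrets are stored unencrypted by default, so enable encryption at rest on the API server.

Backup with etcdctl (the certificate paths are the kubeadm defaults):

    ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key
    etcdctl --write-out=table snapshot status snapshot.db

snapshot status prints the snapshot's hash, revision, total key count and size, a quick check that the file is usable. From etcd v3.5 on, the offline snapshot status and snapshot restore commands are provided by etcdutl, and the etcdctl versions are deprecated.

Restore:

    etcdctl snapshot restore snapshot.db \
      --data-dir=/var/lib/etcd-restored
    systemctl stop etcd
    mv /var/lib/etcd /var/lib/etcd-backup
    mv /var/lib/etcd-restored /var/lib/etcd
    systemctl start etcd

These commands assume a single etcd member managed by systemd. Stop every kube-apiserver instance before restoring and start them again afterwards. In kubeadm clusters etcd runs as a static Pod, so instead of systemctl you stop it by moving etcd.yaml out of /etc/kubernetes/manifests/ and moving it back once the data directory has been replaced. For a multi-member cluster, restore the same snapshot on every member, passing that member's --name, --initial-cluster and --initial-advertise-peer-urls to snapshot restore.

Backup strategy: take a snapshot before every cluster upgrade, and run regular automated backups (hourly for production) copied off the node to object storage such as S3 or GCS. A snapshot contains every Secret, so restrict access to backup files and encrypt them. Managed Kubernetes (EKS, GKE, AKS) runs etcd inside the provider-managed control plane and handles its backups; you cannot access etcd there, so use an API-level tool such as Velero for backups you control.

etcd performance: etcd is sensitive to CPU and especially to disk I/O, because every write is fsynced to its write-ahead log (WAL) before it is acknowledged. Use SSDs, and run etcd on dedicated nodes in large clusters. Monitor etcd_disk_wal_fsync_duration_seconds: its 99th percentile should stay below 10 ms.

Velero: an application-level backup tool that backs up Kubernetes resources (read through the Kubernetes API, not from etcd) and persistent volumes. It can restore to the same cluster or a different one, and stores backups in S3, GCS or Azure Blob Storage.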
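The quorum rule for etcd members can be checked with a little shell arithmetic. This is only an illustration: the function names quorum and fault_tolerance are hypothetical and not part of etcdctl. It prints the quorum size and the number of tolerated failures for clusters of 1 to 7 members.

```shell
#!/usr/bin/env bash
# Raft quorum arithmetic for an etcd cluster of n voting members.
# (Hypothetical helper functions, for illustration only.)

# quorum = floor(n/2) + 1 (bash integer division already floors)
quorum() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# Members that can fail while a quorum is still reachable.
fault_tolerance() {
  local n=$1
  echo $(( n - $(quorum "$n") ))
}

for n in 1 2 3 4 5 6 7; do
  printf 'members=%d quorum=%d tolerates=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

For example, 4 members give quorum=3 and tolerate 1 failure, the same as 3 members, which is why even-sized clusters add cost without adding resilience.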
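The automated backup strategy can be sketched as a small script run hourly from cron or a systemd timer. This is a minimal sketch, not an official tool: the script name, the ETCD_*, KEEP and S3_URI variables and the helper functions are hypothetical. It assumes kubeadm certificate paths, etcdctl (and optionally etcdutl and the AWS CLI) on PATH, and permission to read the etcd certificates.

```shell
#!/usr/bin/env bash
# etcd-backup.sh: hypothetical helper for scheduled etcd snapshots.
# Assumes kubeadm-style certificate paths; override ETCD_* for other layouts.
set -euo pipefail

ETCD_ENDPOINTS=${ETCD_ENDPOINTS:-https://127.0.0.1:2379}
ETCD_CACERT=${ETCD_CACERT:-/etc/kubernetes/pki/etcd/ca.crt}
ETCD_CERT=${ETCD_CERT:-/etc/kubernetes/pki/etcd/server.crt}
ETCD_KEY=${ETCD_KEY:-/etc/kubernetes/pki/etcd/server.key}
KEEP=${KEEP:-24}     # local snapshots to retain (one day of hourly backups)
S3_URI=${S3_URI:-}   # optional, hypothetical target such as s3://my-bucket/etcd/

# UTC timestamp in the name, so lexical order equals chronological order.
snapshot_name() {
  echo "etcd-snapshot-$(date -u +%Y%m%dT%H%M%SZ).db"
}

# Delete all but the newest $keep snapshots in $dir.
prune_old_snapshots() {
  local dir=$1 keep=$2 f
  printf '%s\n' "$dir"/etcd-snapshot-*.db | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
      # An unmatched glob is passed through literally; skip it.
      if [[ -e "$f" ]]; then rm -f -- "$f"; fi
    done
}

main() {
  local dir=$1 file
  umask 077            # snapshots contain every Secret: owner-only access
  mkdir -p "$dir"
  file="$dir/$(snapshot_name)"

  ETCDCTL_API=3 etcdctl snapshot save "$file" \
    --endpoints="$ETCD_ENDPOINTS" \
    --cacert="$ETCD_CACERT" \
    --cert="$ETCD_CERT" \
    --key="$ETCD_KEY"

  # Verify the snapshot (hash, revision, total keys, size).
  # Prefer etcdutl (etcd v3.5+); fall back to the deprecated etcdctl form.
  if command -v etcdutl >/dev/null 2>&1; then
    etcdutl --write-out=table snapshot status "$file"
  else
    ETCDCTL_API=3 etcdctl --write-out=table snapshot status "$file"
  fi

  # Optional off-node copy; a trailing slash in S3_URI keeps the file name.
  if [[ -n "$S3_URI" ]]; then
    aws s3 cp "$file" "$S3_URI"
  fi

  prune_old_snapshots "$dir" "$KEEP"
}

# Run only when a backup directory is given, so the helper functions
# can be sourced or tested without touching etcd.
if [[ $# -ge 1 ]]; then
  main "$1"
fi
```

For example, an hourly cron entry running /usr/local/sbin/etcd-backup.sh /var/backups/etcd (both paths hypothetical) keeps the newest 24 snapshots locally; set S3_URI in the job's environment to also copy each snapshot off the node. Putting a UTC timestamp in the file name keeps retention down to a simple sort.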