What is Kubernetes resource monitoring and observability?
Answer
Kubernetes observability rests on three signals: metrics, logs, and traces.

Metrics:
Metrics Server: lightweight, in-memory metrics pipeline. Powers kubectl top and CPU/memory-based HPA. Holds only current values (no history).
Prometheus: the de facto standard for Kubernetes metrics collection. Pulls (scrapes) metrics from HTTP endpoints at a configurable interval (15s is a common setting). Stores time series locally with configurable retention (15 days by default); longer retention usually relies on remote storage. Queried with PromQL.
kube-state-metrics: exposes Kubernetes object state (replica counts, deployment status) as metrics for Prometheus.
Node Exporter: host-level metrics (CPU, memory, disk, network).
Prometheus Adapter: exposes Prometheus metrics through the Kubernetes custom metrics API so the HPA can scale on them.
kube-prometheus-stack (Helm chart): installs Prometheus, Alertmanager, Grafana, node-exporter, and kube-state-metrics. A standard monitoring stack.
Grafana dashboards: pre-built dashboards for Kubernetes (Grafana.com IDs: 3119 for cluster overview, 6417 for pod resources).
Alertmanager: receives alerts from Prometheus and routes them to PagerDuty, Slack, or email based on severity, time, and labels.

Logging:
Containers write to stdout/stderr, and the container runtime and kubelet persist that output to log files on the node filesystem. Collect the files with Fluentd or Fluent Bit running as a DaemonSet, which forwards them to Elasticsearch/OpenSearch, CloudWatch, Datadog, or Loki.
Loki + Grafana: Loki indexes only log labels (not the full text), which keeps storage cheap. Query with LogQL in Grafana.

Tracing:
Jaeger or Zipkin deployed in the cluster. Applications instrumented with the OpenTelemetry SDK send spans to the tracing backend (for example, Jaeger).

Kubernetes events:
    kubectl get events --sort-by=.lastTimestamp -n namespace
Critical for debugging. Events expire after 1 hour by default, so forward them to persistent storage.

Resource monitoring commands:
    kubectl top pods --containers
    kubectl top nodes
    kubectl describe node node-1   # Allocated resources, conditions, events
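
The kubectl top commands report usage as Kubernetes quantities such as 250m (millicores) or 128Mi (mebibytes), which have to be converted before they can be compared or summed. The following is a minimal Python sketch of that conversion, not an official client. The sample text, the pod and container names, the assumed column layout (POD, NAME, CPU, MEMORY), and the helper names parse_cpu, parse_memory, and parse_top_output are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Binary (Ki, Mi, ...) and decimal (k, M, ...) suffixes for memory quantities.
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = float(quantity[: -len(suffix)])
            return int(number * _MEMORY_UNITS[suffix])
    return int(quantity)  # plain bytes, no suffix


@dataclass
class ContainerUsage:
    pod: str
    container: str
    cpu_cores: float
    memory_bytes: int


def parse_top_output(text: str) -> list[ContainerUsage]:
    """Parse text shaped like `kubectl top pods --containers` output.

    Assumes four whitespace-separated columns after a header line.
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        pod, container, cpu, memory = line.split()
        rows.append(
            ContainerUsage(pod, container, parse_cpu(cpu), parse_memory(memory))
        )
    return rows


# Made-up sample; real output would come from running kubectl against a cluster.
SAMPLE = """\
POD        NAME      CPU(cores)   MEMORY(bytes)
web-7d9f   nginx     3m           12Mi
web-7d9f   sidecar   250m         128Mi
api-5c6b   app       1500m        2Gi
"""

usage = parse_top_output(SAMPLE)
heaviest = max(usage, key=lambda u: u.memory_bytes)
print(heaviest.pod, heaviest.container, heaviest.memory_bytes)
```

In practice the text would be captured from the kubectl command itself, and the same helpers could be used to total CPU per pod or flag containers above a memory threshold.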