What is service discovery?
Why Interviewers Ask This
Interviewers use this question to quickly assess whether a candidate has the foundational knowledge that system design work requires. It reveals whether you understand the building blocks that more complex concepts rely on.
Answer
Service discovery is the mechanism by which services in a microservices architecture find each other. In dynamic cloud environments, service instances start, stop, and change IP addresses constantly, so hardcoded IPs don't work. There are two main patterns.
Client-side discovery: the client queries a service registry (Consul, etcd, ZooKeeper, Eureka) to get the available instances, then load-balances across them and calls one directly. The client must know the registry and implement load balancing. Example: Netflix Eureka with Ribbon.
Server-side discovery: the client calls a load balancer or API gateway, which queries the registry and routes the request to an appropriate instance. The client doesn't need to know about discovery, which keeps the client simpler. Examples: AWS ALB, Kubernetes Services.
Service registry: a database that maps service names to available instances (IP:port). Services register on startup, deregister on shutdown, and send heartbeats to indicate health. The registry removes instances that stop sending heartbeats.
DNS-based discovery: Kubernetes uses DNS. Each service gets a stable DNS name (my-service.namespace.svc.cluster.local) that resolves to the service's cluster IP, and traffic to that IP is load-balanced across pods. This is simple, but the DNS answer itself carries no per-instance health detail.
Health checks: the registry integrates with health checks so that only healthy instances are returned. Common types are an HTTP health check endpoint, a TCP connection check, and command execution.
Tools: Consul (popular, multi-datacenter), AWS Cloud Map, Kubernetes Services + CoreDNS.
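The registry lifecycle and the client-side pattern described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Consul, Eureka, or any real registry is implemented: the names ServiceRegistry and RoundRobinClient, the 10-second TTL, and the example addresses are all assumptions made for the sketch.

```python
import itertools
import time


class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> {address: last heartbeat}."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._instances = {}

    def register(self, service, address):
        # Called by an instance on startup; records the current time as its last heartbeat.
        self._instances.setdefault(service, {})[address] = self._clock()

    def heartbeat(self, service, address):
        # A heartbeat refreshes the timestamp so the instance is not expired.
        self.register(service, address)

    def deregister(self, service, address):
        # Called by an instance on clean shutdown.
        self._instances.get(service, {}).pop(address, None)

    def lookup(self, service):
        """Return the addresses whose last heartbeat is within the TTL."""
        now = self._clock()
        entries = self._instances.get(service, {})
        # Remove instances that stopped sending heartbeats.
        expired = [a for a, seen in entries.items() if now - seen > self._ttl]
        for address in expired:
            del entries[address]
        return sorted(entries)


class RoundRobinClient:
    """Client-side discovery: query the registry, then load-balance locally."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def choose(self, service):
        instances = self._registry.lookup(service)
        if not instances:
            raise LookupError(f"no healthy instances for {service!r}")
        # Simple round-robin over whatever the registry currently returns.
        return instances[next(self._counter) % len(instances)]
```

In this sketch a caller would create one ServiceRegistry, have each instance call register and then heartbeat periodically, and have clients call choose("orders") before each request. With server-side discovery, the lookup and selection logic would live in the load balancer or gateway instead of in the client.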
Common Mistake
Many candidates answer correctly but can't explain the 'why'. Always be prepared to justify your answer with a concrete example or use case from your own system design experience.