What is fault tolerance in microservices?

Answer

Fault tolerance in microservices means designing services to continue operating correctly even when some of their dependencies fail. Because microservices communicate over a network, partial failures are inevitable. Key fault tolerance techniques include: Timeouts (don't wait indefinitely for a response), Retries with exponential backoff (retry transient failures but avoid overwhelming a recovering service), Circuit Breakers (stop calling a failing service entirely for a period), and Fallbacks (return cached data or a degraded response when the primary path fails). Without fault tolerance, a single slow downstream service can exhaust all threads in the calling service and cause a cascading failure across the entire system.