What are the observability challenges unique to serverless architectures?

Answer

Serverless introduces several observability challenges of its own:
(1) Ephemeral execution: an invocation may last only milliseconds, and the execution environment behind it can be frozen or reclaimed at any time. Traditional always-on agents (APM agents) cannot attach to processes that start and stop constantly, so instrumentation must be embedded in the function code or delivered as a Lambda extension.
(2) Distributed by default: a single user action may invoke 5-10 functions. Correlating logs across them requires a consistent correlation ID (a request ID propagated into every log line) and distributed tracing (X-Ray, OpenTelemetry).
(3) Cold start noise: cold start initialization time inflates tail latency metrics (p99/p99.9). Tag cold start invocations and report them separately so that warm-path performance can be analyzed accurately.
(4) Log aggregation: by default each Lambda function writes to its own CloudWatch log group. Querying across functions requires CloudWatch Logs Insights queries that span multiple log groups, or centralizing logs in Elasticsearch/Splunk.
(5) Cost of instrumentation: Lambda cost scales with the number of invocations and with duration (billed in proportion to configured memory). Observability tooling that adds invocations or lengthens duration therefore adds cost directly.
(6) External service observability: DynamoDB, SQS, and external APIs are black boxes from the function's point of view. Wrapping those calls in X-Ray subsegments records their latency and errors.
(7) Asynchronous event flows: there is no synchronous call chain through SQS/SNS/EventBridge, so tracing across them generally requires explicitly propagating trace context in message attributes.
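As a rough illustration of points (2), (3), and (7), the sketch below shows one way a Python Lambda handler might reuse an upstream correlation ID, flag cold starts in a structured log line, and build message attributes for the next hop. It is a minimal sketch, not a definitive implementation: the attribute name correlation_id, the helper names, and the assumption that upstream producers set that attribute are all hypothetical. It uses only the standard library; a real setup would more likely rely on X-Ray or OpenTelemetry for trace context.

```python
import json
import time
import uuid

# Module-level state survives across warm invocations of the same execution
# environment, so only the first invocation sees True here.
_COLD_START = True


def log_event(correlation_id, message, **fields):
    """Print one JSON log line and return it (returned only to ease testing)."""
    record = {"correlation_id": correlation_id, "message": message, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def extract_correlation_id(event):
    """Reuse the upstream ID if the event carries one, otherwise mint a new one.

    Assumes (hypothetically) that producers set a 'correlation_id' message
    attribute on SQS messages, or a top-level 'correlation_id' key otherwise.
    """
    for record in event.get("Records", []):
        attr = (record.get("messageAttributes") or {}).get("correlation_id")
        if attr and attr.get("stringValue"):
            return attr["stringValue"]
    return event.get("correlation_id") or str(uuid.uuid4())


def outgoing_message_attributes(correlation_id):
    """Attributes to attach when publishing to the next queue or topic."""
    return {
        "correlation_id": {"DataType": "String", "StringValue": correlation_id}
    }


def handler(event, context=None):
    global _COLD_START
    cold_start, _COLD_START = _COLD_START, False

    correlation_id = extract_correlation_id(event)
    started = time.perf_counter()
    # ... business logic would run here ...
    duration_ms = (time.perf_counter() - started) * 1000

    # The cold_start field lets a log query exclude or isolate cold starts.
    log_event(
        correlation_id,
        "invocation complete",
        cold_start=cold_start,
        duration_ms=round(duration_ms, 3),
    )
    return {
        "correlation_id": correlation_id,
        "cold_start": cold_start,
        "message_attributes": outgoing_message_attributes(correlation_id),
    }
```

The dictionary returned by outgoing_message_attributes has the shape that an SQS or SNS publish call expects for its message attributes, so a downstream function using the same extract_correlation_id helper would log under the same ID. Filtering on the cold_start field in a log query is one way to keep cold starts out of warm-path latency percentiles.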