📨 Apache Kafka Intermediate

How do you design Kafka topics for high throughput?

Answer

Kafka topic design for high throughput involves several decisions. Partition count: more partitions = higher throughput (more parallel producers/consumers). Rule of thumb: start with max(target throughput / single-partition throughput, desired consumer parallelism). A single partition typically handles 10–100 MB/s. Replication factor: 3 for production (balance durability and performance). Producer batching: increase batch.size (512KB+) and linger.ms (5–20ms) for larger batches, reducing per-message overhead. Compression: use lz4 or zstd for a good speed/compression ratio. Consumer parallelism: match consumer count to partition count. Avoid hotspots: choose partition keys that distribute load evenly — high-cardinality keys like UUIDs are safe; low-cardinality keys (country code, boolean) create hot partitions. Tiered storage: offload old data to S3/GCS to reduce broker disk pressure.