What is Cassandra's partition key and why is it important?
Answer
The partition key is the most critical design decision in Cassandra. It determines how data is distributed across the cluster through consistent hashing: Cassandra hashes the partition key to a token and stores the partition on the node(s) whose token range contains that token.

Why it matters:
- Data distribution: a good partition key spreads data and requests evenly across all nodes. This avoids hot partitions, where one node (or one replica set) handles a disproportionate share of the traffic.
- Queries: an efficient query restricts every column of the partition key, so Cassandra can route it directly to the replicas that own that partition. Cassandra cannot efficiently query across partitions. Without the partition key, Cassandra must scan partitions across the whole cluster, and CQL rejects such queries unless you add ALLOW FILTERING (avoid this in production).
- Partition size: a common rule of thumb is to keep each partition under roughly 100 MB and 100,000 rows. Large partitions cause memory pressure, slow compactions, and expensive repairs.

Examples of good keys: user_id for user data, or the composite key (user_id, year_month) for per-user time series, which caps each partition at one month of one user's data. Bad keys: low-cardinality columns such as status or country, which create hot partitions.

Good partition key design is the foundation of a performant Cassandra application.
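To make the distribution and sizing points concrete, here is a small Python simulation. It is a sketch under loud assumptions, not Cassandra's implementation or API: a toy ring of evenly spaced node tokens, one replica per partition, no vnodes, and MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3Partitioner). The names `token`, `ToyRing`, `rows_per_node`, `monthly_bucket_key`, and `bucket_fits` are hypothetical helpers introduced only for this illustration.

```python
import bisect
import hashlib
from collections import Counter
from datetime import datetime

# Assumptions (illustrative only): evenly spaced node tokens, one replica per
# partition, no vnodes, and MD5 as the hash. Real Cassandra defaults to
# Murmur3Partitioner and uses vnodes plus a replication factor.
RING_SIZE = 2**64
MAX_ROWS_PER_PARTITION = 100_000  # rule-of-thumb guideline, not a hard limit


def token(partition_key: tuple) -> int:
    """Hash a (possibly composite) partition key to a token in [0, 2**64)."""
    raw = "|".join(str(part) for part in partition_key).encode()
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")


class ToyRing:
    """Hypothetical ring: node i owns the token range (tokens[i-1], tokens[i]]."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        step = RING_SIZE // len(self.nodes)
        self.tokens = [step * (i + 1) - 1 for i in range(len(self.nodes))]

    def owner(self, partition_key: tuple) -> str:
        # Walk clockwise to the first node token >= the key's token; tokens
        # beyond the last node token wrap around to the first node.
        i = bisect.bisect_left(self.tokens, token(partition_key))
        return self.nodes[i % len(self.nodes)]


def rows_per_node(ring: ToyRing, keys) -> Counter:
    """Count how many rows (one partition key per row, repeats allowed) each node receives."""
    return Counter(ring.owner(k) for k in keys)


def monthly_bucket_key(user_id: str, ts: datetime) -> tuple:
    """Hypothetical (user_id, year_month) composite partition key."""
    return (user_id, ts.strftime("%Y-%m"))


def bucket_fits(events_per_day: int, days_per_bucket: int = 31) -> bool:
    """Rough check that one monthly bucket stays under the row-count guideline."""
    return events_per_day * days_per_bucket <= MAX_ROWS_PER_PARTITION


if __name__ == "__main__":
    ring = ToyRing(["node1", "node2", "node3", "node4"])

    # High cardinality: 10,000 distinct user_id values spread over every node.
    good = rows_per_node(ring, [(f"user-{i}",) for i in range(10_000)])

    # Low cardinality: 10,000 rows but only 3 distinct partitions, so at most
    # 3 nodes ever receive writes and each partition keeps growing.
    statuses = ["active", "suspended", "deleted"]
    bad = rows_per_node(ring, [(statuses[i % 3],) for i in range(10_000)])

    print("user_id key:", dict(good))
    print("status key: ", dict(bad))
    print(monthly_bucket_key("user-42", datetime(2024, 3, 15)))  # ('user-42', '2024-03')
    print(bucket_fits(3_000), bucket_fits(5_000))  # True False
```

With user_id, each of the four nodes receives roughly a quarter of the rows; with status, all rows pile onto at most three partitions regardless of cluster size. The `bucket_fits` check shows why the bucket granularity matters: at 3,000 events per day a monthly bucket holds about 93,000 rows, while at 5,000 per day it exceeds the guideline and a finer bucket (for example per day) would be the safer choice.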