What is data denormalization in Cassandra?

Answer

Denormalization is not an anti-pattern in Cassandra; it is the recommended approach. Cassandra does not support JOINs, and efficient queries must specify the partition key, so you store data in the shape each query needs. The rule of thumb: one query = one table.

Example: a blog application needs two queries: (1) get posts by user, (2) get posts by tag. The relational approach is one posts table plus JOINs. The Cassandra approach is two tables: posts_by_user (user_id, post_id, title, content), partitioned by user_id, and posts_by_tag (tag, post_id, title), partitioned by tag. Every new post triggers writes to both tables, with one posts_by_tag row per tag. The data is duplicated, and that is an accepted trade-off.

Query-first design process: (1) identify all queries the app needs, (2) design a table for each query, (3) ensure every query restricts on its table's partition key. This is the opposite of relational design, which starts with entities and then optimizes queries.

Denormalization costs more storage and more complex writes, because the application must keep the copies in sync, but it enables fast, scalable reads without expensive JOINs.
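
The blog example can be sketched in a few lines of Python. This is an in-memory illustration only, not Cassandra driver code: each dictionary stands in for one table, and the dictionary key plays the role of the partition key. The Post class, the tags field, and the function names create_post, get_posts_by_user, and get_posts_by_tag are hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    user_id: str
    title: str
    content: str
    tags: tuple


# Each dict stands in for one Cassandra table; the dict key plays the
# role of the partition key. These are in-memory stand-ins, not a driver.
posts_by_user = defaultdict(list)  # partition key: user_id
posts_by_tag = defaultdict(list)   # partition key: tag


def create_post(post):
    """Write the post to every query table (the denormalized write path)."""
    posts_by_user[post.user_id].append(
        {"post_id": post.post_id, "title": post.title, "content": post.content}
    )
    for tag in post.tags:
        # One row per tag: the title is duplicated across tag partitions.
        posts_by_tag[tag].append({"post_id": post.post_id, "title": post.title})


def get_posts_by_user(user_id):
    """Query 1: a single lookup by partition key, no join needed."""
    return list(posts_by_user.get(user_id, []))


def get_posts_by_tag(tag):
    """Query 2: served entirely from the second table."""
    return list(posts_by_tag.get(tag, []))


create_post(Post(1, "alice", "Intro to CQL", "body text", ("cassandra", "databases")))
create_post(Post(2, "bob", "Partition keys", "body text", ("cassandra",)))
```

Each read touches exactly one table and one key, which is the point of the one query = one table rule. The cost shows up in create_post, which has to write every copy. In a real cluster those writes would be separate CQL INSERT statements, one per table.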