What is data partitioning?
Why Interviewers Ask This
Interviewers use this question to quickly assess whether a candidate has the foundational knowledge that system design work requires. It reveals whether you understand the building blocks that more complex concepts rely on.
Answer
Data partitioning (often called sharding when rows are spread across separate nodes) divides a large dataset into smaller, manageable pieces distributed across multiple storage nodes. This enables horizontal scaling of both storage capacity and query throughput.
Types:
(1) Horizontal partitioning (sharding): different rows of the same table go to different partitions, for example user IDs 1-1000 on shard 1 and 1001-2000 on shard 2. Every shard has the same schema. This is the most common type.
(2) Vertical partitioning: a table is split by columns, for example user profile (ID, name, email) in one partition and user settings (ID, preferences, notifications) in another. Columns that are accessed together stay together.
(3) Functional partitioning: data is segregated by functional area, for example orders data in one cluster and the product catalog in another. This is similar to data isolation between microservices.
Partition strategies: range (contiguous value ranges map to partitions), hash (a hash of the key spreads rows evenly), list (specific values map to specific partitions), and composite (a combination, such as range first and then hash).
Considerations:
Hotspots: a partition can receive a disproportionate share of the load. Range partitioning on a sequential key is especially prone to this. Hash partitioning spreads keys evenly, although a single very popular key can still overload one partition.
Cross-partition queries: joining data from multiple partitions is expensive, so design the data model and partition key to keep related data in the same partition.
Rebalancing: adding nodes requires moving data. Consistent hashing limits the movement to roughly the share of keys the new node takes over.
Referential integrity: foreign key constraints that span partitions on separate nodes generally cannot be enforced by the database, so the application has to enforce them.
Because of this added complexity, partitioning is often the last resort after exhausting vertical scaling, caching, and read replicas.
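
To make the partition strategies and the rebalancing point concrete, here is a minimal Python sketch. It is illustrative only: the function and class names (stable_hash, range_shard, hash_shard, ConsistentHashRing), the node names, and the choice of SHA-256 with 100 virtual nodes are assumptions made for this example, not the API of any particular database. It routes a user ID by range and by hash, then measures how many keys change owner when a fifth node is added.

```python
import bisect
import hashlib
from typing import Callable, Sequence


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (built-in hash() of str varies per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def range_shard(user_id: int, upper_bounds: Sequence[int]) -> int:
    """Range partitioning: upper_bounds[i] is the largest ID kept on shard i."""
    idx = bisect.bisect_left(upper_bounds, user_id)
    if idx == len(upper_bounds):
        raise ValueError(f"user_id {user_id} is above every configured range")
    return idx


def hash_shard(key: str, num_shards: int) -> int:
    """Hash partitioning: hash of the key modulo the number of shards."""
    return stable_hash(key) % num_shards


class ConsistentHashRing:
    """Consistent hashing: each node owns many points (virtual nodes) on a ring."""

    def __init__(self, nodes: Sequence[str], vnodes: int = 100) -> None:
        ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._owners[idx]


def moved_fraction(keys: Sequence[str], before: Callable, after: Callable) -> float:
    """Fraction of keys whose owner differs between two routing functions."""
    moved = sum(1 for k in keys if before(k) != after(k))
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]

# Range routing with the boundaries from the answer (0-based shard indexes).
print(range_shard(1000, [1000, 2000]), range_shard(1001, [1000, 2000]))  # 0 1

# Growing from 4 to 5 shards with plain modulo hashing.
modulo_moved = moved_fraction(
    keys, lambda k: hash_shard(k, 4), lambda k: hash_shard(k, 5)
)

# Growing from 4 to 5 nodes on a consistent hash ring.
old_ring = ConsistentHashRing(["node-1", "node-2", "node-3", "node-4"])
new_ring = ConsistentHashRing(["node-1", "node-2", "node-3", "node-4", "node-5"])
ring_moved = moved_fraction(keys, old_ring.node_for, new_ring.node_for)

print(f"modulo hashing moved {modulo_moved:.0%} of keys")
print(f"consistent hashing moved {ring_moved:.0%} of keys")
```

With uniform hashes, going from 4 to 5 shards under modulo hashing is expected to relocate about four in five keys, because a key stays put only when its hash gives the same remainder for both shard counts. On the ring, a key moves only if the new node takes it over, which should be close to one in five keys here. Exact figures depend on the hash function and the number of virtual nodes.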
Pro Tip
Demonstrate both theoretical understanding and practical experience. Say what it is, then give an example of how you actually used it in a real system you designed or worked on.