What is a B-tree and where is it used?
Why Interviewers Ask This
Interviewers ask this to evaluate whether you have the depth of knowledge needed to mentor others and lead technical decisions. The expected answer goes beyond definitions into practical implications and real-world consequences.
Answer
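A B-tree is a self-balancing search tree that generalizes the binary search tree: each node holds multiple sorted keys and has multiple children. Two equivalent conventions describe node capacity. With minimum degree t, every non-root node has between t and 2t children. With order m (the maximum number of children), every non-root node has between ⌈m/2⌉ and m children.

Properties:
(1) All leaves are at the same depth, so the tree is always balanced.
(2) Each node except the root has at least ⌈m/2⌉-1 keys (at least about half full).
(3) Each node has at most m-1 keys and m children, where m is the order.
(4) Keys within a node are sorted.
(5) An internal node with k keys has k+1 children.

Operations: search, insert, and delete each take O(log n) time.

Why B-trees for databases/filesystems: they are designed to minimize disk I/O. Each node corresponds to a disk page (typically 4KB-16KB), so a single node holds hundreds of keys. This wide branching factor means very few disk accesses for any lookup: a B-tree with billions of entries typically has a height of about 3-4.

B+ tree (the variant used by most databases, including MySQL InnoDB and PostgreSQL): all actual data (values, or pointers to the rows) is stored in leaf nodes; internal nodes store only keys for routing; leaf nodes are linked as a list, often doubly linked. This enables efficient range scans: descend to the first matching key, then walk the linked leaves.

B-tree vs B+ tree: a B+ tree supports a range scan of k results in O(log n + k), as one descent followed by a sequential walk over the leaves. A plain B-tree can reach the same asymptotic bound with an in-order traversal, but it has to move up and down between levels, which costs more random page reads. Falling back to one lookup per key costs O(k log n).

Applications:
(1) Database indexes (primary key, secondary indexes).
(2) File systems (HFS+, NTFS, ext4, Btrfs).
(3) Key-value stores.
(4) In-memory databases.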
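The search and insert operations described above can be sketched in a few dozen lines. This is a minimal in-memory illustration that uses the minimum degree t convention and assumes distinct keys. The class and method names (BTreeNode, BTree, _split_child, _insert_nonfull) are hypothetical, chosen for this example, and a real database implementation would store each node in a disk page and also handle deletion and concurrency.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []        # sorted keys stored in this node
        self.children = []    # len(keys) + 1 children when not a leaf
        self.leaf = leaf


class BTree:
    """Illustrative B-tree with minimum degree t; not a production design."""

    def __init__(self, t=2):
        self.t = t            # non-root nodes hold t-1 .. 2t-1 keys
        self.root = BTreeNode()

    def search(self, key):
        node = self.root
        while True:
            i = bisect_left(node.keys, key)   # binary search inside the node
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]           # descend one level

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # Root is full: add a new root above it, then split the old root.
            # This is the only way the tree grows taller, which keeps all
            # leaves at the same depth.
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _split_child(self, parent, i):
        t = self.t
        full = parent.children[i]             # holds exactly 2t-1 keys
        right = BTreeNode(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]            # upper t-1 keys
        full.keys = full.keys[:t - 1]         # lower t-1 keys
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)         # median moves up to the parent
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, node, key):
        while not node.leaf:
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                # Split full children on the way down so the leaf has room.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        node.keys.insert(bisect_left(node.keys, key), key)

    def height(self):
        # Number of levels on the leftmost path; all leaves share this depth.
        h, node = 1, self.root
        while not node.leaf:
            node = node.children[0]
            h += 1
        return h
```

A short usage example: build a tree with tree = BTree(t=16), call tree.insert(k) for each key, then check membership with tree.search(k). Calling tree.height() after inserting many keys shows how slowly the height grows as the branching factor increases, which is the property that keeps disk accesses low.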
Pro Tip
Go beyond reciting the definition. Explaining why node size is matched to the disk page size, and why databases prefer the B+ tree variant for range scans, shows expertise rather than generic knowledge.