What is Huffman encoding?
Why Interviewers Ask This
Interviewers ask this to evaluate whether you have the depth of knowledge needed to mentor others and lead technical decisions. The expected answer goes beyond definitions into practical implications and real-world consequences.
Answer
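Huffman encoding is a lossless data compression algorithm that assigns shorter binary codes to more frequent characters and longer codes to less frequent ones, producing an optimal prefix-free code for a known set of character frequencies.
Algorithm:
(1) Build a frequency table for the characters in the input.
(2) Create a leaf node for each character and insert it into a min-priority queue ordered by frequency.
(3) While the queue holds more than one node: extract the two minimum-frequency nodes, create a new internal node with them as children and frequency = sum of their frequencies, and insert the new node back.
(4) The last remaining node is the root of the Huffman tree.
(5) Assign bits: left branch = 0, right branch = 1; each character's code is its path from root to leaf.
Properties: prefix-free (no code is a prefix of another, which enables unambiguous decoding without separators); optimal (for the given frequencies, no other prefix-free code that assigns one codeword per character achieves a shorter average length). Total encoded length = sum(frequency[i] × depth[i]) bits; dividing by the total number of characters (or using probabilities as the frequencies) gives the average code length. Time: O(n log n) for n distinct characters, since each of the n - 1 merges costs O(log n) in the priority queue, plus one linear pass over the input to count frequencies.
Applications: (1) DEFLATE, the method behind ZIP, GZIP, and PNG, which combines LZ77 with Huffman coding; (2) JPEG, as the entropy-coding stage in baseline mode; (3) MP3 audio, to code quantized values. Arithmetic coding is an alternative rather than an application: it can get closer to the entropy limit because it is not restricted to a whole number of bits per character.
Greedy proof: by an exchange argument, some optimal tree has the two lowest-frequency characters as sibling leaves at maximum depth, so merging them first is safe; applying the same argument to the reduced alphabet completes the proof by induction. Huffman coding is the canonical example of a greedy algorithm with an elegant proof of optimality.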
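The algorithm described above can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: the function names (build_codes, encode, decode) are hypothetical, bits are kept as a string of "0" and "1" for readability, and an input with a single distinct character is given the code "0" by assumption.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(text):
    """Return a dict mapping each character of `text` to its Huffman code."""
    freq = Counter(text)
    if not freq:
        return {}
    # Heap entries are (frequency, tie_breaker, tree). A tree is either a
    # single character (leaf) or a (left, right) tuple (internal node).
    # The unique tie_breaker keeps the heap from ever comparing trees.
    tie = count()
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    # Repeatedly merge the two lowest-frequency nodes.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left branch = 0
            walk(node[1], prefix + "1")  # right branch = 1
        else:
            # Assumption: a lone character gets the one-bit code "0".
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


def encode(text, codes):
    """Concatenate the code of every character in `text`."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Decode by growing a bit buffer until it matches a code.

    This works without separators because the code is prefix-free.
    """
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    message = "abracadabra"
    table = build_codes(message)
    packed = encode(message, table)
    print(table)
    print(len(packed), "bits")
    print(decode(packed, table))
```

For the input "abracadabra" (a:5, b:2, r:2, c:1, d:1) the encoding comes to 23 bits, about 2.09 bits per character, regardless of how ties between equal frequencies are broken. The individual codes can differ with tie-breaking, but the total length cannot.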
Common Mistake
Candidates often give textbook answers here. Interviewers are more impressed when you relate the concept to a specific problem you solved in a real project involving data structures and algorithms.