What is Huffman encoding?

Q: What is Huffman encoding?

Answer

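Huffman encoding is a lossless data compression algorithm that assigns shorter binary codes to more frequent characters and longer codes to less frequent ones, producing an optimal prefix-free code for a given set of character frequencies.

Algorithm:
(1) Build a frequency table of the characters in the input.
(2) Create a leaf node for each distinct character and insert it into a min-priority queue ordered by frequency.
(3) While the queue holds more than one node: extract the two minimum-frequency nodes, create a new internal node with these two as children and frequency = the sum of their frequencies, and insert the new node back into the queue.
(4) The last remaining node is the root of the Huffman tree.
(5) Assign bits: left branch = 0, right branch = 1. Each character's code is the sequence of bits on its path from the root to its leaf.

Properties: prefix-free (no code is a prefix of another, so a bit stream can be decoded unambiguously by walking the tree from the root and emitting a character at each leaf); optimal (no other prefix-free code for the same frequencies achieves a shorter average code length). Average code length = sum(frequency[i] × depth[i]), where frequency[i] is the relative frequency (probability) of character i and depth[i] is the depth of its leaf, which equals the length of its code. With raw counts instead of probabilities, the same sum gives the total number of encoded bits.

Time: O(n log n) for n distinct characters, because the priority queue performs O(n) insert and extract-min operations at O(log n) each.

Applications: (1) DEFLATE and the formats built on it, such as ZIP, GZIP, and PNG; (2) JPEG, where Huffman coding is the entropy-coding stage of baseline JPEG rather than the whole method; (3) MP3 audio, again as the entropy-coding stage. Arithmetic coding is not an application of Huffman coding but an alternative to it: it often compresses better because it is not limited to a whole number of bits per symbol.

Greedy proof: at each step, merging the two lowest-frequency nodes is safe (exchange argument): some optimal tree has the two least frequent characters as sibling leaves at maximum depth, so merging them first never rules out an optimal code. Huffman coding is a canonical example of a greedy algorithm with an elegant proof of optimality.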
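The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a production encoder: it works on strings, emits codes as "0"/"1" text instead of packed bits, and all names (`Node`, `build_tree`, `assign_codes`, `encode`, `decode`, `total_bits`) are hypothetical. Ties between equal frequencies are broken by insertion order, which is an arbitrary choice; other tie-breaking gives different codes with the same total length. Giving a one-character alphabet the code "0" is also an assumed convention.

```python
import heapq
from collections import Counter
from itertools import count


class Node:
    """Huffman tree node: leaves hold a symbol, internal nodes hold two children."""

    def __init__(self, freq, symbol=None, left=None, right=None):
        self.freq = freq
        self.symbol = symbol
        self.left = left
        self.right = right


def build_tree(freqs):
    """Steps (2)-(4): repeatedly merge the two lowest-frequency nodes."""
    if not freqs:
        raise ValueError("need at least one symbol")
    tie = count()  # unique tie-breaker so heapq never compares Node objects
    heap = [(f, next(tie), Node(f, symbol=s)) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = Node(f1 + f2, left=a, right=b)
        heapq.heappush(heap, (merged.freq, next(tie), merged))
    return heap[0][2]  # step (4): the last remaining node is the root


def assign_codes(root):
    """Step (5): left edge = '0', right edge = '1'; code = root-to-leaf path."""
    codes = {}

    def walk(node, path):
        if node.left is None:  # leaf
            # Assumed convention: a one-symbol alphabet gets code "0", not "".
            codes[node.symbol] = path or "0"
            return
        walk(node.left, path + "0")
        walk(node.right, path + "1")

    walk(root, "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, root):
    """Prefix-freeness: reaching a leaf always ends exactly one symbol."""
    if root.left is None:  # one-symbol alphabet: each bit is that symbol
        return root.symbol * len(bits)
    out, node = [], root
    for bit in bits:
        node = node.left if bit == "0" else node.right
        if node.left is None:
            out.append(node.symbol)
            node = root  # restart at the root for the next symbol
    return "".join(out)


def total_bits(freqs, codes):
    """sum(frequency[i] * depth[i]); a code's length equals its leaf depth."""
    return sum(f * len(codes[s]) for s, f in freqs.items())


text = "abracadabra"
freqs = Counter(text)  # step (1): frequency table
root = build_tree(freqs)
codes = assign_codes(root)
bits = encode(text, codes)
print(codes)
print(len(bits), total_bits(freqs, codes))  # 23 23
assert decode(bits, root) == text
```

On "abracadabra" this encodes the 11 characters in 23 bits, compared with 33 bits for a fixed 3-bit code over its 5 distinct characters. The counter in each heap tuple is there only because Python's `heapq` compares whole tuples; without it, two equal frequencies would force a comparison between `Node` objects and raise a `TypeError`.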

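To make the average-length formula concrete, here is the arithmetic for a commonly used textbook example with six characters (the frequencies, per 100 characters, are illustrative and not part of the original answer). Merging as in step (3) gives the depths shown, and the weighted sum is compared with a fixed-length code:

```latex
\begin{aligned}
&\text{freq: } a{=}45,\ b{=}13,\ c{=}12,\ d{=}16,\ e{=}9,\ f{=}5 \quad \left(\textstyle\sum = 100\right)\\
&\text{merges: } (f,e)\to 14,\ (c,b)\to 25,\ (14,d)\to 30,\ (25,30)\to 55,\ (a,55)\to 100\\
&\text{depths: } a{:}\,1,\quad b,c,d{:}\,3,\quad e,f{:}\,4\\
&\textstyle\sum_i \text{freq}_i \cdot \text{depth}_i = 45\cdot 1 + (13+12+16)\cdot 3 + (9+5)\cdot 4 = 45 + 123 + 56 = 224\\
&\bar{L} = 224/100 = 2.24\ \text{bits/char} \quad \text{vs. } \lceil \log_2 6 \rceil = 3\ \text{bits/char for a fixed-length code}
\end{aligned}
```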