What is the KMP (Knuth-Morris-Pratt) algorithm?

Answer

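KMP is a linear-time string matching algorithm that avoids redundant comparisons by preprocessing the pattern to find overlaps between its proper prefixes and its suffixes. The key insight: when a mismatch occurs, the characters matched so far are a known prefix of the pattern, so the search can resume from the longest prefix that is still consistent with the text instead of re-scanning those characters.

Failure function (LPS, the longest proper prefix that is also a suffix): lps[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]. The lps array is precomputed in O(m), where m is the pattern length.

Matching: scan the text with pointer i and the pattern with pointer j. If text[i] == pattern[j], do i++ and j++. If j == m, the pattern is found at index i - m; set j = lps[j-1] to keep searching for further occurrences. On a mismatch with j > 0, set j = lps[j-1] and leave i unchanged, since the characters before i are already known. On a mismatch with j == 0, do i++.

Time: O(n+m), where n is the text length. The pointer i never moves backward, and j can fall back only as many times as it has advanced, so matching needs at most about 2n character comparisons. Space: O(m) for the lps array.

Example: pattern = "ABABC", text = "ABABABABC". The lps = [0,0,1,2,0]. The first four characters match; at i = 4, j = 4 the text has 'A' where the pattern has 'C', so j jumps to lps[3] = 2 and comparison continues from text[4] without moving i. The same fallback happens once more at i = 6, after which the match is reported at index 4.

Z-algorithm: an alternative O(n+m) string matching method. Z[i] = length of the longest common prefix of the string and its suffix starting at i; for matching, Z is computed on the pattern, a separator character, and the text concatenated. Both KMP and the Z-algorithm run in linear worst-case time, which is asymptotically optimal for single-pattern exact matching.

Applications: (1) Substring search; (2) Detecting periodic strings: the smallest period of a string of length m is m - lps[m-1], and the string is a whole number of repetitions of that unit when this value divides m; (3) DNA sequence analysis; (4) Text editors (Find/Replace).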
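The LPS construction and the matching loop described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names build_lps and kmp_search are chosen here for the example, and returning an empty list for an empty pattern is an assumption rather than something the description specifies.

```python
from __future__ import annotations


def build_lps(pattern: str) -> list[int]:
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    lps = [0] * len(pattern)
    k = 0  # length of the prefix currently matched against the suffix
    for i in range(1, len(pattern)):
        # Fall back to shorter candidate prefixes until one can be extended.
        while k > 0 and pattern[i] != pattern[k]:
            k = lps[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        lps[i] = k
    return lps


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption: an empty pattern reports no matches
    lps = build_lps(pattern)
    matches = []
    j = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):  # i only ever moves forward
        # Mismatch with j > 0: reuse the known prefix instead of rescanning.
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # keep going to find overlapping occurrences
    return matches


if __name__ == "__main__":
    print(build_lps("ABABC"))
    print(kmp_search("ABABABABC", "ABABC"))
```

The for loop plays the role of the i pointer, and the inner while loop is the "j = lps[j-1]" fallback, so the text index is never reset.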

