What is the Rabin-Karp string matching algorithm?

Answer

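Rabin-Karp is a string-searching algorithm that uses a rolling hash to find one or more patterns in a text. For a text of length n and a pattern of length m, the naive approach takes O(n×m) in the worst case because it re-compares characters at every position. Rabin-Karp instead computes a hash of the pattern and a rolling hash of each length-m window of the text. Where the hashes differ, the window cannot match and is skipped; where they are equal, the window is verified character by character, because equal hashes may still be a collision.

Rolling hash: when the window slides one position, the hash is updated in O(1) by removing the leftmost character's contribution and adding the new rightmost character.
Hash formula: hash(s) = (s[0]×p^(m-1) + s[1]×p^(m-2) + ... + s[m-1]) mod q, where p is the base and q is the modulus.
Rolling update: new_hash = ((hash - s[left]×p^(m-1)) × p + s[right]) mod q.

Average time: O(n+m). Worst case: O(n×m), reached when many hash collisions (or many true matches) force an O(m) verification at most positions. Choosing a large prime q and a random base p makes collisions unlikely.

Applications: (1) Plagiarism detection: find matching substrings; (2) Multiple pattern search: store the hashes of all k patterns of equal length m in a hash set, giving roughly O(n + k×m) expected time versus O(n×m×k) for the naive approach; (3) Longest repeated substring; (4) Longest common substring; (5) 2D pattern matching.

Comparison: KMP is O(n+m) even in the worst case, so it is the better choice for exact matching of a single pattern. Rabin-Karp shines when many equal-length patterns are searched at once or when substring hashes are compared directly, as in the applications above. Aho-Corasick runs in O(n + M + z), where M is the total length of all patterns and z is the number of matches, and is the standard choice for matching many patterns simultaneously.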
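The hash formula and rolling update above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name rabin_karp, the fixed base of 256, and the modulus 1,000,000,007 are assumptions chosen for readability, and a production version would more likely pick the base at random as the answer suggests.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    base and mod play the roles of p and q in the formulas above;
    their default values are illustrative assumptions.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # p^(m-1) mod q: the weight of the leftmost character in a window.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first text window with Horner's rule.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for start in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out a collision.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        # Slide the window: drop text[start], append text[start + m].
        if start + m < n:
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches
```

Calling rabin_karp("abracadabra", "abra") returns [0, 7]. Passing a deliberately tiny modulus such as mod=7 produces many collisions but the same result, because every hash match is still verified character by character; only the running time degrades toward the O(n×m) worst case.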

