🔍 String Matching: Finding Needles in Haystacks
The Adventure Begins
Imagine you’re a detective 🕵️ with a special magnifying glass. Your job? Find specific words hidden inside a giant book. But here’s the catch—you want to find them super fast, not by checking every single letter one by one like a snail!
Today, we’ll learn three magical detective tricks that computers use to find patterns in text:
- KMP Algorithm – The Smart Bookmark Detective
- Rabin-Karp Algorithm – The Fingerprint Specialist
- Longest Common Prefix – The Pattern Spotter
🎯 What is String Matching?
Simple idea: You have a big text (like a book) and a small pattern (like a word). You want to find WHERE that word appears in the book.
Real Life Examples:
- 🔍 Finding “cat” in “the cat sat on a mat”
- 📱 Ctrl+F searching in documents
- 🧬 Finding DNA sequences in genomes
- 🔐 Detecting viruses in files
graph TD A[📚 Text: the cat sat] --> B{🔍 Find pattern} B --> C[Pattern: cat] C --> D[✅ Found at position 4!]
🧠 The Naive Way (The Slow Detective)
Before learning the smart tricks, let’s see the slow way:
// The slow detective checks every spot
function naiveSearch(text, pattern) {
let matches = [];
for (let i = 0; i <= text.length - pattern.length; i++) {
let found = true;
for (let j = 0; j < pattern.length; j++) {
if (text[i + j] !== pattern[j]) {
found = false;
break;
}
}
if (found) matches.push(i);
}
return matches;
}
Problem? If text has 1 million letters and pattern has 1000 letters, this could take 1 billion comparisons! 😱
📌 KMP Algorithm: The Smart Bookmark Detective
The Story
Imagine you’re reading a book, looking for the word “ABCABD”. You’re at page 5 and found “ABCAB” but the next letter is “C” instead of “D”.
The naive detective would go back to page 2 and start over. 😓
The KMP detective says: “Wait! I already know ‘AB’ matches at the start. Let me just continue from there!” 🧠
That’s KMP’s superpower—it NEVER goes backward!
The Magic Table (Failure Function)
KMP creates a cheat sheet before searching. This table tells us: “If there’s a mismatch, where should I jump to?”
// Build the magic cheat sheet
function buildLPS(pattern) {
let lps = new Array(pattern.length).fill(0);
let len = 0;
let i = 1;
while (i < pattern.length) {
if (pattern[i] === pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len !== 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
Example: Pattern = “ABCABD”
| Index | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Char | A | B | C | A | B | D |
| LPS | 0 | 0 | 0 | 1 | 2 | 0 |
What does this mean?
- At position 4 (B), LPS=2 means “AB” at start matches “AB” here
- If mismatch at position 5, jump back to position 2!
The KMP Search
function KMPSearch(text, pattern) {
let lps = buildLPS(pattern);
let matches = [];
let i = 0; // text pointer
let j = 0; // pattern pointer
while (i < text.length) {
if (text[i] === pattern[j]) {
i++;
j++;
}
if (j === pattern.length) {
matches.push(i - j);
j = lps[j - 1];
} else if (i < text.length &&
text[i] !== pattern[j]) {
if (j !== 0) {
j = lps[j - 1]; // Use cheat sheet!
} else {
i++;
}
}
}
return matches;
}
graph TD A[Start] --> B["Compare text[i] with pattern[j]"] B -->|Match| C[Move both pointers forward] B -->|Mismatch| D{j = 0?} D -->|Yes| E[Move text pointer only] D -->|No| F[Jump using LPS table] C --> G{Pattern complete?} G -->|Yes| H[🎉 Found match!] G -->|No| B F --> B E --> B H --> B
Why KMP is Amazing
| Aspect | Naive | KMP |
|---|---|---|
| Time | O(n × m) | O(n + m) |
| Goes backward? | Yes 😓 | Never! 🚀 |
| Extra space | None | O(m) for LPS |
🔢 Rabin-Karp: The Fingerprint Detective
The Story
Imagine every word has a secret fingerprint number. Instead of comparing letters one by one, you just compare fingerprints!
- Fingerprint of “CAT” = 123
- Fingerprint of “DOG” = 456
- If fingerprints don’t match → words definitely don’t match!
The magic? We can calculate new fingerprints by “sliding” the window, reusing old calculations!
Rolling Hash: The Sliding Fingerprint
function rabinKarp(text, pattern) {
const BASE = 256;
const MOD = 101; // Prime number
let m = pattern.length;
let n = text.length;
let patternHash = 0;
let textHash = 0;
let h = 1;
let matches = [];
// Calculate h = BASE^(m-1) % MOD
for (let i = 0; i < m - 1; i++) {
h = (h * BASE) % MOD;
}
// Calculate initial hashes
for (let i = 0; i < m; i++) {
patternHash = (BASE * patternHash +
pattern.charCodeAt(i)) % MOD;
textHash = (BASE * textHash +
text.charCodeAt(i)) % MOD;
}
// Slide the window
for (let i = 0; i <= n - m; i++) {
if (patternHash === textHash) {
// Fingerprints match! Double-check
if (text.slice(i, i + m) === pattern) {
matches.push(i);
}
}
// Calculate next hash (rolling!)
if (i < n - m) {
textHash = (BASE * (textHash -
text.charCodeAt(i) * h) +
text.charCodeAt(i + m)) % MOD;
if (textHash < 0) textHash += MOD;
}
}
return matches;
}
Visual: The Rolling Hash
Text: "ABCDA" Pattern: "BCD"
Step 1: Hash("ABC") vs Hash("BCD") → Different ❌
Step 2: Hash("BCD") vs Hash("BCD") → Same! ✅ → Verify → Match!
Step 3: Hash("CDA") vs Hash("BCD") → Different ❌
Rolling: Hash("BCD") = Hash("ABC") - 'A' + 'D' (simplified)
graph TD A[Calculate Pattern Hash] --> B[Calculate First Window Hash] B --> C{Hashes Match?} C -->|No| D[Roll window forward] C -->|Yes| E[Verify characters] E -->|Match| F[🎉 Found!] E -->|False alarm| D D --> G{More windows?} G -->|Yes| C G -->|No| H[Done] F --> D
Why Rolling Hash is Clever
Old way: Recalculate entire hash = O(m) each time Rolling way: Update hash in O(1)!
Formula:
newHash = (oldHash - leftChar × h) × BASE + rightChar
It’s like removing the leftmost digit and adding a new rightmost digit!
🔗 Longest Common Prefix (LCP)
The Story
You have a bunch of words. What’s the longest starting part they ALL share?
Like finding what all your cousins have in common!
Example:
- “flower”
- “flow”
- “flight”
→ LCP = “fl” (they all start with “fl”!)
Simple Solution
function longestCommonPrefix(strs) {
if (strs.length === 0) return "";
let prefix = strs[0];
for (let i = 1; i < strs.length; i++) {
while (strs[i].indexOf(prefix) !== 0) {
prefix = prefix.slice(0, -1);
if (prefix === "") return "";
}
}
return prefix;
}
// Example
let words = ["flower", "flow", "flight"];
console.log(longestCommonPrefix(words));
// Output: "fl"
Vertical Scanning Approach
Compare characters column by column:
function lcpVertical(strs) {
if (strs.length === 0) return "";
for (let i = 0; i < strs[0].length; i++) {
let char = strs[0][i];
for (let j = 1; j < strs.length; j++) {
if (i >= strs[j].length ||
strs[j][i] !== char) {
return strs[0].slice(0, i);
}
}
}
return strs[0];
}
graph TD A[Start with first string] --> B[Check column 0] B --> C{All strings have same char?} C -->|Yes| D[Move to next column] C -->|No| E[Return prefix so far] D --> F{More columns?} F -->|Yes| B F -->|No| G[Return entire first string]
Divide and Conquer LCP
For huge arrays, split the problem in half!
function lcpDivide(strs, l, r) {
if (l === r) return strs[l];
let mid = Math.floor((l + r) / 2);
let lcpLeft = lcpDivide(strs, l, mid);
let lcpRight = lcpDivide(strs, mid + 1, r);
return commonPrefix(lcpLeft, lcpRight);
}
function commonPrefix(str1, str2) {
let minLen = Math.min(str1.length, str2.length);
for (let i = 0; i < minLen; i++) {
if (str1[i] !== str2[i]) {
return str1.slice(0, i);
}
}
return str1.slice(0, minLen);
}
// Usage
function findLCP(strs) {
if (strs.length === 0) return "";
return lcpDivide(strs, 0, strs.length - 1);
}
🎯 When to Use What?
| Algorithm | Best For | Time Complexity |
|---|---|---|
| KMP | Single pattern, multiple searches | O(n + m) |
| Rabin-Karp | Multiple patterns, plagiarism detection | O(n + m) average |
| LCP | Finding common prefixes in arrays | O(S) where S = sum of all chars |
🌟 Key Takeaways
-
KMP = The “never look back” detective 📚
- Uses LPS table to skip unnecessary comparisons
- Perfect for repeated searches
-
Rabin-Karp = The fingerprint specialist 🔢
- Compares hash values instead of characters
- Great for multiple pattern matching
-
LCP = The common ground finder 🤝
- Finds shared beginnings in string arrays
- Useful for autocomplete, data compression
🚀 You Did It!
You now understand three powerful string matching techniques that power:
- Search engines 🔍
- Spell checkers ✏️
- DNA analysis 🧬
- Plagiarism detectors 📋
Remember: The best detective isn’t the one who works hardest—it’s the one who works smartest! 🧠✨