String Matching

Loading concept...

🔍 String Matching: Finding Needles in Haystacks

The Adventure Begins

Imagine you’re a detective 🕵️ with a special magnifying glass. Your job? Find specific words hidden inside a giant book. But here’s the catch—you want to find them super fast, not by checking every single letter one by one like a snail!

Today, we’ll learn three magical detective tricks that computers use to find patterns in text:

  1. KMP Algorithm – The Smart Bookmark Detective
  2. Rabin-Karp Algorithm – The Fingerprint Specialist
  3. Longest Common Prefix – The Pattern Spotter

🎯 What is String Matching?

Simple idea: You have a big text (like a book) and a small pattern (like a word). You want to find WHERE that word appears in the book.

Real Life Examples:

  • 🔍 Finding “cat” in “the cat sat on a mat”
  • 📱 Ctrl+F searching in documents
  • 🧬 Finding DNA sequences in genomes
  • 🔐 Detecting viruses in files
graph TD A[📚 Text: the cat sat] --> B{🔍 Find pattern} B --> C[Pattern: cat] C --> D[✅ Found at position 4!]

🧠 The Naive Way (The Slow Detective)

Before learning the smart tricks, let’s see the slow way:

// The slow detective checks every spot
function naiveSearch(text, pattern) {
  let matches = [];
  for (let i = 0; i <= text.length - pattern.length; i++) {
    let found = true;
    for (let j = 0; j < pattern.length; j++) {
      if (text[i + j] !== pattern[j]) {
        found = false;
        break;
      }
    }
    if (found) matches.push(i);
  }
  return matches;
}

Problem? If text has 1 million letters and pattern has 1000 letters, this could take 1 billion comparisons! 😱


📌 KMP Algorithm: The Smart Bookmark Detective

The Story

Imagine you’re reading a book, looking for the word “ABCABD”. You’re at page 5 and found “ABCAB” but the next letter is “C” instead of “D”.

The naive detective would go back to page 2 and start over. 😓

The KMP detective says: “Wait! I already know ‘AB’ matches at the start. Let me just continue from there!” 🧠

That’s KMP’s superpower—it NEVER goes backward!

The Magic Table (Failure Function)

KMP creates a cheat sheet before searching. This table tells us: “If there’s a mismatch, where should I jump to?”

// Build the magic cheat sheet
function buildLPS(pattern) {
  let lps = new Array(pattern.length).fill(0);
  let len = 0;
  let i = 1;

  while (i < pattern.length) {
    if (pattern[i] === pattern[len]) {
      len++;
      lps[i] = len;
      i++;
    } else {
      if (len !== 0) {
        len = lps[len - 1];
      } else {
        lps[i] = 0;
        i++;
      }
    }
  }
  return lps;
}

Example: Pattern = “ABCABD”

Index 0 1 2 3 4 5
Char A B C A B D
LPS 0 0 0 1 2 0

What does this mean?

  • At position 4 (B), LPS=2 means “AB” at start matches “AB” here
  • If mismatch at position 5, jump back to position 2!

The KMP Search

function KMPSearch(text, pattern) {
  let lps = buildLPS(pattern);
  let matches = [];
  let i = 0; // text pointer
  let j = 0; // pattern pointer

  while (i < text.length) {
    if (text[i] === pattern[j]) {
      i++;
      j++;
    }

    if (j === pattern.length) {
      matches.push(i - j);
      j = lps[j - 1];
    } else if (i < text.length &&
               text[i] !== pattern[j]) {
      if (j !== 0) {
        j = lps[j - 1]; // Use cheat sheet!
      } else {
        i++;
      }
    }
  }
  return matches;
}
graph TD A[Start] --> B["Compare text[i] with pattern[j]"] B -->|Match| C[Move both pointers forward] B -->|Mismatch| D{j = 0?} D -->|Yes| E[Move text pointer only] D -->|No| F[Jump using LPS table] C --> G{Pattern complete?} G -->|Yes| H[🎉 Found match!] G -->|No| B F --> B E --> B H --> B

Why KMP is Amazing

Aspect Naive KMP
Time O(n × m) O(n + m)
Goes backward? Yes 😓 Never! 🚀
Extra space None O(m) for LPS

🔢 Rabin-Karp: The Fingerprint Detective

The Story

Imagine every word has a secret fingerprint number. Instead of comparing letters one by one, you just compare fingerprints!

  • Fingerprint of “CAT” = 123
  • Fingerprint of “DOG” = 456
  • If fingerprints don’t match → words definitely don’t match!

The magic? We can calculate new fingerprints by “sliding” the window, reusing old calculations!

Rolling Hash: The Sliding Fingerprint

function rabinKarp(text, pattern) {
  const BASE = 256;
  const MOD = 101; // Prime number
  let m = pattern.length;
  let n = text.length;
  let patternHash = 0;
  let textHash = 0;
  let h = 1;
  let matches = [];

  // Calculate h = BASE^(m-1) % MOD
  for (let i = 0; i < m - 1; i++) {
    h = (h * BASE) % MOD;
  }

  // Calculate initial hashes
  for (let i = 0; i < m; i++) {
    patternHash = (BASE * patternHash +
                   pattern.charCodeAt(i)) % MOD;
    textHash = (BASE * textHash +
                text.charCodeAt(i)) % MOD;
  }

  // Slide the window
  for (let i = 0; i <= n - m; i++) {
    if (patternHash === textHash) {
      // Fingerprints match! Double-check
      if (text.slice(i, i + m) === pattern) {
        matches.push(i);
      }
    }

    // Calculate next hash (rolling!)
    if (i < n - m) {
      textHash = (BASE * (textHash -
                  text.charCodeAt(i) * h) +
                  text.charCodeAt(i + m)) % MOD;
      if (textHash < 0) textHash += MOD;
    }
  }
  return matches;
}

Visual: The Rolling Hash

Text: "ABCDA"  Pattern: "BCD"

Step 1: Hash("ABC") vs Hash("BCD") → Different ❌
Step 2: Hash("BCD") vs Hash("BCD") → Same! ✅ → Verify → Match!
Step 3: Hash("CDA") vs Hash("BCD") → Different ❌

Rolling: Hash("BCD") = Hash("ABC") - 'A' + 'D' (simplified)
graph TD A[Calculate Pattern Hash] --> B[Calculate First Window Hash] B --> C{Hashes Match?} C -->|No| D[Roll window forward] C -->|Yes| E[Verify characters] E -->|Match| F[🎉 Found!] E -->|False alarm| D D --> G{More windows?} G -->|Yes| C G -->|No| H[Done] F --> D

Why Rolling Hash is Clever

Old way: Recalculate entire hash = O(m) each time Rolling way: Update hash in O(1)!

Formula:

newHash = (oldHash - leftChar × h) × BASE + rightChar

It’s like removing the leftmost digit and adding a new rightmost digit!


🔗 Longest Common Prefix (LCP)

The Story

You have a bunch of words. What’s the longest starting part they ALL share?

Like finding what all your cousins have in common!

Example:

  • “flower”
  • “flow”
  • “flight”

→ LCP = “fl” (they all start with “fl”!)

Simple Solution

function longestCommonPrefix(strs) {
  if (strs.length === 0) return "";

  let prefix = strs[0];

  for (let i = 1; i < strs.length; i++) {
    while (strs[i].indexOf(prefix) !== 0) {
      prefix = prefix.slice(0, -1);
      if (prefix === "") return "";
    }
  }

  return prefix;
}

// Example
let words = ["flower", "flow", "flight"];
console.log(longestCommonPrefix(words));
// Output: "fl"

Vertical Scanning Approach

Compare characters column by column:

function lcpVertical(strs) {
  if (strs.length === 0) return "";

  for (let i = 0; i < strs[0].length; i++) {
    let char = strs[0][i];
    for (let j = 1; j < strs.length; j++) {
      if (i >= strs[j].length ||
          strs[j][i] !== char) {
        return strs[0].slice(0, i);
      }
    }
  }

  return strs[0];
}
graph TD A[Start with first string] --> B[Check column 0] B --> C{All strings have same char?} C -->|Yes| D[Move to next column] C -->|No| E[Return prefix so far] D --> F{More columns?} F -->|Yes| B F -->|No| G[Return entire first string]

Divide and Conquer LCP

For huge arrays, split the problem in half!

function lcpDivide(strs, l, r) {
  if (l === r) return strs[l];

  let mid = Math.floor((l + r) / 2);
  let lcpLeft = lcpDivide(strs, l, mid);
  let lcpRight = lcpDivide(strs, mid + 1, r);

  return commonPrefix(lcpLeft, lcpRight);
}

function commonPrefix(str1, str2) {
  let minLen = Math.min(str1.length, str2.length);
  for (let i = 0; i < minLen; i++) {
    if (str1[i] !== str2[i]) {
      return str1.slice(0, i);
    }
  }
  return str1.slice(0, minLen);
}

// Usage
function findLCP(strs) {
  if (strs.length === 0) return "";
  return lcpDivide(strs, 0, strs.length - 1);
}

🎯 When to Use What?

Algorithm Best For Time Complexity
KMP Single pattern, multiple searches O(n + m)
Rabin-Karp Multiple patterns, plagiarism detection O(n + m) average
LCP Finding common prefixes in arrays O(S) where S = sum of all chars

🌟 Key Takeaways

  1. KMP = The “never look back” detective 📚

    • Uses LPS table to skip unnecessary comparisons
    • Perfect for repeated searches
  2. Rabin-Karp = The fingerprint specialist 🔢

    • Compares hash values instead of characters
    • Great for multiple pattern matching
  3. LCP = The common ground finder 🤝

    • Finds shared beginnings in string arrays
    • Useful for autocomplete, data compression

🚀 You Did It!

You now understand three powerful string matching techniques that power:

  • Search engines 🔍
  • Spell checkers ✏️
  • DNA analysis 🧬
  • Plagiarism detectors 📋

Remember: The best detective isn’t the one who works hardest—it’s the one who works smartest! 🧠✨

Loading story...

No Story Available

This concept doesn't have a story yet.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

Interactive Preview

Interactive - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Interactive Content

This concept doesn't have interactive content yet.

Cheatsheet Preview

Cheatsheet - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Cheatsheet Available

This concept doesn't have a cheatsheet yet.

Quiz Preview

Quiz - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Quiz Available

This concept doesn't have a quiz yet.