String Matching

Loading concept...

🔍 String Matching: Finding Needles in a Haystack

Imagine you lost your favorite toy in a huge pile of similar-looking toys. How would you find it? Would you check every single toy one by one? That works, but it’s slow! What if you had a magic trick to skip the toys you already know don’t match?

That’s exactly what string matching algorithms do! They help computers find a small word (called a pattern) inside a big text (called the haystack) — and they do it super fast.


🎯 What You’ll Learn

  1. KMP Algorithm — The “Memory Helper” that never forgets
  2. Rabin-Karp Algorithm — The “Fingerprint Detective”
  3. Longest Common Prefix (LCP) — Finding what strings have in common

📚 The Story of the Lost Word

Let’s say you have a long story:

TEXT: "ABABDABACDABABCABAB"

And you’re looking for:

PATTERN: "ABABCABAB"

The slow way: Compare every letter, one by one. If it doesn’t match, start over from the next position. This is like checking every toy in the pile, even if you just saw it!

The smart way: Remember what you already checked. If something doesn’t match, use your memory to skip ahead!


🧠 1. KMP Algorithm (Knuth-Morris-Pratt)

The “Memory Helper” Trick

Imagine this: You’re reading a book looking for the word “ABABC”. You’re at “ABAB” and the next letter is “D” instead of “C”.

Dumb approach: Go back to the beginning and start over.

KMP approach: Wait! “AB” at the end of “ABAB” looks just like “AB” at the beginning! I can continue from there!

How KMP Works

KMP builds a “failure function” (also called LPS — Longest Prefix Suffix) that tells us: “If we fail here, where can we safely continue?”

graph TD A[Start Matching] --> B{Characters Match?} B -->|Yes| C[Move Both Pointers Forward] C --> B B -->|No| D{Pattern at Start?} D -->|Yes| E[Move Text Pointer Only] D -->|No| F[Use LPS to Skip Ahead] F --> B E --> B

Building the LPS Array

For pattern “ABABCABAB”:

Index 0 1 2 3 4 5 6 7 8
Char A B A B C A B A B
LPS 0 0 1 2 0 1 2 3 4

What does this mean?

  • LPS[3] = 2 means: At position 3, the longest prefix that’s also a suffix is length 2 (“AB”)
  • If we fail after position 3, we can skip back to position 2 instead of starting over!

Java Code Example

public int[] buildLPS(String pattern) {
    int[] lps = new int[pattern.length()];
    int len = 0;
    int i = 1;

    while (i < pattern.length()) {
        if (pattern.charAt(i) ==
            pattern.charAt(len)) {
            len++;
            lps[i] = len;
            i++;
        } else if (len > 0) {
            len = lps[len - 1];
        } else {
            lps[i] = 0;
            i++;
        }
    }
    return lps;
}

Simple Example

Finding “ABA” in “ABABA”:

Text:    A B A B A
Pattern: A B A
         ✓ ✓ ✓ → Found at index 0!

Move pattern using LPS...

Text:    A B A B A
Pattern:     A B A
             ✓ ✓ ✓ → Found at index 2!

Why KMP is Fast: Time complexity is O(n + m) where n = text length, m = pattern length. No backtracking!


🔐 2. Rabin-Karp Algorithm

The “Fingerprint Detective” Trick

Imagine: Instead of comparing every letter, what if each word had a fingerprint (a number)? You could just compare fingerprints!

  • If fingerprints don’t match → Words definitely don’t match ❌
  • If fingerprints match → Check the actual letters (might be a false alarm!) ✓

The Magic: Rolling Hash

The secret sauce is the rolling hash. When you slide your window by one character:

Window 1: [A B C] D E → Hash = 123
Window 2: A [B C D] E → Hash = 234 (calculated from 123!)

You don’t recalculate everything! You just:

  1. Remove the old first character’s contribution
  2. Add the new last character

Java Code Example

public void rabinKarp(String text,
                       String pattern) {
    int d = 256;  // alphabet size
    int q = 101;  // a prime number
    int m = pattern.length();
    int n = text.length();
    int h = 1;

    // h = d^(m-1) % q
    for (int i = 0; i < m - 1; i++)
        h = (h * d) % q;

    // Calculate initial hashes
    int p = 0, t = 0;
    for (int i = 0; i < m; i++) {
        p = (d * p + pattern.charAt(i)) % q;
        t = (d * t + text.charAt(i)) % q;
    }

    // Slide and compare
    for (int i = 0; i <= n - m; i++) {
        if (p == t) {
            // Verify actual characters
            if (text.substring(i, i+m)
                    .equals(pattern))
                System.out.println(
                    "Found at: " + i);
        }
        // Calculate next hash
        if (i < n - m) {
            t = (d * (t - text.charAt(i) * h)
                 + text.charAt(i + m)) % q;
            if (t < 0) t += q;
        }
    }
}

Simple Example

Finding “AB” in “AABAB”:

Let’s use a simple hash: A=1, B=2

Pattern "AB" → Hash = 1×10 + 2 = 12

Text positions:
[AA] → 1×10 + 1 = 11 ≠ 12 ❌
[AB] → 1×10 + 2 = 12 = 12 ✓ Check! Match!
[BA] → 2×10 + 1 = 21 ≠ 12 ❌
[AB] → 1×10 + 2 = 12 = 12 ✓ Check! Match!

Why Rabin-Karp is Cool:

  • Great for multiple pattern matching (search for many words at once!)
  • Average time: O(n + m)
  • Worst case (many false positives): O(nm)

🔗 3. Longest Common Prefix (LCP)

Finding What Strings Share

Question: What do these words have in common at the start?

"flower"
"flow"
"flight"

Answer: “fl” — That’s the Longest Common Prefix!

Two Approaches

Approach 1: Horizontal Scanning

Compare strings two at a time:

graph TD A["flower, flow, flight"] --> B["LCP#40;flower, flow#41; = flow"] B --> C["LCP#40;flow, flight#41; = fl"] C --> D["Result: fl"]

Approach 2: Vertical Scanning

Compare column by column:

f l o w e r
f l o w
f l i g h t
↓ ↓ ↓
f l ✗ → Stop! LCP = "fl"

Java Code: Vertical Scanning

public String longestCommonPrefix(
        String[] strs) {
    if (strs == null || strs.length == 0)
        return "";

    for (int i = 0;
         i < strs[0].length(); i++) {
        char c = strs[0].charAt(i);

        for (int j = 1;
             j < strs.length; j++) {
            if (i >= strs[j].length() ||
                strs[j].charAt(i) != c) {
                return strs[0]
                    .substring(0, i);
            }
        }
    }
    return strs[0];
}

Simple Example

String[] words = {"interstellar",
                  "internet",
                  "internal"};

// Column check:
// i: ✓ n: ✓ t: ✓ e: ✓ r: ✓ n: ?
// "interstellar" has 'n'
// "internet" has 'n'
// "internal" has 'n' ✓
// Continue...
// "interstellar" has 's'
// "internet" has 'e' ❌ STOP!

// Result: "intern"

🎯 When to Use What?

Algorithm Best For Time Space
KMP Single pattern, no preprocessing overhead O(n+m) O(m)
Rabin-Karp Multiple patterns, plagiarism detection O(n+m) avg O(1)
LCP Finding common beginnings in word lists O(S) O(1)

S = sum of all characters in all strings


🌟 Real-World Magic

These algorithms power things you use every day:

  • 🔍 Search engines — Finding your query in billions of pages
  • 📝 Text editors — Ctrl+F (Find and Replace)
  • 🧬 DNA matching — Finding gene sequences
  • 🔒 Plagiarism checkers — Finding copied text
  • 🌐 Autocomplete — Suggesting words as you type (uses LCP!)

💡 Key Takeaways

  1. KMP = Build a “memory map” (LPS array) so you never check the same thing twice
  2. Rabin-Karp = Use “fingerprints” (hashes) for quick comparisons
  3. LCP = Scan character-by-character to find shared beginnings

Remember the metaphors:

  • 🧠 KMP is your smart friend with perfect memory
  • 🔍 Rabin-Karp is a detective with fingerprint files
  • 🔗 LCP is finding the family resemblance between words

🚀 You Did It!

You now understand how computers find words at lightning speed! These algorithms might seem tricky at first, but remember:

  • KMP: “If I’ve seen this before, I can skip ahead!”
  • Rabin-Karp: “Let me check the fingerprint first!”
  • LCP: “What do we all have in common?”

Go forth and search strings like a pro! 🎉

Loading story...

No Story Available

This concept doesn't have a story yet.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

Interactive Preview

Interactive - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Interactive Content

This concept doesn't have interactive content yet.

Cheatsheet Preview

Cheatsheet - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Cheatsheet Available

This concept doesn't have a cheatsheet yet.

Quiz Preview

Quiz - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Quiz Available

This concept doesn't have a quiz yet.