Advanced Subqueries

Back

Loading concept...

🔍 Advanced Subqueries: The Detective’s Secret Toolkit

Imagine you’re a detective investigating a mystery. You have files, clues, and witnesses—but sometimes you need to ask one question INSIDE another question to crack the case. That’s exactly what advanced subqueries do!


🎭 The Big Picture: What Are Advanced Subqueries?

Think of a regular subquery like asking a friend a question before answering someone else:

“What’s the biggest pizza size?” → Then use that answer to find orders.

But advanced subqueries are smarter! They work like a detective who checks each suspect one-by-one against evidence. Each time they look at a new suspect, they ask a NEW question about THAT specific person.


🔄 Correlated Subqueries: The Personal Investigator

What Is It?

A correlated subquery is a question that changes based on which row you’re looking at. It’s like a detective who asks:

“For THIS specific suspect, do they have an alibi?”

Then moves to the next suspect and asks the SAME type of question—but about THAT person.

Simple Example: Finding Above-Average Performers

SELECT employee_name, salary, department
FROM employees e
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department = e.department
);

What’s happening?

  • For EACH employee, check their department
  • Calculate the average salary for THAT department
  • Keep the employee only if they earn MORE than their department’s average

🎯 Key Insight

The inner query uses e.department from the OUTER query. That’s the “correlation”—the inner question depends on which row we’re examining!

graph TD A["Pick Employee Row"] --> B["Look at Their Department"] B --> C["Calculate Dept Average"] C --> D{Salary > Average?} D -->|Yes| E["Include in Results"] D -->|No| F["Skip This Row"] E --> G["Move to Next Row"] F --> G G --> A

✅ EXISTS Operator: The “Is There ANY Evidence?” Check

What Is It?

EXISTS is like asking: “Is there AT LEAST ONE matching clue?”

It doesn’t care HOW MANY matches exist—just whether ANY exist at all.

Simple Example: Customers Who Placed Orders

SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

Translation:

“Show me customers where at least ONE order exists for them.”

Why SELECT 1?

The SELECT 1 (or SELECT *) inside EXISTS doesn’t matter. EXISTS only checks: “Did this query return ANY rows?”

  • Returns rows? → TRUE âś…
  • Returns nothing? → FALSE ❌
graph TD A["Check Customer: Alice"] --> B["Any orders for Alice?"] B -->|Found 3 orders| C["EXISTS = TRUE"] C --> D["Include Alice"] E["Check Customer: Bob"] --> F["Any orders for Bob?"] F -->|Found 0 orders| G["EXISTS = FALSE"] G --> H["Skip Bob"]

❌ NOT EXISTS: The “No Evidence Found” Check

What Is It?

NOT EXISTS is the opposite—it finds rows where NO matching records exist.

Simple Example: Customers Who Never Ordered

SELECT customer_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

Translation:

“Show me customers who have ZERO orders.”

Real-World Use: Finding Unused Products

SELECT product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

This finds products that nobody has ever bought! đź›’


🏆 Finding the Nth Highest Value

The Challenge

“Who has the 3rd highest salary?” This is trickier than finding the #1!

Method 1: Using Correlated Subquery

SELECT employee_name, salary
FROM employees e1
WHERE 2 = (
    SELECT COUNT(DISTINCT salary)
    FROM employees e2
    WHERE e2.salary > e1.salary
);

The Logic:

  • For the 3rd highest: exactly 2 salaries should be HIGHER
  • For the Nth highest: exactly (N-1) salaries should be HIGHER

Method 2: Using LIMIT and OFFSET

SELECT employee_name, salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 2;

Simpler!

  • ORDER BY salary DESC → Sort highest first
  • OFFSET 2 → Skip the top 2
  • LIMIT 1 → Take just the next one (3rd highest)
graph TD A["All Salaries: 100k, 90k, 85k, 80k, 75k"] --> B["Want 3rd Highest"] B --> C["Count: How many are HIGHER?"] C --> D["100k: 0 higher = 1st"] C --> E["90k: 1 higher = 2nd"] C --> F["85k: 2 higher = 3rd âś“"] C --> G["80k: 3 higher = 4th"]

👯 Finding Duplicate Records

The Problem

Duplicates sneak into databases! Maybe someone entered the same email twice, or the same order got recorded multiple times.

Method 1: GROUP BY with HAVING

SELECT email, COUNT(*) as count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

Translation:

“Group by email, then show only groups with MORE than 1 member.”

Method 2: Self-Join to Find All Duplicate Rows

SELECT u1.*
FROM users u1
WHERE EXISTS (
    SELECT 1
    FROM users u2
    WHERE u1.email = u2.email
    AND u1.user_id <> u2.user_id
);

Translation:

“Find users where ANOTHER user has the same email.”

Method 3: Using Window Functions

SELECT *
FROM (
    SELECT *,
           COUNT(*) OVER(PARTITION BY email) as dup_count
    FROM users
) subquery
WHERE dup_count > 1;

This shows ALL columns of duplicate rows! đź“‹


🕳️ Finding Gaps in Sequences

The Problem

You have IDs: 1, 2, 3, 5, 6, 8, 10…

Where are the missing numbers? (4, 7, 9)

Method 1: Find Where Next ID Isn’t +1

SELECT id + 1 AS gap_start,
       (SELECT MIN(id) FROM orders o2
        WHERE o2.id > o1.id) - 1 AS gap_end
FROM orders o1
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o2
    WHERE o2.id = o1.id + 1
)
AND id < (SELECT MAX(id) FROM orders);

What’s happening?

  1. For each ID, check if (ID + 1) exists
  2. If NOT, we found a gap!
  3. Find where the gap ends (next existing ID minus 1)

Method 2: Using LEAD (Simpler!)

SELECT id + 1 AS gap_start,
       next_id - 1 AS gap_end
FROM (
    SELECT id,
           LEAD(id) OVER(ORDER BY id) AS next_id
    FROM orders
) subquery
WHERE next_id - id > 1;

LEAD peeks at the next row’s value—super handy!

graph TD A["IDs: 1, 2, 3, 5, 6, 8"] --> B["Compare Each to Next"] B --> C["1→2: Gap of 1 ✓"] B --> D["2→3: Gap of 1 ✓"] B --> E["3→5: Gap of 2 ⚠️"] B --> F["5→6: Gap of 1 ✓"] B --> G["6→8: Gap of 2 ⚠️"] E --> H["Missing: 4"] G --> I["Missing: 7"]

🎯 Quick Reference Table

Technique Use When Key Operator
Correlated Subquery Compare each row to a calculation Uses outer table in inner query
EXISTS Check if ANY related rows exist WHERE EXISTS (...)
NOT EXISTS Check if NO related rows exist WHERE NOT EXISTS (...)
Nth Highest Find specific ranking COUNT or OFFSET
Find Duplicates Detect repeated values GROUP BY + HAVING
Find Gaps Detect missing sequences LEAD or NOT EXISTS

đź§  Memory Tricks

  1. Correlated = Connected → The inner query is CONNECTED to the outer row
  2. EXISTS = Evidence Check → “Does ANY evidence exist?”
  3. NOT EXISTS = Absence Proof → “Can we prove NOTHING matches?”
  4. Nth Highest = Count Above → How many scores beat this one?
  5. Duplicates = Same + Different → Same value, different ID
  6. Gaps = Missing Neighbors → Current + 1 should exist, but doesn’t

🚀 You’ve Got This!

These advanced subqueries are your detective tools:

  • Correlated subqueries let you ask personalized questions for each row
  • EXISTS/NOT EXISTS quickly check for presence or absence
  • Nth highest finds specific rankings without listing everything
  • Duplicates catch sneaky repeated data
  • Gaps reveal missing pieces in your sequences

Now go investigate your data! 🔍🎉

Loading story...

Story - Premium Content

Please sign in to view this story and start learning.

Upgrade to Premium to unlock full access to all stories.

Stay Tuned!

Story is coming soon.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.