When should I embed vs reference data in NoSQL?

Embed when data is small, always needed together, and rarely changes. Reference when data is large, shared by many documents, or changes often.

What is query-first design?

Query-first design means starting with the questions your app must answer, then building your data structure around them. Ideally, each question needs just one database trip.

NoSQL Modeling and Embedding | Data Design Guide

What is data modeling in NoSQL?

NoSQL data modeling organizes data based on how you'll use it, not what it looks like. Design around your queries, not around the entities themselves.

🏠 Data Modeling in NoSQL: Building Your Dream House

Imagine you’re building a house. Not just any house—YOUR dream house! In a traditional (SQL) world, you’d have separate rooms for everything: one room for shoes, another for coats, another for umbrellas. Every time you go outside, you’d run to three different rooms!

But what if you could put your shoes, coat, AND umbrella right next to the front door? That’s NoSQL data modeling!

🎯 What is Data Modeling in NoSQL?

Simple idea: Organize your data based on HOW you’ll use it, not what it looks like.

Think of it like packing for a trip:

SQL way: Pack all shirts together, all pants together, all socks together
NoSQL way: Pack Monday’s outfit together, Tuesday’s outfit together…

When you wake up Monday morning, which suitcase is easier to use? 🧳

The Big Difference

Traditional (SQL)	NoSQL
Organize by type	Organize by use
Many small boxes	Fewer bigger boxes
Assemble later	Ready to go now!

🔍 Query-First Design: Start With Questions!

Here’s a secret that will change everything: Don’t start with your data. Start with your QUESTIONS.

The Lemonade Stand Story 🍋

Imagine you’re running a lemonade stand. You need to answer these questions fast:

“How much money did I make today?”
“Who bought the most lemonade?”
“What time was busiest?”

In Query-First Design, you build your data around these questions!

Wrong approach: “I have customers, sales, times… let me organize them separately.”

Right approach: “My main question is ‘today’s sales’—so I’ll store everything I need for that answer together!”

How to Do Query-First Design

Step 1: Write down your questions
Step 2: For each question, list what data you need
Step 3: Store that data together

Example:

Question: "Show user profile with their posts"

Data needed:
- User name ✓
- User photo ✓
- User bio ✓
- Recent posts ✓

→ Store ALL of this in ONE document!
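
Here is a minimal sketch of that idea in Python. A plain dictionary stands in for a document collection, and the field names (`photo`, `bio`, `recent_posts`), the sample values, and the helper `get_profile` are all hypothetical, picked only for illustration:

```python
# A dict keyed by user ID stands in for a document collection (an assumption).
users = {
    "user_123": {
        "name": "Alex",
        "photo": "alex.jpg",
        "bio": "Lemonade enthusiast",
        "recent_posts": [
            {"title": "My great post!"},
            {"title": "Another day at the stand"},
        ],
    }
}

def get_profile(user_id):
    """Answer 'show user profile with their posts' with a single lookup."""
    return users[user_id]

profile = get_profile("user_123")
print(profile["name"], len(profile["recent_posts"]))
```

Everything the profile page needs comes back from one lookup, which is the whole point of storing it together.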

🛤️ Access Patterns: Your Data’s Travel Routes

Access patterns are like the paths people walk in your house. If everyone walks from the kitchen to the living room, you’d put a door there, right?

What Are Access Patterns?

An access pattern is simply: “How will my app ask for data?”

Think of a library:

Pattern 1: “Find books by this author” 📚
Pattern 2: “Find books in this genre” 📖
Pattern 3: “Find books borrowed today” 📋

Each pattern is a different “path” through your data!

Real Example: Social Media App

Access Patterns for Instagram-like app:

1. Show user's profile → need: name, bio, photo
2. Show user's posts → need: post images, captions, dates
3. Show post comments → need: comment text, commenter name
4. Show who user follows → need: list of followed users

Golden Rule: Design your data so each pattern needs just ONE trip to the database!

graph TD
    A["📱 User opens app"] --> B{What do they want?}
    B --> C["View Profile"]
    B --> D["See Feed"]
    B --> E["Read Comments"]
    C --> F["One database call!"]
    D --> F
    E --> F
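
One way to check the golden rule is to count reads. The sketch below is only an illustration: `CountingStore`, the key naming scheme (`profile:`, `posts:`, and so on), and the sample data are assumptions, not any real database API.

```python
class CountingStore:
    """Tiny in-memory store that counts how many reads it serves."""

    def __init__(self, documents):
        self.documents = documents
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.documents[key]

# One stored document per access pattern (hypothetical keys and data).
store = CountingStore({
    "profile:user_123": {"name": "Alex", "bio": "Hi!", "photo": "alex.jpg"},
    "posts:user_123": [{"image": "cat.jpg", "caption": "My cat"}],
    "comments:post_789": [{"text": "Great post!", "commenter": "Sam"}],
    "following:user_123": ["user_456", "user_789"],
})

def show_profile(user_id):
    return store.get(f"profile:{user_id}")

def show_posts(user_id):
    return store.get(f"posts:{user_id}")

def show_comments(post_id):
    return store.get(f"comments:{post_id}")

def show_following(user_id):
    return store.get(f"following:{user_id}")

show_profile("user_123")
print(store.reads)  # one pattern served, one read
```

If serving a single pattern ever pushes the counter up by more than one, that is a hint the data for that pattern is spread across too many places.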

📋 Denormalization: Copying Data on Purpose!

This might sound crazy: In NoSQL, we COPY data on purpose.

The Pizza Menu Story 🍕

Imagine a pizza restaurant:

Normalized (SQL) way:

Table 1: Pizza names
Table 2: Toppings
Table 3: Prices
Table 4: Which toppings go on which pizza

To show ONE menu item, you need to check FOUR tables! 😰

Denormalized (NoSQL) way:

Pepperoni Pizza:
  - Name: "Pepperoni Pizza"
  - Toppings: ["cheese", "pepperoni", "tomato sauce"]
  - Price: $12.99
  - Description: "Classic favorite!"

Everything in ONE place! ✨
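
To make the contrast concrete, here is a small Python sketch with dictionaries standing in for tables and documents. The numeric IDs and the function names are hypothetical:

```python
# Normalized: four separate "tables" (dicts here), combined at read time.
pizza_names = {1: "Pepperoni Pizza"}
toppings = {10: "cheese", 11: "pepperoni", 12: "tomato sauce"}
prices = {1: 12.99}
pizza_toppings = {1: [10, 11, 12]}

def menu_item_normalized(pizza_id):
    """Assemble one menu item by consulting all four tables."""
    return {
        "name": pizza_names[pizza_id],
        "toppings": [toppings[t] for t in pizza_toppings[pizza_id]],
        "price": prices[pizza_id],
    }

# Denormalized: the whole menu item already lives in one document.
menu = {
    1: {
        "name": "Pepperoni Pizza",
        "toppings": ["cheese", "pepperoni", "tomato sauce"],
        "price": 12.99,
        "description": "Classic favorite!",
    }
}

def menu_item_denormalized(pizza_id):
    """Return the ready-made menu item with a single lookup."""
    return menu[pizza_id]

print(menu_item_normalized(1)["toppings"] == menu_item_denormalized(1)["toppings"])
```

Both functions produce the same toppings and price, but the normalized version touches four structures while the denormalized version touches one.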

Why Copy Data?

Normalized	Denormalized
Change once, updates everywhere	Need to update multiple places
Slower to read (data is assembled at query time)	Fast to read (data is already assembled)
Complex queries	Simple queries
Good for frequently-updated data	Good for frequently-read data

When to Denormalize

✅ DO denormalize when:

You read data WAY more than you write it
Speed matters most
Data doesn’t change often

❌ DON’T denormalize when:

Data changes constantly
Storage space is very limited
You need perfect consistency instantly
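
The price of copying shows up at write time. The sketch below assumes a hypothetical blog where each post carries its own copy of the author's name. The `rename_author` helper and the sample data are made up for illustration:

```python
# Each post embeds a copy of its author's name (denormalized, hypothetical data).
posts = [
    {"id": "post_1", "author": {"id": "user_456", "name": "Sam"}, "title": "First"},
    {"id": "post_2", "author": {"id": "user_456", "name": "Sam"}, "title": "Second"},
    {"id": "post_3", "author": {"id": "user_999", "name": "Emma"}, "title": "Other"},
]

def rename_author(posts, author_id, new_name):
    """Update every copy of the author's name; return how many documents changed."""
    writes = 0
    for post in posts:
        if post["author"]["id"] == author_id:
            post["author"]["name"] = new_name
            writes += 1
    return writes

writes = rename_author(posts, "user_456", "Samantha")
print(writes)  # 2
```

One rename turned into two writes here. With thousands of copies and data that changes constantly, that fan-out is why denormalizing stops paying off.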

📦 Embedding Data: Putting Things Inside Other Things

Embedding means storing related data INSIDE a document, like a nesting doll! 🪆

The Backpack Example 🎒

Your backpack doesn’t just exist by itself. It CONTAINS things:

Notebook
Pencils
Lunch box
Water bottle

In NoSQL, a “backpack” document would EMBED all these items:

{
  "backpack": {
    "color": "blue",
    "brand": "JanSport",
    "contents": [
      {"type": "notebook", "subject": "math"},
      {"type": "pencils", "count": 5},
      {"type": "lunchbox", "contents": "sandwich"},
      {"type": "water bottle", "size": "16oz"}
    ]
  }
}

One backpack = one document = one database call!
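
As a quick illustration, the Python sketch below parses that same document with the standard `json` module and walks its embedded contents. Treating the JSON text as the result of a single read is an assumption made for the example:

```python
import json

# The backpack document, as it might come back from a single read.
raw = """
{
  "backpack": {
    "color": "blue",
    "brand": "JanSport",
    "contents": [
      {"type": "notebook", "subject": "math"},
      {"type": "pencils", "count": 5},
      {"type": "lunchbox", "contents": "sandwich"},
      {"type": "water bottle", "size": "16oz"}
    ]
  }
}
"""

doc = json.loads(raw)

# The embedded items are already inside the document, so no second lookup is needed.
item_types = [item["type"] for item in doc["backpack"]["contents"]]
print(item_types)
```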

When to Embed

graph TD
    A["Should I embed?"] --> B{Is data always<br>needed together?}
    B -->|Yes!| C["✅ EMBED IT"]
    B -->|No| D{Is the data small?}
    D -->|Yes| C
    D -->|No| E[❌ Don't embed]

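Good candidates for embedding (as long as the embedded list stays small and bounded):

User + their shipping addresses
Blog post + its comments (when a post only gets a handful)
Order + its line items
Product + its reviews (when there are only a few)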

🔗 Referencing Data: Using Pointers Instead

Referencing means storing just a “link” to other data, like a bookmark! 📑

The Library Card Example 📚

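When you borrow a book, the library doesn’t hand you a COPY of its whole catalog. It gives you a library card with a number, and that number POINTS to the records the library keeps for you. The document below works the same way: it stores book IDs, not the books themselves.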

{
  "user": {
    "name": "Emma",
    "favoriteBooks": [
      "book_id_123",
      "book_id_456",
      "book_id_789"
    ]
  }
}

To see Emma’s books, you use those IDs to look them up!
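
That lookup step might look like the Python sketch below. The `books` dictionary, the placeholder titles, and the `favorite_books` helper are hypothetical. A real database would do this with a second query or a batch read:

```python
# Book documents stored on their own, keyed by ID (placeholder titles).
books = {
    "book_id_123": {"title": "Book A"},
    "book_id_456": {"title": "Book B"},
    "book_id_789": {"title": "Book C"},
}

# The user document stores only the IDs, as in the example above.
user = {
    "name": "Emma",
    "favoriteBooks": ["book_id_123", "book_id_456", "book_id_789"],
}

def favorite_books(user, books):
    """Second step: resolve each stored ID to its book document."""
    return [books[book_id] for book_id in user["favoriteBooks"]]

titles = [book["title"] for book in favorite_books(user, books)]
print(titles)
```

The extra step is the cost of referencing. The benefit is that each book exists once, so many users can point to it.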

When to Reference

Scenario	Choice
Data is large (like images)	Reference ✓
Data changes often	Reference ✓
Many documents share same data	Reference ✓
Data is always needed together	Embed instead

Example: Users and Posts

Instead of embedding thousands of posts inside a user:

{
  "user": {
    "id": "user_123",
    "name": "Alex",
    "postCount": 547
  }
}

{
  "post": {
    "id": "post_789",
    "authorId": "user_123",
    "title": "My great post!",
    "content": "..."
  }
}

The post REFERENCES the user with authorId!
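
To go the other way and find all posts by one user, you look posts up by `authorId`. The Python sketch below builds a small lookup table for that. It is a stand-in for a database index, and the extra sample posts and the `posts_for` helper are assumptions:

```python
from collections import defaultdict

# Post documents, each pointing back to its author by ID (sample data).
posts = [
    {"id": "post_789", "authorId": "user_123", "title": "My great post!"},
    {"id": "post_790", "authorId": "user_123", "title": "Second post"},
    {"id": "post_791", "authorId": "user_456", "title": "Someone else's post"},
]

# Group posts by authorId, playing the role of a secondary index.
posts_by_author = defaultdict(list)
for post in posts:
    posts_by_author[post["authorId"]].append(post)

def posts_for(user_id):
    """Return every post whose authorId matches, or an empty list."""
    return posts_by_author.get(user_id, [])

print(len(posts_for("user_123")))  # 2
```

The user document stays tiny no matter how many posts Alex writes, because the posts live in their own documents.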

⚖️ Embedding vs Referencing: The Ultimate Decision

This is the BIG choice in NoSQL data modeling!

The Decision Guide

Think of it like moving to a new house:

EMBEDDING = Bringing furniture WITH you in the moving truck 🚚

Everything arrives together
One trip!
But the truck gets heavy…

REFERENCING = Sending furniture to storage, bringing address 📝

Light and fast to move
But you need another trip to get furniture
Furniture can be shared with roommates!

Quick Decision Chart

Ask yourself these questions:

1. "Do I ALWAYS need this data together?"
   → Yes = Embed
   → Sometimes = Reference

2. "Is this data HUGE (like images, files)?"
   → Yes = Reference
   → No = Can embed

3. "Does this data change A LOT?"
   → Yes = Reference
   → Rarely = Can embed

4. "Is this data shared by MANY documents?"
   → Yes = Reference
   → No = Can embed

5. "Will embedded data grow FOREVER?"
   → Yes = Reference
   → No = Can embed
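
The five questions can be folded into one small function. This Python sketch is a rule of thumb, not a definitive algorithm: the name `choose_strategy` is hypothetical, and treating any single "Reference" answer as decisive is an assumption about how to break ties.

```python
def choose_strategy(*, always_together, huge=False, changes_often=False,
                    shared_by_many=False, grows_forever=False):
    """Return 'embed' only when every question in the chart allows it."""
    points_to_reference = huge or changes_often or shared_by_many or grows_forever
    if always_together and not points_to_reference:
        return "embed"
    return "reference"

# Order line items: always read with the order, small, and fixed once placed.
line_items = choose_strategy(always_together=True)

# Blog comments: read with the post, but the list can grow without limit.
comments = choose_strategy(always_together=True, grows_forever=True)

print(line_items, comments)  # embed reference
```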

Real-World Examples

Example 1: Blog System

// EMBED: Author info in each post (small, rarely changes)
{
  "post": {
    "title": "My First Post",
    "author": {
      "name": "Sam",
      "avatar": "sam.jpg"
    },
    "content": "..."
  }
}

// REFERENCE: Comments (can grow forever!)
{
  "comment": {
    "postId": "post_123",
    "text": "Great post!",
    "userId": "user_456"
  }
}

Example 2: E-Commerce

// EMBED: Order line items (always needed together)
{
  "order": {
    "orderId": "ord_789",
    "items": [
      {"product": "T-Shirt", "price": 20, "qty": 2},
      {"product": "Jeans", "price": 50, "qty": 1}
    ],
    "total": 90
  }
}

// REFERENCE: Product catalog (shared, changes often)
// Each product lives in its own document; other documents point to it by productId
{
  "product": {
    "productId": "prod_123",
    "name": "T-Shirt",
    "currentPrice": 20
  }
}
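
Because the line items are embedded in the order, the stored total can be checked from that one document. A small Python sketch, where the `compute_total` helper is hypothetical:

```python
# The order document from Example 2, with its embedded line items.
order = {
    "orderId": "ord_789",
    "items": [
        {"product": "T-Shirt", "price": 20, "qty": 2},
        {"product": "Jeans", "price": 50, "qty": 1},
    ],
    "total": 90,
}

def compute_total(order):
    """Sum price * quantity over the embedded line items."""
    return sum(item["price"] * item["qty"] for item in order["items"])

print(compute_total(order) == order["total"])  # True: 20*2 + 50*1 = 90
```

The prices inside the order act as a snapshot of what the customer paid, so a later change to the catalog price does not rewrite old orders.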

🎓 Summary: Your NoSQL Data Modeling Toolkit

graph TD
    A["🎯 Start with Questions<br>Query-First Design"] --> B["🛤️ Map Access Patterns<br>How will data be used?"]
    B --> C{For each pattern...}
    C --> D["📋 Denormalize<br>Copy data for speed"]
    C --> E{Embed or Reference?}
    E -->|Small, together,<br>doesn't grow| F["📦 EMBED"]
    E -->|Large, shared,<br>changes often| G["🔗 REFERENCE"]
    F --> H["🚀 Fast, Simple Queries!"]
    G --> H

The Golden Rules

Think questions first, data second 🤔
One question = One database trip 🎯
Copy data when reading matters most 📋
Embed when data belongs together 📦
Reference when data is big or shared 🔗

Remember the House Analogy 🏠

Query-First: Design rooms for how you live
Access Patterns: Where do you walk most?
Denormalization: Put things where you use them
Embedding: Keep related items in same drawer
Referencing: Use labels pointing to storage

You now have the power to model data like a pro! Remember: there’s no single “right” answer—it depends on YOUR questions and YOUR app. Trust the process, start with questions, and let your access patterns guide you! 🌟
