Resource Management


🎮 LangChain Resource Management: Keeping Your AI Wallet Happy

The Magical Piggy Bank Story 🐷

Imagine you have a super special piggy bank. Every time you ask your AI friend a question, a few coins drop out. The bigger your question, the more coins fall! Now, wouldn't it be smart to know:

  • How many coins you have left?
  • How many coins each question costs?
  • How to ask shorter questions to save coins?

That's exactly what Resource Management in LangChain does! It helps you count your "AI coins" (called tokens) so you don't run out of money.


🪟 Context Window Management

What is a Context Window?

Think of your AI as having a small whiteboard in its brain. This whiteboard can only hold a certain amount of writing. Once it's full, you need to erase something to write more!

┌───────────────────────────────────┐
│  🧠 AI's Whiteboard (4096 tokens) │
│                                   │
│  "Hi, my name is Sam..."          │
│  "What's the weather?"            │
│  "Tell me about dogs..."          │
│  [SPACE FOR MORE] ➜ LIMITED!      │
└───────────────────────────────────┘

Different AI Models = Different Whiteboard Sizes

Model        Context Window    Like Having…
GPT-3.5      4,096 tokens      Small notebook
GPT-4        8,192 tokens      Medium notebook
GPT-4-32k    32,768 tokens     Big textbook
GPT-4-128k   128,000 tokens    Entire library!

Smart Ways to Manage Your Context

1. Trim Old Messages 🗑️

# Keep only the last 10 messages
messages = messages[-10:]
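
Slicing the list works, but it counts messages, not tokens. Newer LangChain versions also ship a token-aware helper, trim_messages; here is a minimal sketch (assuming langchain-core 0.2+ and an OpenAI model object for counting):

from langchain_core.messages import AIMessage, HumanMessage, trim_messages
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4")

messages = [
    HumanMessage("Hi, my name is Sam..."),
    AIMessage("Nice to meet you, Sam!"),
    HumanMessage("Tell me about dogs..."),
]

# Keep only the newest messages that fit into 1,000 tokens
trimmed = trim_messages(
    messages,
    strategy="last",    # drop the oldest messages first
    token_counter=llm,  # count tokens with the model's tokenizer
    max_tokens=1000,
)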

2. Summarize Long Conversations 📝

# Instead of keeping ALL messages
# Create a summary of what was said
summary = "User asked about dogs. AI explained breeds."

3. Use Message Windows 🪟

from langchain.memory import ConversationBufferWindowMemory

# Only remember last 5 exchanges
memory = ConversationBufferWindowMemory(k=5)
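
In use, anything older than the last k exchanges simply falls off the whiteboard:

# Store exchanges as they happen
memory.save_context({"input": "Hi!"}, {"output": "Hello!"})
memory.save_context({"input": "What's a token?"}, {"output": "A piece of a word."})

# Only the most recent k exchanges come back
print(memory.load_memory_variables({})["history"])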

🔢 Token Counting

What's a Token?

Tokens are like word puzzle pieces. One word might be 1 piece, or it might be broken into 2-3 pieces! The exact splits depend on the model's tokenizer, but roughly:

"Hello"     → 1 token  ✓
"ChatGPT"   → 2-3 tokens (e.g., Chat + G + PT)
"🎉"        → 1-2 tokens
"Pneumonia" → 2-3 tokens (split into sub-word pieces)

Counting Tokens in LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4")

# Count tokens in a message
text = "Hello, how are you today?"
token_count = llm.get_num_tokens(text)

print(f"This costs {token_count} tokens!")
# Output: This costs 6 tokens!

Using tiktoken (OpenAI's Counter)

import tiktoken

# Get the encoder for your model
encoder = tiktoken.encoding_for_model("gpt-4")

# Count tokens
text = "LangChain is awesome!"
tokens = encoder.encode(text)

print(f"Token count: {len(tokens)}")
# Output: Token count: 4
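
Chat requests also carry a few hidden formatting tokens per message. A rough conversation-level counter, adapted from OpenAI's cookbook guidance (the ~3-token overhead per message is an approximation and varies by model):

import tiktoken

def count_chat_tokens(messages, model="gpt-4"):
    """Approximate the token count for a list of chat messages."""
    encoder = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 3  # rough per-message formatting overhead
        num_tokens += len(encoder.encode(message["content"]))
    num_tokens += 3  # the assistant's reply is primed with a few tokens
    return num_tokens

conversation = [
    {"role": "user", "content": "Hello, how are you today?"},
    {"role": "assistant", "content": "I'm doing well, thanks!"},
]
print(f"Estimated tokens: {count_chat_tokens(conversation)}")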

Token Counting Tips 💡

graph TD
    A["Your Message"] --> B{Count Tokens}
    B --> C["Short? ✅ Send it!"]
    B --> D["Too Long? ✂️ Trim it!"]
    D --> E["Remove Extra Words"]
    D --> F["Split into Parts"]
    D --> G["Summarize First"]

💰 get_openai_callback: Your Token Meter

The Magic Counter

get_openai_callback is like having a smart meter that watches every coin leaving your piggy bank!

from langchain_openai import ChatOpenAI
from langchain_community.callbacks import get_openai_callback

llm = ChatOpenAI(model="gpt-4")

# Start the meter! 📊
with get_openai_callback() as cb:
    # Ask your question
    response = llm.invoke("What is Python?")

    # Check the meter!
    print(f"Tokens used: {cb.total_tokens}")
    print(f"Cost: ${cb.total_cost:.4f}")

What the Callback Tells You

with get_openai_callback() as cb:
    response = llm.invoke("Explain AI simply")

    # 📥 Tokens you sent (your question)
    print(f"Prompt tokens: {cb.prompt_tokens}")

    # 📤 Tokens AI sent back (the answer)
    print(f"Completion tokens: {cb.completion_tokens}")

    # 📊 Total tokens used
    print(f"Total tokens: {cb.total_tokens}")

    # 💵 Money spent
    print(f"Total cost: ${cb.total_cost:.6f}")

    # 🔄 How many API calls were made
    print(f"API calls: {cb.successful_requests}")

Example Output (your numbers will vary)

Prompt tokens: 12
Completion tokens: 85
Total tokens: 97
Total cost: $0.002910
API calls: 1

Tracking Multiple Calls

with get_openai_callback() as cb:
    # First question
    llm.invoke("What is 2+2?")

    # Second question
    llm.invoke("What is the capital of France?")

    # Third question
    llm.invoke("Why is the sky blue?")

    # Total for ALL three questions!
    print(f"Total for 3 questions: {cb.total_tokens} tokens")
    print(f"Total cost: ${cb.total_cost:.4f}")

💸 Cost Management

Understanding AI Pricing

Different models cost different amounts per token:

Model           Input Cost    Output Cost
GPT-3.5-turbo   $0.0005/1K    $0.0015/1K
GPT-4           $0.03/1K      $0.06/1K
GPT-4-turbo     $0.01/1K      $0.03/1K

🧮 1K = 1,000 tokens
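
At those rates, cost is just (tokens ÷ 1,000) × price. A tiny helper built from the table above (prices change over time, so treat the numbers as a snapshot):

# Price per 1K tokens: (input, output) - a snapshot, check current pricing
PRICES = {
    "gpt-3.5-turbo": (0.0005, 0.0015),
    "gpt-4": (0.03, 0.06),
    "gpt-4-turbo": (0.01, 0.03),
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    input_price, output_price = PRICES[model]
    return (prompt_tokens / 1000) * input_price \
         + (completion_tokens / 1000) * output_price

# 12 prompt tokens + 85 completion tokens on GPT-4:
print(f"${estimate_cost('gpt-4', 12, 85):.4f}")  # $0.0055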

Building a Budget Guardian

class BudgetGuardian:
    def __init__(self, max_budget=1.00):
        self.max_budget = max_budget
        self.spent = 0.0

    def can_afford(self, estimated_cost):
        return (self.spent + estimated_cost) <= self.max_budget

    def record_spending(self, cost):
        self.spent += cost
        remaining = self.max_budget - self.spent
        print(f"๐Ÿ’ฐ Spent: ${cost:.4f}")
        print(f"๐Ÿ“Š Remaining: ${remaining:.4f}")

# Usage
guardian = BudgetGuardian(max_budget=0.50)

with get_openai_callback() as cb:
    response = llm.invoke("Quick question")
    guardian.record_spending(cb.total_cost)
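
The can_afford check earns its keep before you spend: estimate the cost first (here using get_num_tokens plus the hypothetical estimate_cost helper sketched above) and skip the call if it would bust the budget:

question = "Quick question"

# Rough estimate: count the prompt tokens, guess ~100 completion tokens
prompt_tokens = llm.get_num_tokens(question)
estimated = estimate_cost("gpt-3.5-turbo", prompt_tokens, 100)

if guardian.can_afford(estimated):
    with get_openai_callback() as cb:
        response = llm.invoke(question)
    guardian.record_spending(cb.total_cost)
else:
    print("🚫 Too expensive - skipping this call!")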

Cost-Saving Strategies

graph LR
    A["Save Money! 💰"] --> B["Use Cheaper Models"]
    A --> C["Write Shorter Prompts"]
    A --> D["Cache Common Answers"]
    A --> E["Set Token Limits"]
    B --> B1["GPT-3.5 for simple tasks"]
    C --> C1["Remove filler words"]
    D --> D1["Don't ask the same thing twice"]
    E --> E1["max_tokens parameter"]

Limiting Response Length

from langchain_openai import ChatOpenAI

# Limit AI's response to 100 tokens max
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    max_tokens=100  # Short answers = less cost!
)

Caching to Save Money

from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Turn on caching
set_llm_cache(InMemoryCache())

# First call - costs money
response1 = llm.invoke("What is Python?")

# Second SAME call - FREE from cache! 🎉
response2 = llm.invoke("What is Python?")
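
You can watch the savings with the token meter from earlier: on a cache hit, the request never reaches the API, so the callback should record nothing (a quick sanity check, assuming the exact same prompt and model settings):

from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    llm.invoke("What is Python?")  # served from the cache
    print(f"Tokens charged: {cb.total_tokens}")  # expect 0
    print(f"Cost: ${cb.total_cost:.4f}")         # expect $0.0000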

Complete Cost Monitoring Example

from langchain_openai import ChatOpenAI
from langchain_community.callbacks import get_openai_callback

def smart_query(question, budget_limit=0.10):
    llm = ChatOpenAI(model="gpt-3.5-turbo")

    with get_openai_callback() as cb:
        response = llm.invoke(question)

        # Check if over budget
        if cb.total_cost > budget_limit:
            print("โš ๏ธ Warning: Over budget!")

        # Return response with cost info
        return {
            "answer": response.content,
            "cost": cb.total_cost,
            "tokens": cb.total_tokens
        }

# Use it!
result = smart_query("Explain gravity simply")
print(f"Answer: {result['answer'][:50]}...")
print(f"Cost: ${result['cost']:.4f}")

🎯 Quick Summary

What                 Why                         How
Context Window       AI has limited memory       Trim, summarize, use windows
Token Counting       Know what you're spending   get_num_tokens(), tiktoken
get_openai_callback  Track every penny           with get_openai_callback() as cb:
Cost Management      Stay within budget          Limits, caching, cheaper models

🌟 Golden Rules

  1. Always wrap calls in get_openai_callback() to track costs
  2. Use GPT-3.5 for simple tasks, GPT-4 only when needed
  3. Set max_tokens to prevent runaway costs
  4. Cache repeated questions - never pay twice!
  5. Monitor your context window - trim old messages

Remember: Every token counts! Be a smart AI spender! 🐷💰
