# 🎮 LangChain Resource Management: Keeping Your AI Wallet Happy
## The Magical Piggy Bank Story 🐷
Imagine you have a super special piggy bank. Every time you ask your AI friend a question, a few coins drop out. The bigger your question, the more coins fall! Now, wouldn't it be smart to know:

- How many coins you have left?
- How many coins each question costs?
- How to ask shorter questions to save coins?

That's exactly what Resource Management in LangChain does! It helps you count your "AI coins" (called tokens) so you don't run out of money.
## 🪟 Context Window Management
### What is a Context Window?
Think of your AI as having a small whiteboard in its brain. This whiteboard can only hold a certain amount of writing. Once it's full, you need to erase something to write more!
```
┌──────────────────────────────────┐
│ 🧠 AI's Whiteboard (4096 tokens) │
│                                  │
│ "Hi, my name is Sam..."          │
│ "What's the weather?"            │
│ "Tell me about dogs..."          │
│ [SPACE FOR MORE] ← LIMITED!      │
└──────────────────────────────────┘
```
### Different AI Models = Different Whiteboard Sizes
| Model | Context Window | Like Having… |
|---|---|---|
| GPT-3.5-turbo | 4,096 tokens | Small notebook |
| GPT-4 | 8,192 tokens | Medium notebook |
| GPT-4-32k | 32,768 tokens | Big textbook |
| GPT-4 Turbo | 128,000 tokens | Entire library! |
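Want to check whether a prompt will fit on the whiteboard before sending it? Here's a minimal sketch that combines these sizes with a token count. The `CONTEXT_WINDOWS` dict and `fits_in_window` helper are made up for illustration, with sizes hard-coded from the table above:

```python
import tiktoken

# Hypothetical lookup table, hard-coded from the sizes above
CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 4_096,
    "gpt-4": 8_192,
    "gpt-4-32k": 32_768,
    "gpt-4-turbo": 128_000,
}

def fits_in_window(text: str, model: str = "gpt-4") -> bool:
    """Return True if `text` fits in the model's context window."""
    encoder = tiktoken.encoding_for_model(model)
    return len(encoder.encode(text)) <= CONTEXT_WINDOWS[model]

print(fits_in_window("Tell me about dogs...", "gpt-4"))  # True
```

Remember the window must also hold the AI's reply, so in practice you want your prompt to use only part of it.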
### Smart Ways to Manage Your Context
#### 1. Trim Old Messages 🗑️

```python
# Keep only the last 10 messages
messages = messages[-10:]
```
#### 2. Summarize Long Conversations 📝

```python
# Instead of keeping ALL messages,
# create a summary of what was said
summary = "User asked about dogs. AI explained breeds."
```
#### 3. Use Message Windows 🪟

```python
from langchain.memory import ConversationBufferWindowMemory

# Only remember the last 5 exchanges
memory = ConversationBufferWindowMemory(k=5)
```
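You can also combine trimming with token counting: drop the oldest messages until the conversation fits a budget. A minimal sketch, assuming messages are plain strings and using a hypothetical `MAX_TOKENS` budget:

```python
import tiktoken

encoder = tiktoken.encoding_for_model("gpt-4")
MAX_TOKENS = 3_000  # hypothetical budget, leaving room for the reply

def trim_to_budget(messages: list[str]) -> list[str]:
    """Drop the oldest messages until the total fits the budget."""
    while messages and sum(len(encoder.encode(m)) for m in messages) > MAX_TOKENS:
        messages = messages[1:]  # erase the oldest writing first
    return messages
```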
## 🔢 Token Counting
### What's a Token?
Tokens are like word puzzle pieces. One word might be 1 piece, or it might be broken into 2-3 pieces!
"Hello" โ 1 token โ
"ChatGPT" โ 2 tokens (Chat + GPT)
"๐" โ 1 token
"Pneumonia" โ 3 tokens (Pneu + mon + ia)
### Counting Tokens in LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4")

# Count tokens in a message
text = "Hello, how are you today?"
token_count = llm.get_num_tokens(text)
print(f"This costs {token_count} tokens!")
# Example output: This costs 7 tokens!
```
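Chat models also have a message-level counter, `get_num_tokens_from_messages`, which includes the extra tokens that chat formatting adds around your text. A quick sketch (the exact overhead per message is model-specific):

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4")
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hello, how are you today?"),
]

# Counts the message contents plus the chat-format overhead tokens
print(llm.get_num_tokens_from_messages(messages))
```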
### Using tiktoken (OpenAI's Counter)
```python
import tiktoken

# Get the encoder for your model
encoder = tiktoken.encoding_for_model("gpt-4")

# Count tokens
text = "LangChain is awesome!"
tokens = encoder.encode(text)
print(f"Token count: {len(tokens)}")
# Example output: Token count: 5
```
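To see the actual puzzle pieces, you can decode each token ID back into text on its own. A small sketch using the same encoder:

```python
import tiktoken

encoder = tiktoken.encoding_for_model("gpt-4")

# Decode each token ID individually to see how a word gets split
for token_id in encoder.encode("Pneumonia"):
    print(repr(encoder.decode([token_id])))
```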
### Token Counting Tips 💡
```mermaid
graph TD
    A["Your Message"] --> B{Count Tokens}
    B --> C["Short? ✅ Send it!"]
    B --> D["Too Long? ✂️ Trim it!"]
    D --> E["Remove Extra Words"]
    D --> F["Split into Parts"]
    D --> G["Summarize First"]
```
## 💰 get_openai_callback: Your Token Meter
### The Magic Counter
`get_openai_callback` is like having a smart meter that watches every coin leaving your piggy bank!
```python
from langchain_openai import ChatOpenAI
from langchain_community.callbacks import get_openai_callback

llm = ChatOpenAI(model="gpt-4")

# Start the meter! 📊
with get_openai_callback() as cb:
    # Ask your question
    response = llm.invoke("What is Python?")

# Check the meter!
print(f"Tokens used: {cb.total_tokens}")
print(f"Cost: ${cb.total_cost:.4f}")
```
### What the Callback Tells You
```python
with get_openai_callback() as cb:
    response = llm.invoke("Explain AI simply")

# 📥 Tokens you sent (your question)
print(f"Prompt tokens: {cb.prompt_tokens}")

# 📤 Tokens the AI sent back (the answer)
print(f"Completion tokens: {cb.completion_tokens}")

# 📊 Total tokens used
print(f"Total tokens: {cb.total_tokens}")

# 💵 Money spent
print(f"Total cost: ${cb.total_cost:.6f}")

# 📞 How many API calls were made
print(f"API calls: {cb.successful_requests}")
```
### Real Output Example
```
Prompt tokens: 12
Completion tokens: 85
Total tokens: 97
Total cost: $0.005460
API calls: 1
```

At GPT-4 rates, that cost is (12 × $0.03 + 85 × $0.06) / 1,000 = $0.00546, because input and output tokens are priced differently (see the pricing table below).
### Tracking Multiple Calls
```python
with get_openai_callback() as cb:
    # First question
    llm.invoke("What is 2+2?")
    # Second question
    llm.invoke("What is the capital of France?")
    # Third question
    llm.invoke("Why is the sky blue?")

# Total for ALL three questions!
print(f"Total for 3 questions: {cb.total_tokens} tokens")
print(f"Total cost: ${cb.total_cost:.4f}")
```
## 💸 Cost Management
### Understanding AI Pricing
Different models cost different amounts per token:
| Model | Input Cost | Output Cost |
|---|---|---|
| GPT-3.5-turbo | $0.0005/1K | $0.0015/1K |
| GPT-4 | $0.03/1K | $0.06/1K |
| GPT-4-turbo | $0.01/1K | $0.03/1K |
🧮 1K = 1,000 tokens
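You can turn this table into a quick pre-flight estimate. A minimal sketch with prices hard-coded from the table above (the `PRICES` dict and `estimate_cost` helper are made up for illustration; always check OpenAI's current pricing page, since rates change):

```python
# Hypothetical price table: (input, output) dollars per 1K tokens
PRICES = {
    "gpt-3.5-turbo": (0.0005, 0.0015),
    "gpt-4": (0.03, 0.06),
    "gpt-4-turbo": (0.01, 0.03),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the dollar cost of one call from its token counts."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000

# The earlier example: 12 prompt + 85 completion tokens on GPT-4
print(f"${estimate_cost('gpt-4', 12, 85):.6f}")  # $0.005460
```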
### Building a Budget Guardian
```python
class BudgetGuardian:
    def __init__(self, max_budget=1.00):
        self.max_budget = max_budget
        self.spent = 0.0

    def can_afford(self, estimated_cost):
        return (self.spent + estimated_cost) <= self.max_budget

    def record_spending(self, cost):
        self.spent += cost
        remaining = self.max_budget - self.spent
        print(f"💰 Spent: ${cost:.4f}")
        print(f"🏦 Remaining: ${remaining:.4f}")

# Usage
guardian = BudgetGuardian(max_budget=0.50)

if guardian.can_afford(0.01):  # rough estimate for one short call
    with get_openai_callback() as cb:
        response = llm.invoke("Quick question")
    guardian.record_spending(cb.total_cost)
```
### Cost-Saving Strategies
```mermaid
graph LR
    A["Save Money! 💰"] --> B["Use Cheaper Models"]
    A --> C["Write Shorter Prompts"]
    A --> D["Cache Common Answers"]
    A --> E["Set Token Limits"]
    B --> B1["GPT-3.5 for simple tasks"]
    C --> C1["Remove filler words"]
    D --> D1["Don't ask the same thing twice"]
    E --> E1["max_tokens parameter"]
```
### Limiting Response Length
```python
from langchain_openai import ChatOpenAI

# Limit the AI's response to 100 tokens max
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    max_tokens=100,  # Short answers = less cost!
)
```
### Caching to Save Money
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Turn on caching
set_llm_cache(InMemoryCache())

# First call - costs money
response1 = llm.invoke("What is Python?")

# Second IDENTICAL call - FREE from the cache! 🎉
response2 = llm.invoke("What is Python?")
```
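You can check that the cache is working with the token meter from earlier: a repeated, identical call should report zero tokens, since no API request is made. A quick sketch, assuming caching is enabled as above and the question was already asked once:

```python
from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    llm.invoke("What is Python?")  # should be served from the cache

print(f"Tokens for cached call: {cb.total_tokens}")  # expected: 0
```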
### Complete Cost Monitoring Example
```python
from langchain_openai import ChatOpenAI
from langchain_community.callbacks import get_openai_callback

def smart_query(question, budget_limit=0.10):
    llm = ChatOpenAI(model="gpt-3.5-turbo")

    with get_openai_callback() as cb:
        response = llm.invoke(question)

    # Check if over budget
    if cb.total_cost > budget_limit:
        print("⚠️ Warning: Over budget!")

    # Return the response with cost info
    return {
        "answer": response.content,
        "cost": cb.total_cost,
        "tokens": cb.total_tokens,
    }

# Use it!
result = smart_query("Explain gravity simply")
print(f"Answer: {result['answer'][:50]}...")
print(f"Cost: ${result['cost']:.4f}")
```
## 🎯 Quick Summary
| What | Why | How |
|---|---|---|
| Context Window | AI has limited memory | Trim, summarize, use windows |
| Token Counting | Know what you're spending | `get_num_tokens()`, tiktoken |
| get_openai_callback | Track every penny | `with get_openai_callback() as cb:` |
| Cost Management | Stay within budget | Limits, caching, cheaper models |
## 🏆 Golden Rules

- Always wrap calls in `get_openai_callback()` to track costs
- Use GPT-3.5 for simple tasks, GPT-4 only when needed
- Set `max_tokens` to prevent runaway costs
- Cache repeated questions - never pay twice!
- Monitor your context window - trim old messages

Remember: Every token counts! Be a smart AI spender! 🐷💰
