Short-Term Conversation Memory: Building Context-Aware AI Agents

Michael Brenndoerfer · July 1, 2025 · 15 min read

Learn how to give AI agents the ability to remember recent conversations, handle follow-up questions, and manage conversation history across multiple interactions.

Short-Term Conversation Memory

You've built an agent that can reason, use tools, and solve complex problems. But there's a fundamental limitation: it forgets everything after each interaction. Ask it a question, get an answer, then ask a follow-up, and it has no idea what you're talking about.

Imagine calling a help desk where the representative forgets your conversation every 30 seconds. You'd have to re-explain your problem constantly. Frustrating, right? That's exactly what happens with an agent that lacks memory.

In this chapter, we'll give your personal assistant the ability to remember recent conversations. You'll learn how to maintain context across multiple interactions, handle follow-up questions, and manage conversation history as it grows. By the end, your agent will feel less like a stateless question-answering machine and more like an assistant that actually remembers what you've been discussing.

The Problem: Stateless Interactions

Let's see what happens without memory. Here's a conversation with our agent:

User: What's the capital of France?
Agent: The capital of France is Paris.

User: What's the population?
Agent: I need more context. What location are you asking about?

The agent answered the first question perfectly. But when you asked a follow-up ("What's the population?"), it had no memory of discussing France. Each interaction is isolated, like talking to someone with amnesia.

This breaks down quickly in real conversations. People naturally ask follow-up questions, refer to previous topics, and build on earlier context. Without memory, your agent can't handle these basic conversational patterns.

Example (Claude Sonnet 4.5)

Let's see this in code:

In[3]:
Code
import os
from anthropic import Anthropic

## Using Claude Sonnet 4.5 for its excellent conversational capabilities
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def ask_without_memory(question: str) -> str:
    """Ask a question without any conversation history."""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": question}
        ]
    )
    return response.content[0].text

## First question
print("Q1:", ask_without_memory("What's the capital of France?"))
## Output: "The capital of France is Paris."

## Follow-up question
print("Q2:", ask_without_memory("What's the population?"))
## Output: "I'd be happy to help with population information, 
## but I need to know which location you're asking about."
Out[3]:
Console
Q1: The capital of France is Paris.
Q2: I'd be happy to help you with population information! Could you please specify which location you're asking about? For example:

- A specific city?
- A country?
- The world?
- Some other area?

Each call to ask_without_memory creates a fresh conversation. The agent has no context from previous questions. This is the default behavior when you don't explicitly manage conversation history.

The Solution: Conversation History

The fix is straightforward: keep track of the conversation and send it with each new message. Instead of just sending the latest question, you send the entire dialogue history.

Here's what that looks like:

In[4]:
Code
def ask_with_memory(conversation_history: list, question: str) -> tuple:
    """
    Ask a question with full conversation context.
    
    Args:
        conversation_history: List of previous messages
        question: The new question to ask
    
    Returns:
        Tuple of (answer, updated_history)
    """
    # Add the new question to history
    conversation_history.append({
        "role": "user",
        "content": question
    })
    
    # Send entire conversation to the model
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=conversation_history
    )
    
    # Extract the answer
    answer = response.content[0].text
    
    # Add the answer to history
    conversation_history.append({
        "role": "assistant",
        "content": answer
    })
    
    return answer, conversation_history

Now let's try the same conversation with memory:

In[5]:
Code
## Start with empty history
history = []

## First question
answer1, history = ask_with_memory(history, "What's the capital of France?")
print("Q1:", answer1)
## Output: "The capital of France is Paris."

## Follow-up question - now with context!
answer2, history = ask_with_memory(history, "What's the population?")
print("Q2:", answer2)
## Output: "Paris has a population of approximately 2.1 million people 
## within the city proper, and about 12 million in the greater 
## metropolitan area."
Out[5]:
Console
Q1: The capital of France is Paris.
Q2: The population of Paris is approximately 2.2 million people within the city limits (as of recent estimates).

However, the Paris metropolitan area (Île-de-France region) has a much larger population of around 12-13 million people, making it one of the largest metropolitan areas in Europe.

The agent understood "the population" refers to Paris because it remembered the previous exchange. This is the foundation of conversational AI: maintaining context across turns.

How Conversation History Works

Let's look at what's actually happening behind the scenes. After the two-question exchange above, our history list contains:

In[6]:
Code
[
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What's the population?"},
    {"role": "assistant", "content": "Paris has a population of approximately..."}
]
Out[6]:
Console
[{'role': 'user', 'content': "What's the capital of France?"},
 {'role': 'assistant', 'content': 'The capital of France is Paris.'},
 {'role': 'user', 'content': "What's the population?"},
 {'role': 'assistant',
  'content': 'Paris has a population of approximately...'}]

Each message is stored with its role (either "user" or "assistant") and content. When you ask a new question, the entire list goes to the model. The model sees the full conversation thread and can reference earlier messages.

Think of it like showing someone a chat transcript. They can read the whole conversation and understand the context of the latest message. That's exactly what we're doing with the language model.

Building a Conversational Agent

Let's create a more complete conversational agent that manages memory automatically:

In[7]:
Code
class ConversationalAgent:
    """
    A personal assistant that remembers the conversation.
    """
    
    def __init__(self):
        self.client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.conversation_history = []
        
    def chat(self, user_message: str) -> str:
        """
        Send a message and get a response, maintaining conversation history.
        
        Args:
            user_message: What the user wants to say
        
        Returns:
            The agent's response
        """
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        # Get response with full context
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=self.conversation_history
        )
        
        # Extract and store the response
        assistant_message = response.content[0].text
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message
    
    def clear_history(self):
        """Start a fresh conversation."""
        self.conversation_history = []
    
    def get_history(self) -> list:
        """Get the current conversation history."""
        return self.conversation_history.copy()

Now we can have natural, multi-turn conversations:

In[8]:
Code
## Create the agent
agent = ConversationalAgent()

## Have a conversation
print(agent.chat("I'm planning a trip to Japan."))
## "That sounds exciting! Japan is a wonderful destination..."

print(agent.chat("What's the best time of year to visit?"))
## "For Japan, the best times to visit are typically spring (March-May) 
## for cherry blossoms, or autumn (September-November) for fall foliage..."

print(agent.chat("How long should I plan to stay?"))
## "For a first trip to Japan, I'd recommend at least 10-14 days..."

print(agent.chat("What cities should I include?"))
## "Based on your trip to Japan, I'd suggest including Tokyo, Kyoto, 
## and possibly Osaka or Hiroshima..."
Out[8]:
Console
# How exciting! I'd be happy to help you plan your trip to Japan.

To give you the best suggestions, it would be helpful to know more about your plans:

- **When are you planning to go?** (Season can greatly affect your experience - cherry blossoms in spring, fall foliage, winter snow, summer festivals)
- **How long** will you be visiting?
- **What are your interests?** (temples/culture, food, nature, cities, anime/pop culture, hiking, etc.)
- **First time to Japan** or returning?
- **Which cities/regions** are you considering?

In the meantime, here are some general tips:

**Popular first-timer destinations:**
- Tokyo (modern city life)
- Kyoto (traditional culture, temples)
- Osaka (food scene)
- Hiroshima & Miyajima
- Hakone or nearby Mt. Fuji areas

**Practical considerations:**
- JR Pass can save money if traveling between cities
- Pocket WiFi or SIM card is very useful
- IC cards (Suica/Pasmo) make transport easy
- Learn a few basic Japanese phrases

What aspects of your trip would you like help with?
# Best Times to Visit Japan

The "best" time really depends on what you're looking for, but here's a breakdown:

## 🌸 **Spring (March-May) - Most Popular**
**Pros:**
- Cherry blossoms (late March-early April) - truly spectacular
- Comfortable temperatures (15-20°C / 59-68°F)
- Clear skies, low rainfall

**Cons:**
- Very crowded and expensive during sakura season
- Book accommodations months in advance
- Popular spots can feel overwhelming

## 🍁 **Autumn (September-November) - Also Peak Season**
**Pros:**
- Fall foliage (mid-November) is stunning
- Pleasant weather (15-22°C / 59-72°F)
- Comfortable for walking/sightseeing
- Many festivals

**Cons:**
- Crowded, especially in Kyoto during koyo (leaf viewing)
- Higher prices
- Early September can still be hot/humid

## ❄️ **Winter (December-February) - Underrated**
**Pros:**
- Fewer tourists, lower prices
- Excellent skiing in Hokkaido/Nagano
- Beautiful snow scenery
- Clear skies (especially in Tokyo)

**Cons:**
- Cold (0-10°C / 32-50°F)
- Some attractions close
- Shorter daylight hours

## ☀️ **Summer (June-August) - Least Recommended**
**Pros:**
- Vibrant festivals (fireworks, bon odori)
- Summer hiking season in mountains
- Lavender fields in Hokkaido

**Cons:**
- Hot and very humid (25-35°C / 77-95°F)
- Rainy season in June-July
- Typhoon season (Aug-Sept)
- Can be uncomfortable for sightseeing

## My Recommendation:
**Late March-April or November** for the best balance of weather and scenery, but be prepared for crowds. **Winter (Jan-Feb)** if you want a more budget-friendly, authentic experience.

What type of experience appeals to you most?
# How Long to Stay in Japan

Here's my breakdown based on different trip lengths:

## ⏱️ **Minimum: 7-10 Days**
Best for first-timers with limited time
- **Tokyo**: 3-4 days
- **Kyoto**: 2-3 days  
- **Osaka**: 1-2 days
- Day trip to Nara or Hakone

This gives you a taste of modern Japan (Tokyo), traditional culture (Kyoto), and great food (Osaka).

## 👍 **Ideal: 2 Weeks (14 days)**
The sweet spot for most visitors
- **Tokyo**: 4-5 days (including day trips to Nikko, Kamakura, or Mt. Fuji area)
- **Kyoto**: 3-4 days (including day trip to Nara)
- **Osaka**: 2 days
- **Hiroshima**: 1-2 days (including Miyajima island)
- **Plus** one wild card: Takayama, Kanazawa, Hakone, or Nagano

This allows you to experience diverse regions without feeling rushed.

## 🌟 **Extended: 3-4 Weeks**
For thorough exploration
- All of the above PLUS:
- **Hokkaido** (Sapporo, Hakodate)
- **Japanese Alps** region
- **Okinawa** (tropical islands)
- More rural/off-beaten-path areas
- Time to slow down and absorb local life

## 💡 **My Recommendation:**

**Aim for at least 10-14 days** if possible. Japan has:
- Excellent public transportation (easy to cover ground)
- Enough variety that you won't get bored
- A learning curve (language, customs) that takes a few days to adjust to

**Quality over quantity:** It's better to spend 3 days truly experiencing Kyoto than rushing through in 1 day to check it off a list.

How much time do you have available for your trip?
# Essential Cities & Regions for Your Japan Itinerary

Here's my guide organized by priority:

## 🎯 **The Classic "Golden Route" (Must-See for First-Timers)**

### **Tokyo** (4-5 days)
- Modern metropolis, endless neighborhoods to explore
- Shibuya, Shinjuku, Harajuku (urban energy)
- Asakusa (traditional temples)
- Akihabara (anime/electronics)
- Tsukiji Outer Market (food)
- **Day trips:** Mt. Fuji/Hakone, Nikko, Kamakura

### **Kyoto** (3-4 days)
- Japan's cultural heart, 2,000+ temples
- Fushimi Inari (iconic red torii gates)
- Arashiyama bamboo grove
- Kinkaku-ji (Golden Pavilion)
- Gion district (geisha spotting)
- **Easy day trip:** Nara (deer park, giant Buddha - do this!)

### **Osaka** (1-2 days)
- Food capital of Japan
- Street food in Dotonbori
- Osaka Castle
- More casual, fun vibe than Tokyo
- Great nightlife

## 🔥 **Highly Recommended Add-Ons**

### **Hiroshima** (1-2 days)
- Peace Memorial & Museum (moving experience)
- **Miyajima Island** - one of Japan's most scenic spots (floating torii gate)

### **Hakone** (1-2 days)
- Mt. Fuji views
- Hot springs (onsen)
- Art museums
- Scenic nature
- Easy from Tokyo

## 🌟 **If You Have Extra Time (Pick 1-2)**

### **Takayama & Shirakawa-go**
- Traditional mountain villages
- Preserved Edo-period streets
- UNESCO World Heritage gassho-zukuri houses

### **Kanazawa**
- Beautiful Kenrokuen Garden
- Samurai & geisha districts
- Excellent seafood
- Less touristy alternative to Kyoto

### **Nara** 
- Can be a day trip from Kyoto/Osaka
- Friendly deer roaming freely
- Todai-ji Temple (massive Buddha)

### **Nagano**
- Snow monkeys in hot springs
- Mountain temples
- Winter sports

### **Hokkaido** (Sapporo, Hakodate)
- Northern island - very different feel
- Best in winter (skiing, snow festival) or summer (lavender)
- Needs 3-5 days minimum

### **Okinawa**
- Tropical islands, beaches
- Distinct Ryukyu culture
- Requires flight, feels like different country

## 📋 **Sample Itineraries**

### **10 Days:**
- Tokyo (4 days) → Hakone (1 day) → Kyoto (3 days, with Nara day trip) → Osaka (2 days)

### **14 Days:**
- Tokyo (4 days) → Hakone (1 day) → Kyoto (3 days) → Nara (1 day) → Osaka (2 days) → Hiroshima/Miyajima (2 days) → back to Tokyo or add Takayama (1 day)

### **3 Weeks:**
- All of the above + Kanazawa (2 days) + Nagano (2 days) + Hokkaido (4-5 days) OR Okinawa (3-4 days)

## 💡 **My Advice:**

**Don't try to see everything!** Japan rewards depth over breadth. The classic Tokyo-Kyoto-Osaka triangle is perfect for a first visit. You can always return (and you'll want to!).

**Geographic logic matters:** Plan your route to avoid backtracking. Generally flow: Tokyo → Central Japan → Kyoto/Osaka → Western Japan (Hiroshima).

What's your trip length looking like? That will help me suggest the best combination!

Notice how each response builds on the previous context. The agent remembers we're discussing a Japan trip and tailors its answers accordingly.

Handling Follow-Up Questions

One of the most powerful aspects of conversation memory is handling follow-ups. People rarely ask perfectly self-contained questions. They say "What about that?", "Can you explain more?", or "Why is that?"

Let's see this in action:

In[9]:
Code
agent = ConversationalAgent()

## Initial question
print(agent.chat("What are the main differences between Python and JavaScript?"))
## Detailed comparison of Python and JavaScript...

## Vague follow-up
print(agent.chat("Which one is easier to learn?"))
## "For beginners, Python is generally considered easier to learn..."

## Another vague follow-up
print(agent.chat("Why is that?"))
## "Python is easier to learn because of its clean, readable syntax..."

## Reference to earlier point
print(agent.chat("You mentioned async programming. Can you explain that more?"))
## "In JavaScript, async programming is central to how the language works..."
Out[9]:
Console
# Main Differences Between Python and JavaScript

## **1. Primary Use Cases**
- **Python**: General-purpose programming, data science, AI/ML, automation, backend development
- **JavaScript**: Web development (frontend), also backend (Node.js), interactive web pages

## **2. Execution Environment**
- **Python**: Runs on servers, desktops, requires Python interpreter
- **JavaScript**: Runs natively in web browsers, also on servers via Node.js

## **3. Syntax Style**
- **Python**: Uses indentation for code blocks, more readable
```python
if x > 5:
    print("Greater than 5")
```
- **JavaScript**: Uses curly braces and semicolons
```javascript
if (x > 5) {
    console.log("Greater than 5");
}
```

## **4. Typing**
- **Python**: Dynamically typed, supports optional type hints
- **JavaScript**: Dynamically typed, can use TypeScript for static typing

## **5. Data Structures**
- **Python**: Lists, tuples, dictionaries, sets
- **JavaScript**: Arrays, objects (no native tuples/sets until ES6)

## **6. Object-Oriented Programming**
- **Python**: Class-based with clear inheritance
- **JavaScript**: Prototype-based (though ES6+ added class syntax)

## **7. Concurrency**
- **Python**: Threading, multiprocessing, asyncio
- **JavaScript**: Event-driven, single-threaded with async/await and callbacks

Both are powerful, versatile languages—choose based on your project needs!
# Which Is Easier to Learn?

## **Python is Generally Considered Easier** 🐍

### **Why Python Wins for Beginners:**

**1. Simpler, Cleaner Syntax**
- Reads almost like English
- No curly braces or semicolons to remember
- Indentation forces clean, readable code

```python
# Python - straightforward
name = "Alice"
if name == "Alice":
    print("Hello Alice!")
```

```javascript
// JavaScript - more symbols
let name = "Alice";
if (name === "Alice") {
    console.log("Hello Alice!");
}
```

**2. Fewer Quirks**
- More predictable behavior
- Less confusing concepts (like `this`, hoisting, `==` vs `===`)

**3. One Clear Way to Do Things**
- Python philosophy: "There should be one obvious way to do it"
- JavaScript has multiple paradigms that can confuse beginners

**4. Better for Learning Programming Concepts**
- Used widely in education
- Great for algorithms, data structures, and general CS concepts

## **JavaScript Has Its Advantages:**

✅ Instant gratification (see results in browser immediately)  
✅ No installation needed (just open browser console)  
✅ Essential if you want to build websites

## **Verdict:**
**Start with Python** if you're learning programming fundamentals. Switch to JavaScript when you're ready for web development. Many developers learn both eventually!
# Why Is Python Easier to Learn?

Let me break down the specific reasons:

## **1. Less Cognitive Load**

**Python:**
```python
# Clean and minimal
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num * 2)
```

**JavaScript:**
```javascript
// More syntax to remember
const numbers = [1, 2, 3, 4, 5];
numbers.forEach(function(num) {
    console.log(num * 2);
});
```

Python requires remembering fewer symbols and conventions.

## **2. JavaScript's Confusing Behaviors**

**Type coercion surprises:**
```javascript
"5" + 3      // "53" (string)
"5" - 3      // 2 (number) - WHY?!
[] + []      // "" (empty string)
[] + {}      // "[object Object]"
```

**Multiple equality operators:**
```javascript
0 == "0"     // true (weird!)
0 === "0"    // false (expected)
```

Python is more predictable and consistent.

## **3. Explicit vs Implicit**

- **Python**: Explicit is better than implicit (Zen of Python)
- **JavaScript**: Lots of implicit behavior (automatic semicolons, type conversion, hoisting)

## **4. Focused Learning Path**

- **Python**: One obvious way → easier decisions for beginners
- **JavaScript**: Multiple ways (callbacks, promises, async/await, classes, prototypes) → overwhelming choices

## **The Bottom Line:**
Python was **designed for readability and simplicity**. JavaScript evolved from a quick browser scripting language and accumulated complexity over 25+ years. Both are powerful, but Python's philosophy makes it friendlier for newcomers.
# Async Programming Explained

## **What Is Async Programming?**

**Asynchronous** programming lets your code do multiple things without waiting for slow operations to finish.

### **Real-World Analogy:**

**Synchronous (blocking):**
- You order coffee ☕
- Stand there waiting while it's made ⏳
- Only then can you order a bagel 🥯
- Wait again...

**Asynchronous (non-blocking):**
- Order coffee ☕
- While it's being made, order a bagel 🥯
- While both are being made, sit down and check your phone 📱
- Get notified when each is ready ✅

## **Why Do We Need It?**

For slow operations like:
- Fetching data from websites/APIs
- Reading files
- Database queries
- Waiting for user input

Without async, your entire program freezes!

## **Python Example:**

```python
import asyncio

# Synchronous - takes 6 seconds total
def make_coffee():
    time.sleep(3)  # Wait 3 seconds
    return "Coffee ready"

def make_toast():
    time.sleep(3)  # Wait 3 seconds
    return "Toast ready"

# Asynchronous - takes 3 seconds total!
async def make_coffee_async():
    await asyncio.sleep(3)
    return "Coffee ready"

async def make_toast_async():
    await asyncio.sleep(3)
    return "Toast ready"

async def make_breakfast():
    # Both happen at the same time!
    coffee, toast = await asyncio.gather(
        make_coffee_async(),
        make_toast_async()
    )
    print(coffee, toast)
```

## **JavaScript Example:**

```javascript
// Old way - Callback Hell
fetchUser(function(user) {
    fetchPosts(user.id, function(posts) {
        fetchComments(posts[0].id, function(comments) {
            // Finally do something...
        });
    });
});

// Modern way - async/await (much cleaner!)
async function getUserData() {
    const user = await fetchUser();
    const posts = await fetchPosts(user.id);
    const comments = await fetchComments(posts[0].id);
    return comments;
}
```

## **Key Concepts:**

🔹 **await** - "Pause here until this finishes, but let other code run"  
🔹 **async** - "This function uses await inside"  
🔹 **Promise** (JS) - "I promise to give you a result later"  
🔹 **Non-blocking** - Other code keeps running while waiting

## **When to Use Async:**

✅ API calls  
✅ File operations  
✅ Database queries  
✅ Any I/O (input/output) operations  

❌ Heavy calculations (use different techniques like threading)

Async makes your programs faster and more responsive!

Each follow-up question would be impossible to answer without context. "Which one" refers to Python and JavaScript. "Why is that" refers to Python being easier. "You mentioned" explicitly references the earlier response.

The agent handles all of this naturally because it has the full conversation in memory.

Memory with Tool Use

Memory becomes even more important when your agent uses tools. The agent needs to remember:

  • What tools it has called
  • What results it received
  • How those results relate to the user's questions

Let's extend our conversational agent to support tools:

In[10]:
Code
def calculate(expression: str) -> dict:
    """Simple calculator tool."""
    # WARNING: eval() executes arbitrary Python code. It's fine for a
    # demo, but never use it on untrusted input in production.
    try:
        result = eval(expression)
        return {"success": True, "result": result}
    except Exception as e:
        return {"success": False, "error": str(e)}

calculator_tool = {
    "name": "calculate",
    "description": "Perform mathematical calculations",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "Math expression to evaluate"
            }
        },
        "required": ["expression"]
    }
}

class ConversationalAgentWithTools:
    """Agent with memory and tool use."""
    
    def __init__(self):
        self.client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.conversation_history = []
        self.tools = {"calculate": calculate}
    
    def chat(self, user_message: str) -> str:
        """Chat with tool use support."""
        # Add user message
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        # Get response
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=[calculator_tool],
            messages=self.conversation_history
        )
        
        # Handle tool use if needed
        if response.stop_reason == "tool_use":
            # Extract tool call
            tool_use_block = next(
                block for block in response.content 
                if block.type == "tool_use"
            )
            
            # Call the tool
            tool_result = self.tools[tool_use_block.name](
                **tool_use_block.input
            )
            
            # Add tool use to history
            self.conversation_history.append({
                "role": "assistant",
                "content": response.content
            })
            
            # Add tool result to history
            self.conversation_history.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": str(tool_result)
                }]
            })
            
            # Get final response
            final_response = self.client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                tools=[calculator_tool],
                messages=self.conversation_history
            )
            
            assistant_message = final_response.content[0].text
        else:
            assistant_message = response.content[0].text
        
        # Add final response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message

Now watch how memory works with tools:

In[11]:
Code
agent = ConversationalAgentWithTools()

print(agent.chat("What's 1234 times 5678?"))
## Agent uses calculator tool
## "1,234 times 5,678 equals 7,006,652."

print(agent.chat("What if I double that?"))
## Agent remembers the previous result
## Agent calculates 7006652 * 2
## "Doubling that gives you 14,013,304."

print(agent.chat("And what's 10% of the original number?"))
## Agent remembers 7,006,652 was the original
## Agent calculates 7006652 * 0.1
## "10% of the original number (7,006,652) is 700,665.2."
Out[11]:
Console
1234 times 5678 equals **7,006,652**.
If you double that, you get **14,013,304**.
10% of the original number (7,006,652) is **700,665.2**.

The agent remembers both the conversation and the tool results. "Double that" refers to the previous calculation. "The original number" refers to the first result, not the doubled one.

This is powerful. The agent can build on its own work, reference previous calculations, and maintain context across multiple tool uses.

The Context Window Challenge

Here's a problem: conversation history grows with every exchange. After 50 turns, you're sending 100 messages (50 user, 50 assistant) with every new question. This creates two issues:

Cost: Most API providers charge per token. Sending the entire history every time gets expensive.

Context limits: Models have maximum context windows. Claude Sonnet 4.5 supports 200K tokens, but you'll eventually hit limits in very long conversations.

Let's see this problem in practice:

In[12]:
Code
## Inspect how much history we're carrying around
print(f"Messages in history: {len(agent.conversation_history)}")

## Estimate token count (rough approximation)
total_chars = sum(len(str(msg)) for msg in agent.conversation_history)
estimated_tokens = total_chars // 4  # Rough estimate: 1 token ≈ 4 characters
print(f"Estimated tokens: {estimated_tokens}")
Out[12]:
Console
Messages in history: 12
Estimated tokens: 341

For a long conversation, you might be sending thousands of tokens with each request. Most of those tokens are old messages that might not be relevant anymore.
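The character-based estimate above can be wrapped in a small helper so you can monitor growth turn by turn. This is a sketch using the same 4-characters-per-token heuristic (the helper name `estimate_history_tokens` is my own); for exact counts, check whether your API provider offers a token-counting endpoint:

```python
def estimate_history_tokens(history: list, chars_per_token: int = 4) -> int:
    """Rough token estimate for a message list (1 token ≈ 4 characters)."""
    total_chars = sum(len(str(msg)) for msg in history)
    return total_chars // chars_per_token

## Example: a two-message history
history = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
print(estimate_history_tokens(history))
```

Calling this before each request lets you trigger trimming or summarization once the estimate crosses a budget you choose.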

Solution 1: Sliding Window Memory

The simplest solution: only keep the most recent N messages. This is called a "sliding window" because you keep a fixed-size window that slides forward as the conversation progresses.

In[13]:
Code
class ConversationalAgentWithWindow:
    """Agent with sliding window memory."""
    
    def __init__(self, max_messages: int = 20):
        self.client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.conversation_history = []
        self.max_messages = max_messages
    
    def chat(self, user_message: str) -> str:
        """Chat with sliding window memory."""
        # Add user message
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        # Keep only recent messages
        if len(self.conversation_history) > self.max_messages:
            # Keep the most recent max_messages
            self.conversation_history = self.conversation_history[-self.max_messages:]
            # The Messages API requires the conversation to start with a
            # user message, so drop a leading assistant message if the
            # trim left one at the front
            if self.conversation_history[0]["role"] == "assistant":
                self.conversation_history = self.conversation_history[1:]
        
        # Get response with limited history
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=self.conversation_history
        )
        
        assistant_message = response.content[0].text
        
        # Add response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        # Trim again if needed
        if len(self.conversation_history) > self.max_messages:
            self.conversation_history = self.conversation_history[-self.max_messages:]
        
        return assistant_message

With a sliding window of 20 messages, the agent remembers the last 10 exchanges (10 user messages + 10 assistant responses). Older messages are discarded.

This works well for:

  • Casual conversations where old context isn't needed
  • Cost-sensitive applications
  • Very long conversations that would exceed context limits

The tradeoff: the agent forgets older parts of the conversation. If you reference something from 15 exchanges ago, the agent won't remember it.
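The windowing logic can also be factored into a standalone helper. One detail worth guarding: the Messages API expects the conversation to start with a user message, so after cutting the list you may need to drop a stranded assistant message from the front. A sketch (the name `trim_window` is my own):

```python
def trim_window(history: list, max_messages: int = 20) -> list:
    """Keep the most recent messages, ensuring the list still starts
    with a user message (as the Messages API requires)."""
    trimmed = history[-max_messages:]
    # Drop a leading assistant message left over from the cut
    while trimmed and trimmed[0]["role"] == "assistant":
        trimmed = trimmed[1:]
    return trimmed

## Example: 25 alternating messages trimmed to at most a 20-message window
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(25)
]
window = trim_window(history, max_messages=20)
```

Because the helper is pure (it returns a new list rather than mutating the agent), it's also easy to unit test in isolation.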

Solution 2: Conversation Summarization

A more sophisticated approach: periodically summarize old messages and replace them with the summary. This preserves important information while reducing token count.

In[14]:
Code
class ConversationalAgentWithSummary:
    """Agent with conversation summarization."""
    
    def __init__(self, summarize_after: int = 20):
        self.client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.conversation_history = []
        self.summarize_after = summarize_after
    
    def _summarize_history(self, messages: list) -> str:
        """Create a summary of conversation history."""
        # Format the conversation
        conversation_text = "\n".join([
            f"{msg['role'].title()}: {msg['content']}"
            for msg in messages
        ])
        
        # Ask the model to summarize
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": (
                    "Summarize this conversation, preserving key facts, "
                    "decisions, and context that might be referenced later:\n\n"
                    f"{conversation_text}\n\n"
                    "Provide a concise summary in 2-3 paragraphs."
                )
            }]
        )
        
        return response.content[0].text
    
    def chat(self, user_message: str) -> str:
        """Chat with automatic summarization."""
        # Add user message
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        # Check if we should summarize
        if len(self.conversation_history) > self.summarize_after:
            # Take first half of messages to summarize
            to_summarize = self.conversation_history[:self.summarize_after // 2]
            keep_recent = self.conversation_history[self.summarize_after // 2:]
            
            # Create summary
            summary = self._summarize_history(to_summarize)
            
            # Replace old messages with summary
            self.conversation_history = [
                {
                    "role": "user",
                    "content": f"[Previous conversation summary: {summary}]"
                }
            ] + keep_recent
        
        # Get response
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=self.conversation_history
        )
        
        assistant_message = response.content[0].text
        
        # Add response
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message

This approach:

  • Keeps recent messages in full detail
  • Summarizes older messages to preserve important context
  • Reduces token count while maintaining continuity

The tradeoff: summarization costs an extra API call, and some details might be lost in the summary.
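The restructuring step is mechanical and can be sketched without a live model. In this sketch, a placeholder string stands in for the model-generated summary (in the real class, `_summarize_history` produces it), and the message contents are illustrative:

```python
# Sketch of the summarization restructuring, with a stub summary.
summarize_after = 6
history = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "What's the population?"},
    {"role": "assistant", "content": "About 2.1 million in the city proper."},
    {"role": "user", "content": "And of the country?"},
    {"role": "assistant", "content": "Roughly 68 million."},
    {"role": "user", "content": "Recommend a museum there."},
]

if len(history) > summarize_after:
    to_summarize = history[:summarize_after // 2]  # oldest messages
    keep_recent = history[summarize_after // 2:]   # newest messages
    # Stub: in the real agent this comes from an extra model call
    summary = "User asked about France: capital is Paris, ~2.1M residents."
    history = [{
        "role": "user",
        "content": f"[Previous conversation summary: {summary}]"
    }] + keep_recent

print(len(history))  # 5: one summary message + four recent messages
```

The seven-message history collapses to five: the three oldest messages become a single summary message, and the four most recent survive verbatim.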

Choosing a Memory Strategy

Which approach should you use? It depends on your use case:

Full history (no limits):

  • Best for: Short conversations, when cost isn't a concern
  • Pros: Perfect memory, no information loss
  • Cons: Expensive for long conversations, can hit context limits

Sliding window:

  • Best for: Casual chat, when only recent context matters
  • Pros: Simple, predictable cost, never hits context limits
  • Cons: Forgets older information completely

Summarization:

  • Best for: Long conversations where older context matters
  • Pros: Preserves important information, manages token count
  • Cons: More complex, costs extra for summarization, might lose details

For our personal assistant, a hybrid approach often works best:

In[15]:
Code
class SmartConversationalAgent:
    """Agent with intelligent memory management."""
    
    def __init__(self,
                 window_size: int = 30,
                 summarize_threshold: int = 50):
        self.client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.conversation_history = []
        self.window_size = window_size
        self.summarize_threshold = summarize_threshold
        self.summary = None
    
    def _summarize(self, messages: list) -> str:
        """Summarize older messages, folding in any previous summary."""
        conversation_text = "\n".join(
            f"{msg['role'].title()}: {msg['content']}" for msg in messages
        )
        if self.summary:
            conversation_text = (
                f"Earlier summary: {self.summary}\n\n{conversation_text}"
            )
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": (
                    "Summarize this conversation, preserving key facts, "
                    "decisions, and context that might be referenced later:\n\n"
                    f"{conversation_text}\n\n"
                    "Provide a concise summary in 2-3 paragraphs."
                )
            }]
        )
        return response.content[0].text
    
    def chat(self, user_message: str) -> str:
        """Chat with smart memory management."""
        # Add message
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        # Manage memory
        if len(self.conversation_history) > self.summarize_threshold:
            # Summarize old messages
            old_messages = self.conversation_history[:-self.window_size]
            self.summary = self._summarize(old_messages)
            self.conversation_history = self.conversation_history[-self.window_size:]
        
        # Build context
        messages = []
        if self.summary:
            messages.append({
                "role": "user",
                "content": f"[Context from earlier: {self.summary}]"
            })
        messages.extend(self.conversation_history)
        
        # Get response
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=messages
        )
        
        assistant_message = response.content[0].text
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message

This combines both strategies: keep recent messages in full, summarize older ones.
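You can check the hybrid bookkeeping deterministically by replacing the summarization call with a stub (the `window_size` and `summarize_threshold` values here are illustrative, much smaller than the defaults above):

```python
# Sketch of the hybrid memory management, with a stub summarizer.
window_size = 4
summarize_threshold = 6
summary = None
history = [{"role": "user", "content": f"msg {i}"} for i in range(7)]

if len(history) > summarize_threshold:
    old_messages = history[:-window_size]  # everything outside the window
    # Stub: the real agent asks the model to summarize old_messages
    summary = f"{len(old_messages)} older messages summarized"
    history = history[-window_size:]

# Build the context sent to the model: summary first, then the window
messages = []
if summary:
    messages.append({"role": "user",
                     "content": f"[Context from earlier: {summary}]"})
messages.extend(history)

print(len(messages))  # 5: one summary message + 4 windowed messages
```

Once the threshold is crossed, everything outside the window is compressed into a single context message, so the request size stays bounded no matter how long the conversation runs.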

Practical Considerations

As you build memory into your agent, keep these points in mind:

Start simple: Begin with full history. Only add complexity (windowing, summarization) when you actually need it.

Monitor costs: Track how many tokens you're sending. If costs are high, implement a sliding window.
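For a pre-flight estimate before sending a request, a rough character-based heuristic works; a sketch (the 4-characters-per-token ratio is a common rule of thumb for English text, not an exact count):

```python
def estimate_tokens(history: list) -> int:
    """Rough pre-flight estimate: ~4 characters per token (rule of thumb)."""
    total_chars = sum(len(msg["content"]) for msg in history)
    return total_chars // 4

history = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
print(estimate_tokens(history))  # 15 (for 60 characters)

# After a live call, prefer the exact counts the API reports:
# response = client.messages.create(...)
# print(response.usage.input_tokens, response.usage.output_tokens)
```

Logging these numbers per turn makes it obvious when a conversation's history is driving up costs and a windowing or summarization strategy is worth adding.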

Test edge cases: What happens when the user references something from 20 messages ago? Does your memory strategy handle it?

Consider the domain: A customer service bot might need full history. A casual chatbot might work fine with a small window.

Provide memory controls: Let users start fresh conversations when needed. Sometimes they want to change topics completely.
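A fresh-start control can be as small as one method that clears the stored state. A minimal sketch against the hybrid agent's attributes (`conversation_history` and `summary`):

```python
class MemoryControls:
    """Minimal sketch of a user-facing memory reset."""

    def __init__(self):
        self.conversation_history = []
        self.summary = None

    def reset(self):
        """Start a completely fresh conversation."""
        self.conversation_history = []
        self.summary = None

agent = MemoryControls()
agent.conversation_history.append({"role": "user", "content": "hello"})
agent.reset()
print(len(agent.conversation_history))  # 0
```

Exposing this as a "new conversation" button or a `/reset` command gives users an escape hatch when stale context starts steering the agent's answers.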

Key Takeaways

Let's review what we've learned about short-term conversation memory:

Memory is essential for conversation. Without it, agents can't handle follow-up questions or maintain context across turns.

Implementation is straightforward. Store messages in a list and send them with each new request. The model handles the rest.

Memory grows over time. Long conversations create large histories that cost more and can hit context limits.

Multiple strategies exist. Full history, sliding windows, and summarization each have their place.

Choose based on your needs. Consider conversation length, cost constraints, and how much context you need to preserve.

Your personal assistant now has short-term memory. It can maintain context across a conversation, handle follow-ups, and build on previous exchanges. This is a fundamental capability that makes agents feel natural and useful.

In the next chapter, we'll explore long-term memory: how to store information across sessions, remember user preferences, and retrieve relevant facts from a knowledge base. Combined with short-term memory, this will give your agent a complete memory system.

Glossary

Conversation History: The list of messages exchanged between the user and agent, stored in order and sent with each new request to provide context.

Stateless Interaction: A request-response pattern where each interaction is independent, with no memory of previous exchanges.

Sliding Window Memory: A memory strategy that keeps only the most recent N messages, discarding older ones to manage context size and cost.

Context Window: The maximum number of tokens a language model can process in a single request, including both the conversation history and the new message.

Conversation Summarization: A technique that condenses older messages into a brief summary to preserve important context while reducing token count.

Token: The basic unit of text that language models process, roughly equivalent to a word or word fragment. Models charge per token and have maximum token limits.

Quiz

Ready to test your understanding? Take this quick quiz to reinforce what you've learned about short-term conversation memory in AI agents.


Reference

BibTeX
@misc{shorttermconversationmemorybuildingcontextawareaiagents,
  author = {Michael Brenndoerfer},
  title = {Short-Term Conversation Memory: Building Context-Aware AI Agents},
  year = {2025},
  url = {https://mbrenndoerfer.com/writing/short-term-conversation-memory-ai-agents},
  organization = {mbrenndoerfer.com},
  note = {Accessed: 2025-12-25}
}