Learn how to give AI agents the ability to remember recent conversations, handle follow-up questions, and manage conversation history across multiple interactions.

This article is part of the free-to-read AI Agent Handbook
Short-Term Conversation Memory
You've built an agent that can reason, use tools, and solve complex problems. But there's a fundamental limitation: it forgets everything after each interaction. Ask it a question, get an answer, then ask a follow-up, and it has no idea what you're talking about.
Imagine calling a help desk where the representative forgets your conversation every 30 seconds. You'd have to re-explain your problem constantly. Frustrating, right? That's exactly what happens with an agent that lacks memory.
In this chapter, we'll give your personal assistant the ability to remember recent conversations. You'll learn how to maintain context across multiple interactions, handle follow-up questions, and manage conversation history as it grows. By the end, your agent will feel less like a stateless question-answering machine and more like an assistant that actually remembers what you've been discussing.
The Problem: Stateless Interactions
Let's see what happens without memory. You ask the agent, "What's the capital of France?" and it correctly tells you the capital is Paris. Then you follow up with, "What's the population?" and it responds with something like, "The population of what? Could you tell me which place you mean?"
The agent answered the first question perfectly. But when you asked the follow-up, it had no memory of discussing France. Each interaction is isolated, like talking to someone with amnesia.
This breaks down quickly in real conversations. People naturally ask follow-up questions, refer to previous topics, and build on earlier context. Without memory, your agent can't handle these basic conversational patterns.
Example (Claude Sonnet 4.5)
Let's see this in code:
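(This sketch uses the Anthropic Python SDK and an assumed model name string; swap in whichever client and model you're actually using.)

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment

def ask_without_memory(question: str) -> str:
    """Send a single question with no conversation history attached."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model name
        max_tokens=1024,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

print(ask_without_memory("What's the capital of France?"))
# Likely answer: "The capital of France is Paris."

print(ask_without_memory("What's the population?"))
# Likely answer: a request for clarification -- the model has no idea what "the" refers to
```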
Each call to ask_without_memory creates a fresh conversation. The agent has no context from previous questions. This is the default behavior when you don't explicitly manage conversation history.
The Solution: Conversation History
The fix is straightforward: keep track of the conversation and send it with each new message. Instead of just sending the latest question, you send the entire dialogue history.
Here's what that looks like:
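(A minimal sketch reusing the client from above; the helper name and model string are illustrative.)

```python
def ask_with_memory(history: list[dict], question: str) -> str:
    """Append the new question to the history, send the whole thread, and record the reply."""
    history.append({"role": "user", "content": question})
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model name
        max_tokens=1024,
        messages=history,  # the entire conversation so far, not just the latest question
    )
    answer = response.content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer
```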
Now let's try the same conversation with memory:
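(The responses shown in the comments are illustrative; yours will vary.)

```python
history = []

print(ask_with_memory(history, "What's the capital of France?"))
# "The capital of France is Paris."

print(ask_with_memory(history, "What's the population?"))
# "Paris has a population of roughly 2 million people, with over 10 million in the metro area."
```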
The agent understood "the population" refers to Paris because it remembered the previous exchange. This is the foundation of conversational AI: maintaining context across turns.
How Conversation History Works
Let's look at what's actually happening behind the scenes. After the two-question exchange above, our history list contains:
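(Assistant replies abbreviated; the exact wording will differ from run to run.)

```python
[
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What's the population?"},
    {"role": "assistant", "content": "Paris has a population of roughly 2 million people..."},
]
```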
Each message is stored with its role (either "user" or "assistant") and content. When you ask a new question, the entire list goes to the model. The model sees the full conversation thread and can reference earlier messages.
Think of it like showing someone a chat transcript. They can read the whole conversation and understand the context of the latest message. That's exactly what we're doing with the language model.
Building a Conversational Agent
Let's create a more complete conversational agent that manages memory automatically:
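(A sketch building on the same assumed SDK; the class name and defaults are illustrative.)

```python
class ConversationalAgent:
    """A simple agent that manages its own conversation history."""

    def __init__(self, model: str = "claude-sonnet-4-5"):  # assumed model name
        self.client = anthropic.Anthropic()
        self.model = model
        self.history: list[dict] = []

    def chat(self, message: str) -> str:
        """Send a message in the context of everything said so far."""
        self.history.append({"role": "user", "content": message})
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=self.history,
        )
        answer = response.content[0].text
        self.history.append({"role": "assistant", "content": answer})
        return answer

    def reset(self) -> None:
        """Forget everything and start a fresh conversation."""
        self.history = []
```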
Now we can have natural, multi-turn conversations:
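(The comments summarize plausible responses rather than exact output.)

```python
agent = ConversationalAgent()

agent.chat("I'm planning a trip to Japan in the spring.")
# Suggests cherry blossom season, asks what you're interested in, etc.

agent.chat("Which cities should I visit?")
# Recommends Tokyo, Kyoto, Osaka... framed around a spring trip to Japan

agent.chat("How many days should I spend in the second one?")
# Understands "the second one" means Kyoto, because its own earlier answer is in the history
```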
Notice how each response builds on the previous context. The agent remembers we're discussing a Japan trip and tailors its answers accordingly.
Handling Follow-Up Questions
One of the most powerful aspects of conversation memory is handling follow-ups. People rarely ask perfectly self-contained questions. They say "What about that?", "Can you explain more?", or "Why is that?"
Let's see this in action:
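(Again, the responses described in the comments are illustrative.)

```python
agent = ConversationalAgent()

agent.chat("What are the main differences between Python and JavaScript?")

agent.chat("Which one is easier for beginners?")
# Typically answers Python, and explains why

agent.chat("Why is that?")
# Expands on its previous answer about Python being easier

agent.chat("You mentioned readability. Can you give an example?")
# Picks up a specific point from its own earlier response
```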
Each follow-up question would be impossible to answer without context. "Which one" refers to Python and JavaScript. "Why is that" refers to Python being easier. "You mentioned" explicitly references the earlier response.
The agent handles all of this naturally because it has the full conversation in memory.
Memory with Tool Use
Memory becomes even more important when your agent uses tools. The agent needs to remember:
- What tools it has called
- What results it received
- How those results relate to the user's questions
Let's extend our conversational agent to support tools:
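(A sketch of one way to do this with the Anthropic Messages API's tool-calling support; the calculator tool, class name, and model string are illustrative, and eval is used only to keep the toy example short.)

```python
def run_calculator(expression: str) -> str:
    """Toy calculator tool. A real agent should parse and validate instead of calling eval."""
    return str(eval(expression))

CALCULATOR_TOOL = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression and return the result.",
    "input_schema": {
        "type": "object",
        "properties": {"expression": {"type": "string", "description": "e.g. '127 * 49'"}},
        "required": ["expression"],
    },
}

class ToolUsingAgent:
    """A conversational agent whose history also records tool calls and tool results."""

    def __init__(self, model: str = "claude-sonnet-4-5"):  # assumed model name
        self.client = anthropic.Anthropic()
        self.model = model
        self.history: list[dict] = []

    def chat(self, message: str) -> str:
        self.history.append({"role": "user", "content": message})
        while True:
            response = self.client.messages.create(
                model=self.model,
                max_tokens=1024,
                messages=self.history,
                tools=[CALCULATOR_TOOL],
            )
            # Keep the model's turn, including any tool_use blocks, in the history.
            self.history.append({"role": "assistant", "content": response.content})
            if response.stop_reason != "tool_use":
                return "".join(b.text for b in response.content if b.type == "text")
            # Run each requested tool and feed the results back as a user message.
            results = []
            for block in response.content:
                if block.type == "tool_use":
                    output = run_calculator(block.input["expression"])
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": output,
                    })
            self.history.append({"role": "user", "content": results})
```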
Now watch how memory works with tools:
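(An illustrative run; the comments describe roughly what you'd see.)

```python
agent = ToolUsingAgent()

agent.chat("What's 127 times 49?")
# Calls the calculator tool and answers: 6223

agent.chat("Double that.")
# Knows "that" is 6223 from the history and calls the tool again: 12446

agent.chat("What was the original number, before doubling?")
# Answers 6223 straight from the conversation history -- no tool call needed
```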
The agent remembers both the conversation and the tool results. "Double that" refers to the previous calculation. "The original number" refers to the first result, not the doubled one.
This is powerful. The agent can build on its own work, reference previous calculations, and maintain context across multiple tool uses.
The Context Window Challenge
Here's a problem: conversation history grows with every exchange. After 50 exchanges, you're sending 100 messages (50 user, 50 assistant) with every new question. This creates two issues:
Cost: Most API providers charge per token. Sending the entire history every time gets expensive.
Context limits: Models have maximum context windows. Claude Sonnet 4.5 supports a 200K-token context window, but a long enough conversation will eventually hit that limit.
Let's see this problem in practice:
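(A rough, purely illustrative estimate; real token counts depend on the tokenizer and on what your users actually say.)

```python
def rough_token_count(history: list[dict]) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return sum(len(str(m["content"])) for m in history) // 4

# Simulate a growing conversation with made-up message sizes.
history = []
for exchange in range(1, 51):
    history.append({"role": "user", "content": "x" * 200})        # ~50-token question
    history.append({"role": "assistant", "content": "x" * 1200})  # ~300-token answer
    if exchange % 10 == 0:
        print(f"After {exchange} exchanges: {len(history)} messages, "
              f"~{rough_token_count(history)} tokens sent per request")

# After 10 exchanges: 20 messages, ~3500 tokens sent per request
# ...
# After 50 exchanges: 100 messages, ~17500 tokens sent per request
```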
In a long conversation, you might be sending thousands of tokens with each request, and most of them belong to old messages that may no longer be relevant.
Solution 1: Sliding Window Memory
The simplest solution: only keep the most recent N messages. This is called a "sliding window" because you keep a fixed-size window that slides forward as the conversation progresses.
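Here's a minimal sketch built on the ConversationalAgent from earlier (the window size and trimming logic are illustrative):

```python
class SlidingWindowAgent(ConversationalAgent):
    """Keeps only the most recent messages; older ones are discarded."""

    def __init__(self, max_messages: int = 20, **kwargs):
        super().__init__(**kwargs)
        self.max_messages = max_messages

    def chat(self, message: str) -> str:
        self.history.append({"role": "user", "content": message})
        # Discard older messages so only the most recent ones are kept and sent.
        if len(self.history) > self.max_messages:
            self.history = self.history[-self.max_messages:]
            # Keep the thread starting on a user turn (some chat APIs require this).
            while self.history and self.history[0]["role"] == "assistant":
                self.history.pop(0)
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=self.history,
        )
        answer = response.content[0].text
        self.history.append({"role": "assistant", "content": answer})
        return answer
```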
With a sliding window of 20 messages, the agent remembers the last 10 exchanges (10 user messages + 10 assistant responses). Older messages are discarded.
This works well for:
- Casual conversations where old context isn't needed
- Cost-sensitive applications
- Very long conversations that would exceed context limits
The tradeoff: the agent forgets older parts of the conversation. If you reference something from 15 exchanges ago, the agent won't remember it.
Solution 2: Conversation Summarization
A more sophisticated approach: periodically summarize old messages and replace them with the summary. This preserves important information while reducing token count.
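Here's a sketch of one way to do it, again building on the ConversationalAgent (the thresholds and the summarization prompt are illustrative):

```python
class SummarizingAgent(ConversationalAgent):
    """Folds older messages into a summary once the history grows past a threshold."""

    def __init__(self, max_messages: int = 30, keep_recent: int = 10, **kwargs):
        super().__init__(**kwargs)
        self.max_messages = max_messages   # when to trigger summarization
        self.keep_recent = keep_recent     # how many recent messages to keep verbatim

    def _compress_history(self) -> None:
        if len(self.history) <= self.max_messages:
            return
        old, recent = self.history[:-self.keep_recent], self.history[-self.keep_recent:]
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
        # One extra model call to condense the older part of the conversation.
        response = self.client.messages.create(
            model=self.model,
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": "Summarize the key facts, decisions, and open questions "
                           "from this conversation:\n\n" + transcript,
            }],
        )
        summary = response.content[0].text
        self.history = [
            {"role": "user", "content": f"Summary of our conversation so far: {summary}"},
            {"role": "assistant", "content": "Got it. I'll keep that context in mind."},
        ] + recent

    def chat(self, message: str) -> str:
        self._compress_history()
        return super().chat(message)
```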
This approach:
- Keeps recent messages in full detail
- Summarizes older messages to preserve important context
- Reduces token count while maintaining continuity
The tradeoff: summarization costs an extra API call, and some details might be lost in the summary.
Choosing a Memory Strategy
Which approach should you use? It depends on your use case:
Full history (no limits):
- Best for: Short conversations, when cost isn't a concern
- Pros: Perfect memory, no information loss
- Cons: Expensive for long conversations, can hit context limits
Sliding window:
- Best for: Casual chat, when only recent context matters
- Pros: Simple, predictable cost, never hits context limits
- Cons: Forgets older information completely
Summarization:
- Best for: Long conversations where older context matters
- Pros: Preserves important information, manages token count
- Cons: More complex, costs extra for summarization, might lose details
For our personal assistant, a hybrid approach often works best:
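(A sketch of how that might look using the summarizing agent above; the thresholds are illustrative.)

```python
assistant = SummarizingAgent(
    max_messages=30,  # leave the history untouched until it passes ~15 exchanges
    keep_recent=10,   # always keep the last 5 exchanges word-for-word
)

assistant.chat("Let's get back to planning that Japan trip.")
# Short conversations behave exactly like full history. Once the thread grows,
# older turns are folded into a summary while the recent ones stay intact.
```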
This combines both strategies: keep recent messages in full, summarize older ones.
Practical Considerations
As you build memory into your agent, keep these points in mind:
Start simple: Begin with full history. Only add complexity (windowing, summarization) when you actually need it.
Monitor costs: Track how many tokens you're sending. If costs are high, implement a sliding window.
Test edge cases: What happens when the user references something from 20 messages ago? Does your memory strategy handle it?
Consider the domain: A customer service bot might need full history. A casual chatbot might work fine with a small window.
Provide memory controls: Let users start fresh conversations when needed. Sometimes they want to change topics completely.
Key Takeaways
Let's review what we've learned about short-term conversation memory:
Memory is essential for conversation. Without it, agents can't handle follow-up questions or maintain context across turns.
Implementation is straightforward. Store messages in a list and send them with each new request. The model handles the rest.
Memory grows over time. Long conversations create large histories that cost more and can hit context limits.
Multiple strategies exist. Full history, sliding windows, and summarization each have their place.
Choose based on your needs. Consider conversation length, cost constraints, and how much context you need to preserve.
Your personal assistant now has short-term memory. It can maintain context across a conversation, handle follow-ups, and build on previous exchanges. This is a fundamental capability that makes agents feel natural and useful.
In the next chapter, we'll explore long-term memory: how to store information across sessions, remember user preferences, and retrieve relevant facts from a knowledge base. Combined with short-term memory, this will give your agent a complete memory system.
Glossary
Conversation History: The list of messages exchanged between the user and agent, stored in order and sent with each new request to provide context.
Stateless Interaction: A request-response pattern where each interaction is independent, with no memory of previous exchanges.
Sliding Window Memory: A memory strategy that keeps only the most recent N messages, discarding older ones to manage context size and cost.
Context Window: The maximum number of tokens a language model can process in a single request, including both the conversation history and the new message.
Conversation Summarization: A technique that condenses older messages into a brief summary to preserve important context while reducing token count.
Token: The basic unit of text that language models process, roughly equivalent to a word or word fragment. Models charge per token and have maximum token limits.