Learn how to give AI agents the ability to remember recent conversations, handle follow-up questions, and manage conversation history across multiple interactions.

This article is part of the free-to-read AI Agent Handbook
Short-Term Conversation Memory
You've built an agent that can reason, use tools, and solve complex problems. But there's a fundamental limitation: it forgets everything after each interaction. Ask it a question, get an answer, then ask a follow-up, and it has no idea what you're talking about.
Imagine calling a help desk where the representative forgets your conversation every 30 seconds. You'd have to re-explain your problem constantly. Frustrating, right? That's exactly what happens with an agent that lacks memory.
In this chapter, we'll give your personal assistant the ability to remember recent conversations. You'll learn how to maintain context across multiple interactions, handle follow-up questions, and manage conversation history as it grows. By the end, your agent will feel less like a stateless question-answering machine and more like an assistant that actually remembers what you've been discussing.
The Problem: Stateless Interactions
Let's see what happens without memory. Here's a conversation with our agent:
User: What's the capital of France?
Agent: The capital of France is Paris.

User: What's the population?
Agent: I need more context. What location are you asking about?

The agent answered the first question perfectly. But when you asked a follow-up ("What's the population?"), it had no memory of discussing France. Each interaction is isolated, like talking to someone with amnesia.
This breaks down quickly in real conversations. People naturally ask follow-up questions, refer to previous topics, and build on earlier context. Without memory, your agent can't handle these basic conversational patterns.
Example (Claude Sonnet 4.5)
Let's see this in code:
import anthropic

# Using Claude Sonnet 4.5 for its excellent conversational capabilities
client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")

def ask_without_memory(question: str) -> str:
    """Ask a question without any conversation history."""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": question}
        ]
    )
    return response.content[0].text

# First question
print("Q1:", ask_without_memory("What's the capital of France?"))
# Output: "The capital of France is Paris."

# Follow-up question
print("Q2:", ask_without_memory("What's the population?"))
# Output: "I'd be happy to help with population information,
# but I need to know which location you're asking about."

Each call to ask_without_memory creates a fresh conversation. The agent has no context from previous questions. This is the default behavior when you don't explicitly manage conversation history.
The Solution: Conversation History
The fix is straightforward: keep track of the conversation and send it with each new message. Instead of just sending the latest question, you send the entire dialogue history.
Here's what that looks like:
def ask_with_memory(conversation_history: list, question: str) -> tuple:
    """
    Ask a question with full conversation context.

    Args:
        conversation_history: List of previous messages
        question: The new question to ask

    Returns:
        Tuple of (answer, updated_history)
    """
    # Add the new question to history
    conversation_history.append({
        "role": "user",
        "content": question
    })

    # Send entire conversation to the model
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=conversation_history
    )

    # Extract the answer
    answer = response.content[0].text

    # Add the answer to history
    conversation_history.append({
        "role": "assistant",
        "content": answer
    })

    return answer, conversation_history

Now let's try the same conversation with memory:
# Start with empty history
history = []

# First question
answer1, history = ask_with_memory(history, "What's the capital of France?")
print("Q1:", answer1)
# Output: "The capital of France is Paris."

# Follow-up question - now with context!
answer2, history = ask_with_memory(history, "What's the population?")
print("Q2:", answer2)
# Output: "Paris has a population of approximately 2.1 million people
# within the city proper, and about 12 million in the greater
# metropolitan area."

The agent understood "the population" refers to Paris because it remembered the previous exchange. This is the foundation of conversational AI: maintaining context across turns.
How Conversation History Works
Let's look at what's actually happening behind the scenes. After the two-question exchange above, our history list contains:
[
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What's the population?"},
    {"role": "assistant", "content": "Paris has a population of approximately..."}
]

Each message is stored with its role (either "user" or "assistant") and content. When you ask a new question, the entire list goes to the model. The model sees the full conversation thread and can reference earlier messages.
Think of it like showing someone a chat transcript. They can read the whole conversation and understand the context of the latest message. That's exactly what we're doing with the language model.
Building a Conversational Agent
Let's create a more complete conversational agent that manages memory automatically:
class ConversationalAgent:
    """
    A personal assistant that remembers the conversation.
    """

    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.conversation_history = []

    def chat(self, user_message: str) -> str:
        """
        Send a message and get a response, maintaining conversation history.

        Args:
            user_message: What the user wants to say

        Returns:
            The agent's response
        """
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

        # Get response with full context
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=self.conversation_history
        )

        # Extract and store the response
        assistant_message = response.content[0].text
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })

        return assistant_message

    def clear_history(self):
        """Start a fresh conversation."""
        self.conversation_history = []

    def get_history(self) -> list:
        """Get the current conversation history."""
        return self.conversation_history.copy()

Now we can have natural, multi-turn conversations:
# Create the agent
agent = ConversationalAgent(api_key="YOUR_ANTHROPIC_API_KEY")

# Have a conversation
print(agent.chat("I'm planning a trip to Japan."))
# "That sounds exciting! Japan is a wonderful destination..."

print(agent.chat("What's the best time of year to visit?"))
# "For Japan, the best times to visit are typically spring (March-May)
# for cherry blossoms, or autumn (September-November) for fall foliage..."

print(agent.chat("How long should I plan to stay?"))
# "For a first trip to Japan, I'd recommend at least 10-14 days..."

print(agent.chat("What cities should I include?"))
# "Based on your trip to Japan, I'd suggest including Tokyo, Kyoto,
# and possibly Osaka or Hiroshima..."

Notice how each response builds on the previous context. The agent remembers we're discussing a Japan trip and tailors its answers accordingly.
Handling Follow-Up Questions
One of the most powerful aspects of conversation memory is handling follow-ups. People rarely ask perfectly self-contained questions. They say "What about that?", "Can you explain more?", or "Why is that?"
Let's see this in action:
agent = ConversationalAgent(api_key="YOUR_ANTHROPIC_API_KEY")

# Initial question
print(agent.chat("What are the main differences between Python and JavaScript?"))
# Detailed comparison of Python and JavaScript...

# Vague follow-up
print(agent.chat("Which one is easier to learn?"))
# "For beginners, Python is generally considered easier to learn..."

# Another vague follow-up
print(agent.chat("Why is that?"))
# "Python is easier to learn because of its clean, readable syntax..."

# Reference to earlier point
print(agent.chat("You mentioned async programming. Can you explain that more?"))
# "In JavaScript, async programming is central to how the language works..."

Each follow-up question would be impossible to answer without context. "Which one" refers to Python and JavaScript. "Why is that" refers to Python being easier. "You mentioned" explicitly references the earlier response.
The agent handles all of this naturally because it has the full conversation in memory.
Memory with Tool Use
Memory becomes even more important when your agent uses tools. The agent needs to remember:
- What tools it has called
- What results it received
- How those results relate to the user's questions
Let's extend our conversational agent to support tools:
def calculate(expression: str) -> dict:
    """Simple calculator tool."""
    try:
        # Note: eval is convenient for a demo but unsafe for untrusted
        # input; use a proper math expression parser in production
        result = eval(expression)
        return {"success": True, "result": result}
    except Exception as e:
        return {"success": False, "error": str(e)}

calculator_tool = {
    "name": "calculate",
    "description": "Perform mathematical calculations",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "Math expression to evaluate"
            }
        },
        "required": ["expression"]
    }
}

class ConversationalAgentWithTools:
    """Agent with memory and tool use."""

    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.conversation_history = []
        self.tools = {"calculate": calculate}

    def chat(self, user_message: str) -> str:
        """Chat with tool use support."""
        # Add user message
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

        # Get response
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=[calculator_tool],
            messages=self.conversation_history
        )

        # Handle tool use if needed
        if response.stop_reason == "tool_use":
            # Extract tool call
            tool_use_block = next(
                block for block in response.content
                if block.type == "tool_use"
            )

            # Call the tool
            tool_result = self.tools[tool_use_block.name](
                **tool_use_block.input
            )

            # Add tool use to history
            self.conversation_history.append({
                "role": "assistant",
                "content": response.content
            })

            # Add tool result to history
            self.conversation_history.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": str(tool_result)
                }]
            })

            # Get final response
            final_response = self.client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                tools=[calculator_tool],
                messages=self.conversation_history
            )

            # (A production agent would loop here until stop_reason is no
            # longer "tool_use", in case the model chains multiple tools)
            assistant_message = final_response.content[0].text
        else:
            assistant_message = response.content[0].text

        # Add final response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })

        return assistant_message

Now watch how memory works with tools:
agent = ConversationalAgentWithTools(api_key="YOUR_ANTHROPIC_API_KEY")

print(agent.chat("What's 1234 times 5678?"))
# Agent uses calculator tool
# "1,234 times 5,678 equals 7,006,652."

print(agent.chat("What if I double that?"))
# Agent remembers the previous result
# Agent calculates 7006652 * 2
# "Doubling that gives you 14,013,304."

print(agent.chat("And what's 10% of the original number?"))
# Agent remembers 7,006,652 was the original
# Agent calculates 7006652 * 0.1
# "10% of the original number (7,006,652) is 700,665.2."

The agent remembers both the conversation and the tool results. "Double that" refers to the previous calculation. "The original number" refers to the first result, not the doubled one.
This is powerful. The agent can build on its own work, reference previous calculations, and maintain context across multiple tool uses.
The Context Window Challenge
Here's a problem: conversation history grows with every exchange. After 50 turns, you're sending 100 messages (50 user, 50 assistant) with every new question. This creates two issues:
Cost: Most API providers charge per token. Sending the entire history every time gets expensive.
Context limits: Models have maximum context windows. Claude Sonnet 4.5 supports 200K tokens, but you'll eventually hit limits in very long conversations.
Let's see this problem in practice:
# After a long conversation
print(f"Messages in history: {len(agent.conversation_history)}")
# Output: Messages in history: 156

# Estimate token count (rough approximation)
total_chars = sum(len(str(msg)) for msg in agent.conversation_history)
estimated_tokens = total_chars // 4  # Rough estimate: 1 token ≈ 4 characters
print(f"Estimated tokens: {estimated_tokens:,}")
# Output: Estimated tokens: 12,450

For a long conversation, you might be sending thousands of tokens with each request. Most of those tokens are old messages that might not be relevant anymore.
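The character-based estimate is fine for monitoring, but the Anthropic SDK also exposes a token-counting endpoint if you want exact numbers. Here's a minimal sketch, assuming the client and agent from earlier:

# Count the exact input tokens the next request would consume
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=agent.conversation_history
)
print(f"Exact input tokens: {count.input_tokens}")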
Solution 1: Sliding Window Memory
The simplest solution: only keep the most recent N messages. This is called a "sliding window" because you keep a fixed-size window that slides forward as the conversation progresses.
class ConversationalAgentWithWindow:
    """Agent with sliding window memory."""

    def __init__(self, api_key: str, max_messages: int = 20):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.conversation_history = []
        self.max_messages = max_messages

    def _trim_history(self):
        """Keep only the most recent messages."""
        if len(self.conversation_history) > self.max_messages:
            self.conversation_history = self.conversation_history[-self.max_messages:]
            # The API expects the conversation to start with a user turn,
            # so drop any leading assistant message left by the slice
            while self.conversation_history and self.conversation_history[0]["role"] != "user":
                self.conversation_history.pop(0)

    def chat(self, user_message: str) -> str:
        """Chat with sliding window memory."""
        # Add user message
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

        # Keep only recent messages
        self._trim_history()

        # Get response with limited history
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=self.conversation_history
        )

        assistant_message = response.content[0].text

        # Add response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })

        # Trim again if needed
        self._trim_history()

        return assistant_message

With a sliding window of 20 messages, the agent remembers the last 10 exchanges (10 user messages + 10 assistant responses). Older messages are discarded.
This works well for:
- Casual conversations where old context isn't needed
- Cost-sensitive applications
- Very long conversations that would exceed context limits
The tradeoff: the agent forgets older parts of the conversation. If you reference something from 15 exchanges ago, the agent won't remember it.
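To make that tradeoff concrete, here's an illustrative run with a deliberately tiny window. The exact responses will vary from run to run:

# A window of 4 messages keeps only the last 2 exchanges
agent = ConversationalAgentWithWindow(
    api_key="YOUR_ANTHROPIC_API_KEY",
    max_messages=4
)

agent.chat("My name is Dana and I'm planning a hiking trip.")
agent.chat("What gear do I need?")
agent.chat("How should I train beforehand?")

print(agent.chat("By the way, what's my name?"))
# The first exchange has slid out of the window, so the agent
# will likely have to ask rather than recall the name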
Solution 2: Conversation Summarization
A more sophisticated approach: periodically summarize old messages and replace them with the summary. This preserves important information while reducing token count.
class ConversationalAgentWithSummary:
    """Agent with conversation summarization."""

    def __init__(self, api_key: str, summarize_after: int = 20):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.conversation_history = []
        self.summarize_after = summarize_after

    def _summarize_history(self, messages: list) -> str:
        """Create a summary of conversation history."""
        # Format the conversation
        conversation_text = "\n".join([
            f"{msg['role'].title()}: {msg['content']}"
            for msg in messages
        ])

        # Ask the model to summarize
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Summarize this conversation, preserving key facts,
decisions, and context that might be referenced later:

{conversation_text}

Provide a concise summary in 2-3 paragraphs."""
            }]
        )

        return response.content[0].text

    def chat(self, user_message: str) -> str:
        """Chat with automatic summarization."""
        # Add user message
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

        # Check if we should summarize
        if len(self.conversation_history) > self.summarize_after:
            # Take first half of messages to summarize
            to_summarize = self.conversation_history[:self.summarize_after // 2]
            keep_recent = self.conversation_history[self.summarize_after // 2:]

            # Create summary
            summary = self._summarize_history(to_summarize)

            # Replace old messages with the summary; the brief assistant
            # acknowledgment keeps user/assistant roles alternating
            self.conversation_history = [
                {
                    "role": "user",
                    "content": f"[Previous conversation summary: {summary}]"
                },
                {
                    "role": "assistant",
                    "content": "Understood. I'll keep that context in mind."
                }
            ] + keep_recent

        # Get response
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=self.conversation_history
        )

        assistant_message = response.content[0].text

        # Add response
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })

        return assistant_message

This approach:
- Keeps recent messages in full detail
- Summarizes older messages to preserve important context
- Reduces token count while maintaining continuity
The tradeoff: summarization costs an extra API call, and some details might be lost in the summary.
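You can watch the compaction happen by inspecting the history once the threshold has been crossed. An illustrative sketch with a low threshold so summarization triggers quickly (outputs will vary):

agent = ConversationalAgentWithSummary(
    api_key="YOUR_ANTHROPIC_API_KEY",
    summarize_after=8
)

for question in [
    "I'm redecorating my living room.",
    "What colors work well with oak floors?",
    "What about lighting?",
    "Should I get a rug?",
    "What size rug fits a 4m by 5m room?",
]:
    agent.chat(question)

# The oldest turns have been folded into a single summary message
print(agent.conversation_history[0]["content"][:60])
# e.g. "[Previous conversation summary: The user is redecorating..."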
Choosing a Memory Strategy
Which approach should you use? It depends on your use case:
Full history (no limits):
- Best for: Short conversations, when cost isn't a concern
- Pros: Perfect memory, no information loss
- Cons: Expensive for long conversations, can hit context limits
Sliding window:
- Best for: Casual chat, when only recent context matters
- Pros: Simple, predictable cost, never hits context limits
- Cons: Forgets older information completely
Summarization:
- Best for: Long conversations where older context matters
- Pros: Preserves important information, manages token count
- Cons: More complex, costs extra for summarization, might lose details
For our personal assistant, a hybrid approach often works best:
class SmartConversationalAgent:
    """Agent with intelligent memory management."""

    def __init__(self, api_key: str,
                 window_size: int = 30,
                 summarize_threshold: int = 50):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.conversation_history = []
        self.window_size = window_size
        self.summarize_threshold = summarize_threshold
        self.summary = None

    def _summarize(self, messages: list) -> str:
        """Condense old messages, same pattern as _summarize_history above."""
        conversation_text = "\n".join(
            f"{msg['role'].title()}: {msg['content']}"
            for msg in messages
        )
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Summarize this conversation, preserving key facts,
decisions, and context that might be referenced later:

{conversation_text}

Provide a concise summary in 2-3 paragraphs."""
            }]
        )
        return response.content[0].text

    def chat(self, user_message: str) -> str:
        """Chat with smart memory management."""
        # Add message
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

        # Manage memory
        if len(self.conversation_history) > self.summarize_threshold:
            # Summarize old messages, keep a recent window in full
            old_messages = self.conversation_history[:-self.window_size]
            self.summary = self._summarize(old_messages)
            self.conversation_history = self.conversation_history[-self.window_size:]
            # The API expects the conversation to start with a user turn,
            # so drop any leading assistant message left by the slice
            while self.conversation_history and self.conversation_history[0]["role"] != "user":
                self.conversation_history.pop(0)

        # Build context
        messages = []
        if self.summary:
            # The short assistant acknowledgment keeps roles alternating
            messages.append({
                "role": "user",
                "content": f"[Context from earlier: {self.summary}]"
            })
            messages.append({
                "role": "assistant",
                "content": "Got it. I'll keep that context in mind."
            })
        messages.extend(self.conversation_history)

        # Get response
        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=messages
        )

        assistant_message = response.content[0].text
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })

        return assistant_message

This combines both strategies: keep recent messages in full, summarize older ones.
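A quick way to convince yourself the hybrid bookkeeping works is to run a longer session with small limits and check what's kept. This sketch assumes the class above; the outputs are illustrative:

agent = SmartConversationalAgent(
    api_key="YOUR_ANTHROPIC_API_KEY",
    window_size=6,
    summarize_threshold=10
)

for i in range(8):
    agent.chat(f"Give me one quick fact about the number {i + 1}.")

# Old turns get folded into agent.summary; only recent
# messages are kept verbatim
print(f"Messages kept in full: {len(agent.conversation_history)}")
print(f"Summary exists: {agent.summary is not None}")  # True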
Practical Considerations
As you build memory into your agent, keep these points in mind:
Start simple: Begin with full history. Only add complexity (windowing, summarization) when you actually need it.
Monitor costs: Track how many tokens you're sending. If costs are high, implement a sliding window.
Test edge cases: What happens when the user references something from 20 messages ago? Does your memory strategy handle it?
Consider the domain: A customer service bot might need full history. A casual chatbot might work fine with a small window.
Provide memory controls: Let users start fresh conversations when needed. Sometimes they want to change topics completely.
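For that last point, a simple reset command is often enough. Here's a minimal sketch using the ConversationalAgent class from earlier; the /new command name is just an example:

agent = ConversationalAgent(api_key="YOUR_ANTHROPIC_API_KEY")

while True:
    user_input = input("You: ").strip()
    if user_input.lower() in ("quit", "exit"):
        break
    if user_input == "/new":
        # Wipe the history so the user can change topics completely
        agent.clear_history()
        print("Agent: Fresh conversation started.")
        continue
    print("Agent:", agent.chat(user_input))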
Key Takeaways
Let's review what we've learned about short-term conversation memory:
Memory is essential for conversation. Without it, agents can't handle follow-up questions or maintain context across turns.
Implementation is straightforward. Store messages in a list and send them with each new request. The model handles the rest.
Memory grows over time. Long conversations create large histories that cost more and can hit context limits.
Multiple strategies exist. Full history, sliding windows, and summarization each have their place.
Choose based on your needs. Consider conversation length, cost constraints, and how much context you need to preserve.
Your personal assistant now has short-term memory. It can maintain context across a conversation, handle follow-ups, and build on previous exchanges. This is a fundamental capability that makes agents feel natural and useful.
In the next chapter, we'll explore long-term memory: how to store information across sessions, remember user preferences, and retrieve relevant facts from a knowledge base. Combined with short-term memory, this will give your agent a complete memory system.
Glossary
Conversation History: The list of messages exchanged between the user and agent, stored in order and sent with each new request to provide context.
Stateless Interaction: A request-response pattern where each interaction is independent, with no memory of previous exchanges.
Sliding Window Memory: A memory strategy that keeps only the most recent N messages, discarding older ones to manage context size and cost.
Context Window: The maximum number of tokens a language model can process in a single request, including both the conversation history and the new message.
Conversation Summarization: A technique that condenses older messages into a brief summary to preserve important context while reducing token count.
Token: The basic unit of text that language models process, roughly equivalent to a word or word fragment. Models charge per token and have maximum token limits.