Learn how to give AI agents the ability to remember recent conversations, handle follow-up questions, and manage conversation history across multiple interactions.

This article is part of the free-to-read AI Agent Handbook
Short-Term Conversation Memory
You've built an agent that can reason, use tools, and solve complex problems. But there's a fundamental limitation: it forgets everything after each interaction. Ask it a question, get an answer, then ask a follow-up, and it has no idea what you're talking about.
Imagine calling a help desk where the representative forgets your conversation every 30 seconds. You'd have to re-explain your problem constantly. Frustrating, right? That's exactly what happens with an agent that lacks memory.
In this chapter, we'll give your personal assistant the ability to remember recent conversations. You'll learn how to maintain context across multiple interactions, handle follow-up questions, and manage conversation history as it grows. By the end, your agent will feel less like a stateless question-answering machine and more like an assistant that actually remembers what you've been discussing.
The Problem: Stateless Interactions
Let's see what happens without memory. Here's a conversation with our agent:
The agent answered the first question perfectly. But when you asked a follow-up ("What's the population?"), it had no memory of discussing France. Each interaction is isolated, like talking to someone with amnesia.
This breaks down quickly in real conversations. People naturally ask follow-up questions, refer to previous topics, and build on earlier context. Without memory, your agent can't handle these basic conversational patterns.
Example (Claude Sonnet 4.5)
Let's see this in code:
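A minimal sketch of the idea, using the Anthropic Python SDK (the `anthropic` package, the `claude-sonnet-4-5` model id, and an `ANTHROPIC_API_KEY` environment variable are assumptions here):

```python
def single_turn(question: str) -> list:
    # Without memory, each request contains only the newest message.
    return [{"role": "user", "content": question}]

def ask_without_memory(question: str) -> str:
    """Send a single question with no conversation history at all."""
    import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model id; use one you have access to
        max_tokens=1024,
        messages=single_turn(question),
    )
    return response.content[0].text

if __name__ == "__main__":
    print("Q1:", ask_without_memory("What's the capital of France?"))
    print("Q2:", ask_without_memory("What's the population?"))
```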
Q1: The capital of France is Paris.
Q2: I'd be happy to help you with population information! Could you please specify which location you're asking about? For example: - A specific city? - A country? - The world? - Some other area?
Each call to `ask_without_memory` creates a fresh conversation. The agent has no context from previous questions. This is the default behavior when you don't explicitly manage conversation history.
The Solution: Conversation History
The fix is straightforward: keep track of the conversation and send it with each new message. Instead of just sending the latest question, you send the entire dialogue history.
Here's what that looks like:
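In code, that can be as simple as a list that persists across calls (again a sketch; the `anthropic` package, the model id, and an `ANTHROPIC_API_KEY` environment variable are assumptions):

```python
history = []  # persists across calls, so context accumulates

def remember(role: str, content: str) -> None:
    history.append({"role": role, "content": content})

def ask_with_memory(question: str) -> str:
    """Send the entire dialogue so far, not just the newest question."""
    import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY
    remember("user", question)
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model id
        max_tokens=1024,
        messages=history,  # the full conversation thread
    )
    answer = response.content[0].text
    remember("assistant", answer)
    return answer

if __name__ == "__main__":
    print("Q1:", ask_with_memory("What's the capital of France?"))
    print("Q2:", ask_with_memory("What's the population?"))
```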
Now let's try the same conversation with memory:
Q1: The capital of France is Paris.
Q2: The population of Paris is approximately 2.2 million people within the city limits (as of recent estimates). However, the Paris metropolitan area (Île-de-France region) has a much larger population of around 12-13 million people, making it one of the largest metropolitan areas in Europe.
The agent understood "the population" refers to Paris because it remembered the previous exchange. This is the foundation of conversational AI: maintaining context across turns.
How Conversation History Works
Let's look at what's actually happening behind the scenes. After the two-question exchange above, our history list contains:
[{'role': 'user', 'content': "What's the capital of France?"},
 {'role': 'assistant', 'content': 'The capital of France is Paris.'},
 {'role': 'user', 'content': "What's the population?"},
 {'role': 'assistant', 'content': 'Paris has a population of approximately...'}]

Each message is stored with its role (either "user" or "assistant") and content. When you ask a new question, the entire list goes to the model. The model sees the full conversation thread and can reference earlier messages.
Think of it like showing someone a chat transcript. They can read the whole conversation and understand the context of the latest message. That's exactly what we're doing with the language model.
Building a Conversational Agent
Let's create a more complete conversational agent that manages memory automatically:
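One way to structure it is a small class that owns its own history (sketched with the Anthropic SDK; the package, the model id, and an `ANTHROPIC_API_KEY` environment variable are assumptions):

```python
class ConversationalAgent:
    """An agent that appends every exchange to its own history."""

    def __init__(self, model: str = "claude-sonnet-4-5"):  # assumed model id
        self.model = model
        self.history = []

    def chat(self, message: str) -> str:
        import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY
        self.history.append({"role": "user", "content": message})
        client = anthropic.Anthropic()
        response = client.messages.create(
            model=self.model,
            max_tokens=2048,
            messages=self.history,  # full thread every time
        )
        answer = response.content[0].text
        self.history.append({"role": "assistant", "content": answer})
        return answer

    def reset(self) -> None:
        """Start a fresh conversation."""
        self.history = []
```

A conversation is then just repeated calls to `chat`, for example `agent.chat("I'm planning a trip to Japan")` followed by follow-up questions.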
Now we can have natural, multi-turn conversations:
How exciting! I'd be happy to help you plan your trip to Japan. To give you the best suggestions, it would be helpful to know more about your plans:
- **When are you planning to go?** (Season can greatly affect your experience - cherry blossoms in spring, fall foliage, winter snow, summer festivals)
- **How long** will you be visiting?
- **What are your interests?** (temples/culture, food, nature, cities, anime/pop culture, hiking, etc.)
- **First time to Japan** or returning?
- **Which cities/regions** are you considering?

In the meantime, here are some general tips:

**Popular first-timer destinations:**
- Tokyo (modern city life)
- Kyoto (traditional culture, temples)
- Osaka (food scene)
- Hiroshima & Miyajima
- Hakone or nearby Mt. Fuji areas

**Practical considerations:**
- JR Pass can save money if traveling between cities
- Pocket WiFi or SIM card is very useful
- IC cards (Suica/Pasmo) make transport easy
- Learn a few basic Japanese phrases

What aspects of your trip would you like help with?
# Best Times to Visit Japan

The "best" time really depends on what you're looking for, but here's a breakdown:

## 🌸 **Spring (March-May) - Most Popular**
**Pros:**
- Cherry blossoms (late March-early April) - truly spectacular
- Comfortable temperatures (15-20°C / 59-68°F)
- Clear skies, low rainfall

**Cons:**
- Very crowded and expensive during sakura season
- Book accommodations months in advance
- Popular spots can feel overwhelming

## 🍁 **Autumn (September-November) - Also Peak Season**
**Pros:**
- Fall foliage (mid-November) is stunning
- Pleasant weather (15-22°C / 59-72°F)
- Comfortable for walking/sightseeing
- Many festivals

**Cons:**
- Crowded, especially in Kyoto during koyo (leaf viewing)
- Higher prices
- Early September can still be hot/humid

## ❄️ **Winter (December-February) - Underrated**
**Pros:**
- Fewer tourists, lower prices
- Excellent skiing in Hokkaido/Nagano
- Beautiful snow scenery
- Clear skies (especially in Tokyo)

**Cons:**
- Cold (0-10°C / 32-50°F)
- Some attractions close
- Shorter daylight hours

## ☀️ **Summer (June-August) - Least Recommended**
**Pros:**
- Vibrant festivals (fireworks, bon odori)
- Summer hiking season in mountains
- Lavender fields in Hokkaido

**Cons:**
- Hot and very humid (25-35°C / 77-95°F)
- Rainy season in June-July
- Typhoon season (Aug-Sept)
- Can be uncomfortable for sightseeing

## My Recommendation:
**Late March-April or November** for the best balance of weather and scenery, but be prepared for crowds. **Winter (Jan-Feb)** if you want a more budget-friendly, authentic experience.

What type of experience appeals to you most?
# How Long to Stay in Japan

Here's my breakdown based on different trip lengths:

## ⏱️ **Minimum: 7-10 Days**
Best for first-timers with limited time
- **Tokyo**: 3-4 days
- **Kyoto**: 2-3 days
- **Osaka**: 1-2 days
- Day trip to Nara or Hakone

This gives you a taste of modern Japan (Tokyo), traditional culture (Kyoto), and great food (Osaka).

## 👍 **Ideal: 2 Weeks (14 days)**
The sweet spot for most visitors
- **Tokyo**: 4-5 days (including day trips to Nikko, Kamakura, or Mt. Fuji area)
- **Kyoto**: 3-4 days (including day trip to Nara)
- **Osaka**: 2 days
- **Hiroshima**: 1-2 days (including Miyajima island)
- **Plus** one wild card: Takayama, Kanazawa, Hakone, or Nagano

This allows you to experience diverse regions without feeling rushed.

## 🌟 **Extended: 3-4 Weeks**
For thorough exploration - all of the above PLUS:
- **Hokkaido** (Sapporo, Hakodate)
- **Japanese Alps** region
- **Okinawa** (tropical islands)
- More rural/off-beaten-path areas
- Time to slow down and absorb local life

## 💡 **My Recommendation:**
**Aim for at least 10-14 days** if possible. Japan has:
- Excellent public transportation (easy to cover ground)
- Enough variety that you won't get bored
- A learning curve (language, customs) that takes a few days to adjust to

**Quality over quantity:** It's better to spend 3 days truly experiencing Kyoto than rushing through in 1 day to check it off a list.

How much time do you have available for your trip?
# Essential Cities & Regions for Your Japan Itinerary

Here's my guide organized by priority:

## 🎯 **The Classic "Golden Route" (Must-See for First-Timers)**

### **Tokyo** (4-5 days)
- Modern metropolis, endless neighborhoods to explore
- Shibuya, Shinjuku, Harajuku (urban energy)
- Asakusa (traditional temples)
- Akihabara (anime/electronics)
- Tsukiji Outer Market (food)
- **Day trips:** Mt. Fuji/Hakone, Nikko, Kamakura

### **Kyoto** (3-4 days)
- Japan's cultural heart, 2,000+ temples
- Fushimi Inari (iconic red torii gates)
- Arashiyama bamboo grove
- Kinkaku-ji (Golden Pavilion)
- Gion district (geisha spotting)
- **Easy day trip:** Nara (deer park, giant Buddha - do this!)

### **Osaka** (1-2 days)
- Food capital of Japan
- Street food in Dotonbori
- Osaka Castle
- More casual, fun vibe than Tokyo
- Great nightlife

## 🔥 **Highly Recommended Add-Ons**

### **Hiroshima** (1-2 days)
- Peace Memorial & Museum (moving experience)
- **Miyajima Island** - one of Japan's most scenic spots (floating torii gate)

### **Hakone** (1-2 days)
- Mt. Fuji views
- Hot springs (onsen)
- Art museums
- Scenic nature
- Easy from Tokyo

## 🌟 **If You Have Extra Time (Pick 1-2)**

### **Takayama & Shirakawa-go**
- Traditional mountain villages
- Preserved Edo-period streets
- UNESCO World Heritage gassho-zukuri houses

### **Kanazawa**
- Beautiful Kenrokuen Garden
- Samurai & geisha districts
- Excellent seafood
- Less touristy alternative to Kyoto

### **Nara**
- Can be a day trip from Kyoto/Osaka
- Friendly deer roaming freely
- Todai-ji Temple (massive Buddha)

### **Nagano**
- Snow monkeys in hot springs
- Mountain temples
- Winter sports

### **Hokkaido** (Sapporo, Hakodate)
- Northern island - very different feel
- Best in winter (skiing, snow festival) or summer (lavender)
- Needs 3-5 days minimum

### **Okinawa**
- Tropical islands, beaches
- Distinct Ryukyu culture
- Requires flight, feels like different country

## 📋 **Sample Itineraries**

### **10 Days:**
- Tokyo (4 days) → Hakone (1 day) → Kyoto (3 days, with Nara day trip) → Osaka (2 days)

### **14 Days:**
- Tokyo (4 days) → Hakone (1 day) → Kyoto (3 days) → Nara (1 day) → Osaka (2 days) → Hiroshima/Miyajima (2 days) → back to Tokyo or add Takayama (1 day)

### **3 Weeks:**
- All of the above + Kanazawa (2 days) + Nagano (2 days) + Hokkaido (4-5 days) OR Okinawa (3-4 days)

## 💡 **My Advice:**
**Don't try to see everything!** Japan rewards depth over breadth. The classic Tokyo-Kyoto-Osaka triangle is perfect for a first visit. You can always return (and you'll want to!).

**Geographic logic matters:** Plan your route to avoid backtracking. Generally flow: Tokyo → Central Japan → Kyoto/Osaka → Western Japan (Hiroshima).

What's your trip length looking like? That will help me suggest the best combination!
Notice how each response builds on the previous context. The agent remembers we're discussing a Japan trip and tailors its answers accordingly.
Handling Follow-Up Questions
One of the most powerful aspects of conversation memory is handling follow-ups. People rarely ask perfectly self-contained questions. They say "What about that?", "Can you explain more?", or "Why is that?"
Let's see this in action:
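A session like the one below can be driven with the same history pattern. This sketch separates the model call from the bookkeeping so the loop is easy to follow; the SDK, the model id, and the exact question wording (reconstructed from the responses) are all assumptions:

```python
def chat(history, message, complete):
    """Record a user turn, get a reply via `complete`, record and return it."""
    history.append({"role": "user", "content": message})
    answer = complete(history)
    history.append({"role": "assistant", "content": answer})
    return answer

def claude_complete(history):
    import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model id
        max_tokens=2048,
        messages=history,
    )
    return response.content[0].text

if __name__ == "__main__":
    history = []
    for question in [
        "What are the main differences between Python and JavaScript?",
        "Which one is easier to learn?",  # follow-up: "which one"
        "Why is that?",                   # follow-up: "that"
        "Can you explain the async programming you mentioned?",
    ]:
        print(chat(history, question, claude_complete))
```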
# Main Differences Between Python and JavaScript
## **1. Primary Use Cases**
- **Python**: General-purpose programming, data science, AI/ML, automation, backend development
- **JavaScript**: Web development (frontend), also backend (Node.js), interactive web pages
## **2. Execution Environment**
- **Python**: Runs on servers, desktops, requires Python interpreter
- **JavaScript**: Runs natively in web browsers, also on servers via Node.js
## **3. Syntax Style**
- **Python**: Uses indentation for code blocks, more readable
```python
if x > 5:
    print("Greater than 5")
```
- **JavaScript**: Uses curly braces and semicolons
```javascript
if (x > 5) {
    console.log("Greater than 5");
}
```
## **4. Typing**
- **Python**: Dynamically typed, supports optional type hints
- **JavaScript**: Dynamically typed, can use TypeScript for static typing
## **5. Data Structures**
- **Python**: Lists, tuples, dictionaries, sets
- **JavaScript**: Arrays, objects; Sets and Maps were only added in ES6 (still no native tuples)
## **6. Object-Oriented Programming**
- **Python**: Class-based with clear inheritance
- **JavaScript**: Prototype-based (though ES6+ added class syntax)
## **7. Concurrency**
- **Python**: Threading, multiprocessing, asyncio
- **JavaScript**: Event-driven, single-threaded with async/await and callbacks
Both are powerful, versatile languages—choose based on your project needs!
# Which Is Easier to Learn?
## **Python is Generally Considered Easier** 🐍
### **Why Python Wins for Beginners:**
**1. Simpler, Cleaner Syntax**
- Reads almost like English
- No curly braces or semicolons to remember
- Indentation forces clean, readable code
```python
# Python - straightforward
name = "Alice"
if name == "Alice":
    print("Hello Alice!")
```
```javascript
// JavaScript - more symbols
let name = "Alice";
if (name === "Alice") {
    console.log("Hello Alice!");
}
```
**2. Fewer Quirks**
- More predictable behavior
- Less confusing concepts (like `this`, hoisting, `==` vs `===`)
**3. One Clear Way to Do Things**
- Python philosophy: "There should be one obvious way to do it"
- JavaScript has multiple paradigms that can confuse beginners
**4. Better for Learning Programming Concepts**
- Used widely in education
- Great for algorithms, data structures, and general CS concepts
## **JavaScript Has Its Advantages:**
✅ Instant gratification (see results in browser immediately)
✅ No installation needed (just open browser console)
✅ Essential if you want to build websites
## **Verdict:**
**Start with Python** if you're learning programming fundamentals. Switch to JavaScript when you're ready for web development. Many developers learn both eventually!
# Why Is Python Easier to Learn?
Let me break down the specific reasons:
## **1. Less Cognitive Load**
**Python:**
```python
# Clean and minimal
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num * 2)
```
**JavaScript:**
```javascript
// More syntax to remember
const numbers = [1, 2, 3, 4, 5];
numbers.forEach(function(num) {
    console.log(num * 2);
});
```
Python requires remembering fewer symbols and conventions.
## **2. JavaScript's Confusing Behaviors**
**Type coercion surprises:**
```javascript
"5" + 3 // "53" (string)
"5" - 3 // 2 (number) - WHY?!
[] + [] // "" (empty string)
[] + {} // "[object Object]"
```
**Multiple equality operators:**
```javascript
0 == "0" // true (weird!)
0 === "0" // false (expected)
```
Python is more predictable and consistent.
## **3. Explicit vs Implicit**
- **Python**: Explicit is better than implicit (Zen of Python)
- **JavaScript**: Lots of implicit behavior (automatic semicolons, type conversion, hoisting)
## **4. Focused Learning Path**
- **Python**: One obvious way → easier decisions for beginners
- **JavaScript**: Multiple ways (callbacks, promises, async/await, classes, prototypes) → overwhelming choices
## **The Bottom Line:**
Python was **designed for readability and simplicity**. JavaScript evolved from a quick browser scripting language and accumulated complexity over 25+ years. Both are powerful, but Python's philosophy makes it friendlier for newcomers.
# Async Programming Explained
## **What Is Async Programming?**
**Asynchronous** programming lets your code do multiple things without waiting for slow operations to finish.
### **Real-World Analogy:**
**Synchronous (blocking):**
- You order coffee ☕
- Stand there waiting while it's made ⏳
- Only then can you order a bagel 🥯
- Wait again...
**Asynchronous (non-blocking):**
- Order coffee ☕
- While it's being made, order a bagel 🥯
- While both are being made, sit down and check your phone 📱
- Get notified when each is ready ✅
## **Why Do We Need It?**
For slow operations like:
- Fetching data from websites/APIs
- Reading files
- Database queries
- Waiting for user input
Without async, your entire program freezes!
## **Python Example:**
```python
import asyncio
import time

# Synchronous - takes 6 seconds total
def make_coffee():
    time.sleep(3)  # Wait 3 seconds
    return "Coffee ready"

def make_toast():
    time.sleep(3)  # Wait 3 seconds
    return "Toast ready"

# Asynchronous - takes 3 seconds total!
async def make_coffee_async():
    await asyncio.sleep(3)
    return "Coffee ready"

async def make_toast_async():
    await asyncio.sleep(3)
    return "Toast ready"

async def make_breakfast():
    # Both happen at the same time!
    coffee, toast = await asyncio.gather(
        make_coffee_async(),
        make_toast_async()
    )
    print(coffee, toast)

asyncio.run(make_breakfast())
```
## **JavaScript Example:**
```javascript
// Old way - Callback Hell
fetchUser(function(user) {
    fetchPosts(user.id, function(posts) {
        fetchComments(posts[0].id, function(comments) {
            // Finally do something...
        });
    });
});

// Modern way - async/await (much cleaner!)
async function getUserData() {
    const user = await fetchUser();
    const posts = await fetchPosts(user.id);
    const comments = await fetchComments(posts[0].id);
    return comments;
}
```
## **Key Concepts:**
🔹 **await** - "Pause here until this finishes, but let other code run"
🔹 **async** - "This function uses await inside"
🔹 **Promise** (JS) - "I promise to give you a result later"
🔹 **Non-blocking** - Other code keeps running while waiting
## **When to Use Async:**
✅ API calls
✅ File operations
✅ Database queries
✅ Any I/O (input/output) operations
❌ Heavy calculations (use different techniques like threading)
Async makes your programs faster and more responsive!
Each follow-up question would be impossible to answer without context. "Which one" refers to Python and JavaScript. "Why is that" refers to Python being easier. "You mentioned" explicitly references the earlier response.
The agent handles all of this naturally because it has the full conversation in memory.
Memory with Tool Use
Memory becomes even more important when your agent uses tools. The agent needs to remember:
- What tools it has called
- What results it received
- How those results relate to the user's questions
Let's extend our conversational agent to support tools:
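Here's a sketch using the Anthropic SDK's tool-use loop with a simple calculator tool. The tool name, schema, and the eval-based evaluator are illustrative, and the SDK, model id, and `ANTHROPIC_API_KEY` environment variable are assumptions:

```python
def run_calculator(expression: str) -> str:
    """Toy evaluator for this sketch; never eval() untrusted input in production."""
    if not set(expression) <= set("0123456789+-*/(). "):
        raise ValueError("unsupported characters in expression")
    return str(eval(expression))

CALCULATOR_TOOL = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression, e.g. '2 + 2'.",
    "input_schema": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}

class ToolAgent:
    """Conversational agent whose history also records tool calls and results."""

    def __init__(self, model: str = "claude-sonnet-4-5"):  # assumed model id
        self.model = model
        self.history = []

    def chat(self, message: str) -> str:
        import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY
        client = anthropic.Anthropic()
        self.history.append({"role": "user", "content": message})
        while True:
            response = client.messages.create(
                model=self.model,
                max_tokens=2048,
                tools=[CALCULATOR_TOOL],
                messages=self.history,
            )
            # Keep the assistant turn (which may include tool-use blocks) in history.
            self.history.append({"role": "assistant", "content": response.content})
            if response.stop_reason != "tool_use":
                return next(b.text for b in response.content if b.type == "text")
            # Run each requested tool and feed the results back as the next turn.
            results = [
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_calculator(block.input["expression"]),
                }
                for block in response.content
                if block.type == "tool_use"
            ]
            self.history.append({"role": "user", "content": results})
```

Because tool calls and results live in the same history list as ordinary messages, later turns can refer back to earlier results.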
Now watch how memory works with tools:
1234 times 5678 equals **7,006,652**.
If you double that, you get **14,013,304**.
10% of the original number (7,006,652) is **700,665.2**.
The agent remembers both the conversation and the tool results. "Double that" refers to the previous calculation. "The original number" refers to the first result, not the doubled one.
This is powerful. The agent can build on its own work, reference previous calculations, and maintain context across multiple tool uses.
The Context Window Challenge
Here's a problem: conversation history grows with every exchange. After 50 turns, you're sending 100 messages (50 user, 50 assistant) with every new question. This creates two issues:
Cost: Most API providers charge per token. Sending the entire history every time gets expensive.
Context limits: Models have maximum context windows. Claude Sonnet 4.5 supports 200K tokens, but you'll eventually hit limits in very long conversations.
Let's see this problem in practice:
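A quick way to inspect the growth is a rough character-based estimate (the real tokenizer counts differently; the 4-characters-per-token rule is just a common approximation):

```python
def estimate_tokens(history) -> int:
    """Very rough heuristic: ~4 characters of English text per token."""
    total_chars = sum(len(m["content"]) for m in history)
    return total_chars // 4

# After a multi-turn conversation:
# print("Messages in history:", len(history))
# print("Estimated tokens:", estimate_tokens(history))
```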
Messages in history: 12
Estimated tokens: 341
For a long conversation, you might be sending thousands of tokens with each request. Most of those tokens are old messages that might not be relevant anymore.
Solution 1: Sliding Window Memory
The simplest solution: only keep the most recent N messages. This is called a "sliding window" because you keep a fixed-size window that slides forward as the conversation progresses.
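The implementation is a one-line slice (a minimal sketch; the window size of 20 is just an example):

```python
MAX_MESSAGES = 20  # the window: 10 user/assistant exchanges

def trim_history(history, max_messages=MAX_MESSAGES):
    """Keep only the newest messages; older ones slide out of the window."""
    return history[-max_messages:]
```

Call it before every request, e.g. `messages=trim_history(history)`, so the payload never grows past the window.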
With a sliding window of 20 messages, the agent remembers the last 10 exchanges (10 user messages + 10 assistant responses). Older messages are discarded.
This works well for:
- Casual conversations where old context isn't needed
- Cost-sensitive applications
- Very long conversations that would exceed context limits
The tradeoff: the agent forgets older parts of the conversation. If you reference something from 15 exchanges ago, the agent won't remember it.
Solution 2: Conversation Summarization
A more sophisticated approach: periodically summarize old messages and replace them with the summary. This preserves important information while reducing token count.
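A sketch of the mechanism (the thresholds are illustrative, and `summarize_fn` would typically be one extra model call asking for a concise recap of the old messages):

```python
SUMMARIZE_AFTER = 30  # compact once history exceeds this many messages
KEEP_RECENT = 10      # always keep this many newest messages verbatim

def compact_history(history, summarize_fn):
    """Replace old messages with a single summary message, keep recent ones.

    `summarize_fn` takes a list of messages and returns summary text.
    """
    if len(history) <= SUMMARIZE_AFTER:
        return history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    summary = summarize_fn(old)
    summary_message = {
        "role": "user",
        "content": f"Summary of the conversation so far: {summary}",
    }
    return [summary_message] + recent
```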
This approach:
- Keeps recent messages in full detail
- Summarizes older messages to preserve important context
- Reduces token count while maintaining continuity
The tradeoff: summarization costs an extra API call, and some details might be lost in the summary.
Choosing a Memory Strategy
Which approach should you use? It depends on your use case:
Full history (no limits):
- Best for: Short conversations, when cost isn't a concern
- Pros: Perfect memory, no information loss
- Cons: Expensive for long conversations, can hit context limits
Sliding window:
- Best for: Casual chat, when only recent context matters
- Pros: Simple, predictable cost, never hits context limits
- Cons: Forgets older information completely
Summarization:
- Best for: Long conversations where older context matters
- Pros: Preserves important information, manages token count
- Cons: More complex, costs extra for summarization, might lose details
For our personal assistant, a hybrid approach often works best:
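One possible shape for that hybrid is a rolling summary plus a fixed recent window (names and thresholds here are illustrative, and `summarize_fn` stands in for an extra model call that folds old turns into the summary):

```python
class HybridMemory:
    """Keeps a rolling summary plus a fixed window of recent messages."""

    def __init__(self, summarize_fn, window=10):
        self.summarize_fn = summarize_fn  # (old_summary, messages) -> new summary text
        self.window = window
        self.summary = ""
        self.recent = []

    def add(self, message):
        self.recent.append(message)
        if len(self.recent) > self.window:
            overflow = self.recent[:-self.window]
            self.recent = self.recent[-self.window:]
            # Fold the overflowing messages into the running summary.
            self.summary = self.summarize_fn(self.summary, overflow)

    def as_messages(self):
        """The list to send to the model: summary first, then recent turns."""
        prefix = []
        if self.summary:
            prefix = [{"role": "user",
                       "content": f"Conversation summary so far: {self.summary}"}]
        return prefix + self.recent
```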
This combines both strategies: keep recent messages in full, summarize older ones.
Practical Considerations
As you build memory into your agent, keep these points in mind:
Start simple: Begin with full history. Only add complexity (windowing, summarization) when you actually need it.
Monitor costs: Track how many tokens you're sending. If costs are high, implement a sliding window.
Test edge cases: What happens when the user references something from 20 messages ago? Does your memory strategy handle it?
Consider the domain: A customer service bot might need full history. A casual chatbot might work fine with a small window.
Provide memory controls: Let users start fresh conversations when needed. Sometimes they want to change topics completely.
Key Takeaways
Let's review what we've learned about short-term conversation memory:
Memory is essential for conversation. Without it, agents can't handle follow-up questions or maintain context across turns.
Implementation is straightforward. Store messages in a list and send them with each new request. The model handles the rest.
Memory grows over time. Long conversations create large histories that cost more and can hit context limits.
Multiple strategies exist. Full history, sliding windows, and summarization each have their place.
Choose based on your needs. Consider conversation length, cost constraints, and how much context you need to preserve.
Your personal assistant now has short-term memory. It can maintain context across a conversation, handle follow-ups, and build on previous exchanges. This is a fundamental capability that makes agents feel natural and useful.
In the next chapter, we'll explore long-term memory: how to store information across sessions, remember user preferences, and retrieve relevant facts from a knowledge base. Combined with short-term memory, this will give your agent a complete memory system.
Glossary
Conversation History: The list of messages exchanged between the user and agent, stored in order and sent with each new request to provide context.
Stateless Interaction: A request-response pattern where each interaction is independent, with no memory of previous exchanges.
Sliding Window Memory: A memory strategy that keeps only the most recent N messages, discarding older ones to manage context size and cost.
Context Window: The maximum number of tokens a language model can process in a single request, including both the conversation history and the new message.
Conversation Summarization: A technique that condenses older messages into a brief summary to preserve important context while reducing token count.
Token: The basic unit of text that language models process, roughly equivalent to a word or word fragment. Models charge per token and have maximum token limits.
About the author: Michael Brenndoerfer
All opinions expressed here are my own and do not reflect the views of my employer.
Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, where he drives AI and data initiatives across private capital investments.
With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.