Learn how to build a complete AI agent memory system combining conversation history and persistent knowledge storage. Includes semantic search, tool integration, and practical implementation patterns.

This article is part of the free-to-read AI Agent Handbook
Implementing Memory in Our Agent
You've learned the concepts: short-term memory for conversations and long-term memory for persistent knowledge. Now let's build a complete personal assistant that combines both. We'll start with a minimal implementation and gradually add sophistication, showing you exactly how memory works in practice.
By the end of this chapter, you'll have a working assistant that remembers conversations, stores important facts, and retrieves relevant information when needed. More importantly, you'll understand the design decisions and trade-offs involved in building memory systems for AI agents.
Starting Point: A Complete Memory System
Let's build our assistant step by step. We'll start with the core components and then assemble them into a working system.
The Conversation Manager
First, we need something to handle short-term memory. This manages the ongoing conversation:
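Here's a minimal sketch. The class name, the 20-message default, and the message format are our own choices for this chapter, not a fixed API:

```python
from typing import Dict, List


class ConversationManager:
    """Short-term memory: a sliding window over recent messages."""

    def __init__(self, max_messages: int = 20):
        self.max_messages = max_messages
        self.messages: List[Dict[str, str]] = []

    def add_message(self, role: str, content: str) -> None:
        """Append a message, trimming the oldest ones past the limit."""
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages:
            # Keep only the most recent max_messages entries.
            self.messages = self.messages[-self.max_messages:]

    def get_messages(self) -> List[Dict[str, str]]:
        return list(self.messages)
```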
This is straightforward. We maintain a list of messages and automatically trim it when it gets too long. The sliding window approach keeps costs predictable and prevents context overflow.
The Knowledge Store
Next, we need long-term memory. This stores facts persistently:
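One way to build it (the file name, field names, and word-overlap scoring below are illustrative choices, not the only way):

```python
import json
import os
from datetime import datetime, timezone


class KnowledgeStore:
    """Long-term memory: facts persisted to a JSON file."""

    def __init__(self, path: str = "memory.json"):
        self.path = path
        self.facts = []
        if os.path.exists(path):
            with open(path) as f:
                self.facts = json.load(f)

    def _save(self) -> None:
        with open(self.path, "w") as f:
            json.dump(self.facts, f, indent=2)

    def store(self, content: str, category: str = None) -> None:
        """Store a fact with a timestamp and optional category."""
        self.facts.append({
            "content": content,
            "category": category,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        self._save()

    def search(self, query: str, limit: int = 3):
        """Keyword search: rank facts by words shared with the query."""
        words = set(query.lower().split())
        scored = []
        for fact in self.facts:
            overlap = len(words & set(fact["content"].lower().split()))
            if overlap:
                scored.append((overlap, fact))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:limit]]
```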
This provides persistent storage with simple keyword search. We save facts to a JSON file so they survive between sessions. Each fact has a timestamp and optional category for organization.
Putting It Together
Now let's combine these into a complete assistant. Example (Claude Sonnet 4.5):
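A sketch of the combined assistant. The memory keywords, the prompt wording, and the injectable `llm` callable are our own choices for illustration; the commented wiring at the bottom shows how you might connect it to Claude Sonnet 4.5 (assuming the `anthropic` package):

```python
import json
import os
from datetime import datetime, timezone

# Keywords that trigger long-term storage (an illustrative list).
MEMORY_KEYWORDS = ("remember", "store", "note that", "don't forget")


class PersonalAssistant:
    """Combines a sliding message window with a persistent fact store.

    `llm` is any callable taking (system_prompt, messages) and returning
    a reply string -- in practice a thin wrapper around your model API.
    """

    def __init__(self, llm, memory_path="memory.json", max_messages=20):
        self.llm = llm
        self.max_messages = max_messages
        self.messages = []            # short-term: recent turns
        self.memory_path = memory_path
        self.facts = []               # long-term: persisted facts
        if os.path.exists(memory_path):
            with open(memory_path) as f:
                self.facts = json.load(f)

    def _store_fact(self, content: str) -> None:
        self.facts.append({"content": content,
                           "timestamp": datetime.now(timezone.utc).isoformat()})
        with open(self.memory_path, "w") as f:
            json.dump(self.facts, f, indent=2)

    def _search_facts(self, query: str, limit: int = 3):
        words = set(query.lower().split())
        scored = sorted(
            ((len(words & set(f["content"].lower().split())), f)
             for f in self.facts),
            key=lambda pair: pair[0], reverse=True)
        return [f["content"] for score, f in scored[:limit] if score]

    def chat(self, user_input: str) -> str:
        # Explicit memory keywords trigger long-term storage.
        if any(kw in user_input.lower() for kw in MEMORY_KEYWORDS):
            self._store_fact(user_input)
        # Retrieve relevant facts and surface them in the system prompt.
        system = "You are a helpful personal assistant."
        relevant = self._search_facts(user_input)
        if relevant:
            system += "\nKnown facts about the user:\n" + \
                      "\n".join(f"- {fact}" for fact in relevant)
        self.messages.append({"role": "user", "content": user_input})
        self.messages = self.messages[-self.max_messages:]
        reply = self.llm(system, self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Possible wiring for Claude Sonnet 4.5 (assumes the anthropic package):
# client = anthropic.Anthropic()
# def llm(system, messages):
#     resp = client.messages.create(model="claude-sonnet-4-5", system=system,
#                                   messages=messages, max_tokens=1024)
#     return resp.content[0].text
# assistant = PersonalAssistant(llm)
```

Because the model call is injected rather than hard-coded, you can swap providers or stub the model out in tests without touching the memory logic.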
Let's see this in action:
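A session might look something like this (the exact wording of the model's replies will vary from run to run):

```
You: Remember that my sister's birthday is March 12th.
Assistant: Got it -- I've noted that your sister's birthday is March 12th.

You: Also remember that I'm allergic to peanuts.
Assistant: Noted. I'll keep your peanut allergy in mind.

You: What should I bake for my sister's birthday?
Assistant: Since her birthday is on March 12th, you could bake her a cake --
and given your peanut allergy, I'd suggest a recipe without any nut products...
```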
Notice how the assistant:
- Stores facts when you use memory keywords
- Retrieves relevant facts when answering questions
- Combines stored knowledge with its language model capabilities
- Maintains conversation context across multiple turns
This is a complete, working memory system. But we can make it better.
Improving Retrieval with Semantic Search
The keyword search works, but it misses semantically related information. If you ask "What foods should I avoid?", it won't find your peanut allergy unless the word "avoid" appears in the stored fact.
Let's upgrade to semantic search using embeddings:
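A sketch of the upgraded store. The embedding function is injected so you can use any model; the docstring shows how you might plug in a local model (assuming the `sentence-transformers` package), and the `min_score` threshold is an illustrative default:

```python
import json
import math
import os
from datetime import datetime, timezone


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticKnowledgeStore:
    """Like KnowledgeStore, but ranks facts by embedding similarity.

    `embed_fn` maps a string to a vector. With sentence-transformers:

        model = SentenceTransformer("all-MiniLM-L6-v2")
        store = SemanticKnowledgeStore(lambda t: model.encode(t).tolist())
    """

    def __init__(self, embed_fn, path: str = "memory.json"):
        self.embed_fn = embed_fn
        self.path = path
        self.facts = []
        if os.path.exists(path):
            with open(path) as f:
                self.facts = json.load(f)
        # Embeddings live in memory and are recomputed on load.
        self._vectors = [embed_fn(f["content"]) for f in self.facts]

    def store(self, content: str, category: str = None) -> None:
        self.facts.append({"content": content, "category": category,
                           "timestamp": datetime.now(timezone.utc).isoformat()})
        self._vectors.append(self.embed_fn(content))
        with open(self.path, "w") as f:
            json.dump(self.facts, f, indent=2)

    def search(self, query: str, limit: int = 3, min_score: float = 0.2):
        """Return the facts most similar in meaning to the query."""
        qv = self.embed_fn(query)
        scored = sorted(zip((cosine(qv, v) for v in self._vectors), self.facts),
                        key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in scored[:limit] if score >= min_score]
```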
Now swap KnowledgeStore for SemanticKnowledgeStore in the PersonalAssistant class. Since both stores expose the same store-and-search interface, nothing else in the assistant needs to change.
Test the improved search by asking "What foods should I avoid?". The semantic search understands meaning, not just keywords: it finds your allergy information even though the word "avoid" never appears in the stored fact.
Adding Tool Use with Memory
Let's extend our assistant to use tools while maintaining memory. We'll add a calculator tool as an example:
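Here's one way to write the tool itself, using `ast` instead of `eval()` so arbitrary code can't sneak in, plus a tool definition in the shape the Anthropic Messages API expects (the name and description are our choices):

```python
import ast
import operator

# Operators the calculator is allowed to evaluate.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}


def calculate(expression: str) -> float:
    """Safely evaluate an arithmetic expression (no eval())."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)


# Tool definition for the model (JSON Schema describes the input).
CALCULATOR_TOOL = {
    "name": "calculator",
    "description": "Evaluate an arithmetic expression and return the result.",
    "input_schema": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}
```

In the chat loop, when the model's response asks to use the calculator, you run `calculate` on the expression it supplies and send the result back as a tool result; the conversation manager keeps both the question and the answer in short-term memory.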
Now watch what happens when you ask about your stored savings goal and then follow up on the math.
The assistant combines three capabilities:
- Long-term memory: Retrieves your savings goal
- Short-term memory: Remembers the previous calculation
- Tool use: Performs accurate calculations
This is powerful. The agent can reference stored facts, maintain conversation context, and use tools to solve problems it couldn't handle with language alone.
Managing Memory Over Time
As your assistant accumulates facts, you'll need ways to manage them. Let's add some utilities:
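One way to sketch those utilities, shown here as a store with list, update, and delete operations (using list indices as fact IDs for simplicity; a real system would use stable identifiers):

```python
import json
import os


class ManagedKnowledgeStore:
    """A fact store with utilities for inspecting and editing memory."""

    def __init__(self, path: str = "memory.json"):
        self.path = path
        self.facts = []
        if os.path.exists(path):
            with open(path) as f:
                self.facts = json.load(f)

    def _save(self) -> None:
        with open(self.path, "w") as f:
            json.dump(self.facts, f, indent=2)

    def list_facts(self, category: str = None):
        """Return all facts, optionally filtered by category."""
        if category is None:
            return list(self.facts)
        return [f for f in self.facts if f.get("category") == category]

    def update_fact(self, index: int, new_content: str) -> None:
        """Replace the content of the fact at the given index."""
        self.facts[index]["content"] = new_content
        self._save()

    def delete_fact(self, index: int) -> None:
        """Remove the fact at the given index."""
        del self.facts[index]
        self._save()

    def clear(self) -> None:
        """Forget everything."""
        self.facts = []
        self._save()
```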
These management functions let users view, update, and delete what the assistant knows about them, which matters for both privacy and accuracy.
Design Decisions and Trade-offs
Let's discuss the choices we made and their implications:
Sliding Window for Conversations
We limit conversation history to 20 messages. This:
Pros:
- Keeps costs predictable
- Prevents context overflow
- Simple to implement
Cons:
- Loses older conversation context
- Might forget important details from earlier in the session
Alternative: Use summarization instead of hard truncation. Periodically summarize old messages and keep the summary.
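The summarization alternative might be sketched like this, with the summarizer passed in as a callable (the function name and the keep-10 split are illustrative):

```python
def compact_history(messages, summarize, max_messages=20, keep_recent=10):
    """When the window overflows, fold the oldest messages into a single
    summary message and keep only the most recent turns verbatim."""
    if len(messages) <= max_messages:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # `summarize` would typically be an LLM call in practice.
    summary = summarize("\n".join(m["content"] for m in old))
    note = {"role": "user",
            "content": f"Summary of earlier conversation: {summary}"}
    return [note] + recent
```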
Keyword Detection for Memory Commands
We detect "remember", "store", etc. to trigger memory storage. This:
Pros:
- Simple and fast
- No extra API calls
- User has explicit control
Cons:
- Might miss implicit memory requests
- Requires specific keywords
Alternative: Use the LLM to classify every message as "store this" or "just chat". More flexible but costs more.
Semantic Search with Local Embeddings
We use a local embedding model for semantic search. This:
Pros:
- Fast and free
- Good enough for most use cases
- No API calls for every search
Cons:
- Less powerful than API-based embeddings
- Requires installing additional libraries
Alternative: Use OpenAI's embedding API or Anthropic's future embedding service. Better quality but costs money.
JSON File Storage
We store facts in a JSON file. This:
Pros:
- Simple to implement
- No database setup required
- Easy to inspect and debug
Cons:
- Doesn't scale to thousands of facts
- No concurrent access support
- Limited query capabilities
Alternative: Use a proper database (SQLite for local, PostgreSQL for production) or a vector database (Pinecone, Weaviate, Chroma).
When to Use What
Here's a practical guide for choosing memory strategies:
For personal projects or prototypes:
- Use the simple JSON-based approach we've shown
- Sliding window for conversation history
- Local embeddings for semantic search
For production applications with <1000 users:
- SQLite or PostgreSQL for knowledge storage
- Consider Redis for conversation history (fast, ephemeral)
- Still use local embeddings if cost is a concern
For large-scale applications:
- Vector database (Pinecone, Weaviate, Chroma) for knowledge
- Redis or similar for conversation state
- API-based embeddings for best quality
- Implement proper user isolation and security
Testing Your Memory System
How do you know if your memory system works well? Here are some tests:
These tests verify:
- Facts are stored and retrieved correctly
- Semantic search finds relevant information
- Conversation context is maintained
- Data persists across sessions
What You've Built
Let's appreciate what you now have:
A complete memory system with both short-term and long-term components working together.
Semantic search that understands meaning, not just keywords.
Tool integration where memory enhances tool use.
Memory management giving users control over their data.
Persistent storage that survives between sessions.
This is a real, working personal assistant. You can extend it with more tools, better storage, or additional capabilities. The foundation is solid.
Practical Considerations
As you deploy your memory system, keep these points in mind:
Privacy matters: You're storing personal information. Encrypt sensitive data, provide ways to export or delete it, and be transparent about what you store.
Memory can be wrong: Users might tell you incorrect information or change their minds. Provide ways to correct or update facts.
Not everything should be remembered: Some information is temporary or sensitive. Consider what truly needs long-term storage.
Test with real users: Your assumptions about what to remember might differ from what users actually need. Gather feedback.
Monitor costs: If using API-based embeddings or large context windows, track your spending. Optimize where needed.
Key Takeaways
Let's review what we've learned about implementing memory:
Memory has two layers: Short-term for conversations, long-term for persistent facts. Both are essential.
Start simple: A list for conversations and a JSON file for facts works fine for many applications.
Semantic search is powerful: Embeddings let you find information by meaning, making retrieval much more useful.
Memory enhances everything: When combined with tools and reasoning, memory makes your agent far more capable.
Design for users: Provide ways to view, update, and delete stored information. Users should control their data.
Your personal assistant now has a complete memory system. It remembers conversations, stores important facts, retrieves relevant information, and combines all of this with language model capabilities and tool use. This is a significant milestone in building truly useful AI agents.
In the next chapter, we'll explore how to organize all these components into a coherent agent architecture, showing how memory, reasoning, tools, and state management work together in a unified system.
Glossary
Conversation Manager: A component that handles short-term memory by storing and managing the recent message history in a conversation.
Knowledge Store: A persistent storage system for long-term facts and information that survives across sessions.
Semantic Search: Finding information based on meaning rather than exact keyword matches, typically using embeddings and similarity calculations.
Sliding Window: A memory management strategy that keeps only the most recent N messages, automatically discarding older ones.
Embedding: A numerical vector representation of text that captures its semantic meaning, enabling similarity-based search.
Tool Integration: The ability for an agent to use external functions or APIs while maintaining memory of both the conversation and stored knowledge.
Memory Management: Features that let users view, update, or delete stored information, giving them control over their data.