Learn how AI agents store and retrieve information across sessions using vector databases, embeddings, and semantic search. Build a personal assistant that remembers facts, preferences, and knowledge long-term.

This article is part of the free-to-read AI Agent Handbook
Long-Term Knowledge Storage and Retrieval
In the previous subchapter, we gave our assistant the ability to remember recent conversations. This short-term memory works well for ongoing chats, but what happens when the conversation ends? What if you want your assistant to remember your birthday, your favorite restaurants, or important project details days or weeks later?
This is where long-term knowledge storage comes in. Think of it like the difference between remembering what someone just said versus writing important information in a notebook you can reference later. Our assistant needs both capabilities to be truly useful.
Why Long-Term Memory Matters
Imagine asking your assistant, "Remember that my birthday is July 20." A few days later, you ask, "When is my birthday?" Without long-term memory, the assistant has no way to recall this information. The conversation from days ago is gone.
Long-term memory solves this problem by storing important information persistently. Your assistant can save facts, preferences, and knowledge, then retrieve them when needed. This transforms your assistant from a helpful conversational tool into a personalized knowledge companion that truly knows you.
Let's see what this looks like in practice.
Storing Information for Later
The simplest form of long-term memory is a key-value store. You save information with a label (the key) and retrieve it later using that same label.
Here's a basic example using a Python dictionary that persists to a file:
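A minimal sketch might look like this (the `MemoryStore` class name and the `memory.json` file path are illustrative choices, not a fixed convention):

```python
import json
from pathlib import Path

class MemoryStore:
    """A minimal key-value memory that persists to a JSON file."""

    def __init__(self, path="memory.json"):
        self.path = Path(path)
        # Reload any memories saved in earlier sessions
        self.facts = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key, value):
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts, indent=2))  # persist immediately

    def recall(self, key):
        return self.facts.get(key)

memory = MemoryStore()
memory.remember("birthday", "July 20")
print(memory.recall("birthday"))  # "July 20", even after a restart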
This works, but it has a limitation: you need to know the exact key to retrieve information. What if you want to ask, "What do you know about my food preferences?" The assistant would need to search through all stored information to find relevant facts.
Searching Your Knowledge Base
A more powerful approach is to store information in a way that allows searching by content, not just by exact keys. This is where the concept of a knowledge base comes in.
Let's expand our memory system to support searching:
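One simple way, building on the `MemoryStore` sketch above, is a keyword scan over everything we've stored:

```python
class SearchableMemory(MemoryStore):
    """Extends the key-value store with simple keyword search."""

    def search(self, query):
        # Match stored facts that share any meaningful word with the query
        words = {w.strip("?.,!").lower() for w in query.split() if len(w) > 3}
        hits = []
        for key, value in self.facts.items():
            text = f"{key} {value}".lower()
            if any(word in text for word in words):
                hits.append((key, value))
        return hits

memory = SearchableMemory("knowledge.json")
memory.remember("food_preference", "User loves Thai food")
memory.remember("allergy", "User is allergic to peanuts")
print(memory.search("What do you know about my food preferences?"))
# [('food_preference', 'User loves Thai food')]
```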
This is better. Now you can search for information without knowing the exact key. But there's still a problem: this only finds exact keyword matches. What if you ask, "What should I avoid eating?" The assistant needs to understand that this relates to allergies, even though the word "avoid" doesn't appear in the stored facts.
Understanding Meaning, Not Just Keywords
This is where vector stores and embeddings become useful. Instead of matching exact words, we can represent the meaning of text as numbers (vectors) and find information that's semantically similar.
Here's the concept: when you store a fact like "User is allergic to peanuts," the system converts this into a vector that represents its meaning. Later, when you ask "What should I avoid eating?", that question also gets converted to a vector. The system then finds stored facts whose vectors are close to the question's vector, meaning they're semantically related.
Let's see this in action using a simple example with sentence embeddings:
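Here's a rough sketch using the sentence-transformers library (you'll need it installed; all-MiniLM-L6-v2 is one small, fast embedding model among many):

```python
# pip install sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast embedding model

facts = [
    "User is allergic to peanuts",
    "User loves Thai food",
    "User's birthday is July 20",
]
fact_vectors = model.encode(facts)  # one vector (embedding) per fact

def search(query, top_k=1):
    """Return the stored facts most semantically similar to the query."""
    query_vector = model.encode([query])[0]
    # Cosine similarity between the query vector and every fact vector
    scores = fact_vectors @ query_vector / (
        np.linalg.norm(fact_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(facts[i], float(scores[i])) for i in ranked]

print(search("What should I avoid eating?"))
# Top hit: 'User is allergic to peanuts' -- matched by meaning, not keywords
```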
Notice how the search understands meaning. When you ask "What should I avoid eating?", it finds the allergy information even though the words don't match exactly. The vector representation captures that avoiding food relates to allergies and food preferences.
Integrating Long-Term Memory with Our Assistant
Now let's connect this to our personal assistant. We want the assistant to:
- Recognize when you're telling it something to remember
- Store that information in long-term memory
- Retrieve relevant information when answering questions
- Combine retrieved knowledge with its language model capabilities
Here's how this works with Claude Sonnet 4.5:
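A sketch of the retrieve-then-generate loop, using the official anthropic Python client and the `search` function from the embedding example above (the model id shown is an assumption; check your provider's current model list):

```python
# pip install anthropic  (expects ANTHROPIC_API_KEY in your environment)
import anthropic

client = anthropic.Anthropic()

def answer(question):
    # 1. Retrieve the facts most relevant to the question from vector memory
    relevant = search(question, top_k=3)
    facts_text = "\n".join(f"- {fact}" for fact, _score in relevant)

    # 2. Put the retrieved facts into the system prompt
    system_prompt = (
        "You are a personal assistant. Here is what you know about the user:\n"
        f"{facts_text}\n"
        "Use these facts whenever they are relevant."
    )

    # 3. Call the model with the question plus the retrieved knowledge
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model id
        max_tokens=500,
        system=system_prompt,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

print(answer("What should I be careful about when eating out?"))
```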
Let's trace through what happens when you ask "What should I be careful about when eating out?":
- The assistant searches its vector memory for relevant facts
- It finds "You're allergic to peanuts" as highly relevant
- It includes this fact in the system context when calling Claude
- Claude uses this information to provide a personalized, helpful response
The assistant now has a persistent memory that survives across sessions. You can close the program, restart it days later, and it will still remember your allergy.
When to Use Long-Term Memory
Not everything needs to be stored in long-term memory. Here's a practical guide:
Store in long-term memory:
- Personal facts (birthday, location, occupation)
- Preferences (favorite foods, music, work style)
- Important information (allergies, constraints, requirements)
- Project details that span multiple sessions
- Learned facts about recurring topics
Keep in short-term memory:
- Current conversation context
- Temporary working information
- Details specific to this session only
- Information that will become outdated quickly
You can even ask the assistant to decide what's worth remembering:
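One lightweight approach, continuing the sketch above, is to let the model itself screen each message for lasting facts (the prompt wording here is just one possibility):

```python
def maybe_remember(user_message):
    """Ask the model whether this message contains a fact worth keeping."""
    global fact_vectors
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model id, as above
        max_tokens=200,
        system=(
            "If the user's message states a lasting personal fact or preference, "
            "reply with a one-line summary of it. Otherwise reply with exactly: NONE"
        ),
        messages=[{"role": "user", "content": user_message}],
    )
    fact = response.content[0].text.strip()
    if fact != "NONE":
        facts.append(fact)                  # add to the long-term fact list
        fact_vectors = model.encode(facts)  # re-embed so search can find it

maybe_remember("By the way, I moved to Seattle last month.")
```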
Practical Considerations
As you build long-term memory into your assistant, keep these points in mind:
Storage limits: Vector databases can grow large. Consider setting limits on how many facts to store, or implementing a way to archive or remove old information.
Privacy: Long-term memory means persistent data. Be thoughtful about what you store and how you protect it. Never store sensitive information like passwords or financial details without proper encryption.
Retrieval quality: The quality of your retrieval depends on your embedding model. Better embeddings lead to better semantic search. The example above uses a small, fast model, but you might want a more powerful one for production use.
Context window limits: Even with long-term memory, you can only include so many retrieved facts in each request to the language model. Prioritize the most relevant information.
Updating facts: What happens when information changes? You might need a way to update or delete facts. For example, if the user moves to a new city, you want to update that fact rather than having two conflicting locations stored.
Here's a simple update mechanism:
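Continuing the embedding sketch, one option is to find the closest existing fact on the same topic and replace it (the similarity threshold is a tunable assumption, not a standard value):

```python
def update_fact(topic_query, new_fact, threshold=0.6):
    """Replace the closest stored fact on this topic with the new version."""
    global fact_vectors
    matches = search(topic_query, top_k=1)
    if matches and matches[0][1] >= threshold:  # close enough: same topic
        facts.remove(matches[0][0])             # drop the outdated fact
    facts.append(new_fact)
    fact_vectors = model.encode(facts)          # keep embeddings in sync

update_fact("which city the user lives in", "User lives in Seattle")
```

Deleting works the same way: find the matching fact, remove it, and re-embed the remaining list.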
Combining Short-Term and Long-Term Memory
Your assistant now has both types of memory:
- Short-term memory: Recent conversation history (from the previous subchapter)
- Long-term memory: Persistent facts and knowledge (this subchapter)
The most effective assistants use both together. Short-term memory provides immediate context for the current conversation. Long-term memory provides background knowledge and personalization.
Here's how they work together:
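A sketch combining both, reusing the `search` helper and anthropic client from earlier: the message history supplies short-term context, while retrieved facts supply long-term knowledge:

```python
conversation = []  # short-term memory: the running message history

def chat(user_message):
    # Long-term memory: pull stored facts relevant to this message
    relevant = search(user_message, top_k=3)
    facts_text = "\n".join(f"- {fact}" for fact, _ in relevant)
    system_prompt = (
        "You are a personal assistant.\n"
        f"Things you know about the user:\n{facts_text}"
    )

    # Short-term memory: send the recent turns along with the new message
    conversation.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model id, as above
        max_tokens=500,
        system=system_prompt,
        messages=conversation,
    )
    reply = response.content[0].text
    conversation.append({"role": "assistant", "content": reply})
    return reply

print(chat("Can you suggest somewhere to eat tonight?"))
print(chat("Somewhere casual, please."))  # follow-up relies on short-term memory
```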
Notice how the assistant seamlessly blends information from both memory systems. It remembers the ongoing conversation (short-term) and applies personal knowledge (long-term) to provide helpful, customized suggestions.
Building Your Own Knowledge Base
You now understand the core concepts of long-term memory for AI agents. The examples above use simple implementations to illustrate the ideas. In practice, you might use specialized tools:
Vector databases like Pinecone, Weaviate, or Chroma provide optimized storage and retrieval for embeddings. They handle large-scale data better than our simple list-based approach.
Document stores like Elasticsearch or MongoDB work well when you need to store structured information with multiple fields and complex queries.
Graph databases like Neo4j excel when relationships between facts matter (for example, "Alice is Bob's manager" and "Bob works on Project X" implies "Alice oversees Project X").
The choice depends on your needs. For a personal assistant with a few hundred facts, the simple approach we've shown works fine. For a system serving thousands of users with millions of facts, you'll want more robust infrastructure.
What We've Built
Your assistant can now:
- Store facts persistently across sessions
- Search for information by meaning, not just keywords
- Retrieve relevant knowledge when answering questions
- Combine retrieved facts with language model capabilities
- Decide what information is worth remembering long-term
This transforms your assistant from a stateless question-answering system into a personalized knowledge companion. It knows you, remembers what matters, and uses that knowledge to provide better, more relevant help.
In the next chapter, we'll explore how to organize all these pieces into a coherent agent architecture, showing how memory, reasoning, and tools work together in a unified system.
Glossary
Embedding: A numerical representation (vector) of text that captures its semantic meaning. Similar meanings produce similar vectors, enabling semantic search.
Key-Value Store: A simple storage system where data is saved with a label (key) and retrieved using that same label. Like a dictionary or hash map.
Knowledge Base: A structured collection of information that an agent can search and retrieve from. More sophisticated than simple key-value storage.
Semantic Search: Finding information based on meaning rather than exact keyword matches. Uses embeddings to understand that "What should I avoid eating?" relates to allergy information.
Vector Database: A specialized database optimized for storing and searching embeddings. Enables fast similarity search across large collections of data.
Vector Store: Another term for vector database. A system that stores embeddings and supports similarity-based retrieval.
Cosine Similarity: A mathematical measure of how similar two vectors are. It ranges from -1 (opposite) to 1 (identical), though scores for typical text embeddings usually fall between 0 and 1. Used to find relevant information in vector search.