Long-Term Knowledge Storage and Retrieval: Building Persistent Memory for AI Agents


Michael Brenndoerfer • November 9, 2025

Learn how AI agents store and retrieve information across sessions using vector databases, embeddings, and semantic search. Build a personal assistant that remembers facts, preferences, and knowledge long-term.

This article is part of the free-to-read AI Agent Handbook.

Long-Term Knowledge Storage and Retrieval

In the previous subchapter, we gave our assistant the ability to remember recent conversations. This short-term memory works well for ongoing chats, but what happens when the conversation ends? What if you want your assistant to remember your birthday, your favorite restaurants, or important project details days or weeks later?

This is where long-term knowledge storage comes in. Think of it like the difference between remembering what someone just said versus writing important information in a notebook you can reference later. Our assistant needs both capabilities to be truly useful.

Why Long-Term Memory Matters

Imagine asking your assistant, "Remember that my birthday is July 20." A few days later, you ask, "When is my birthday?" Without long-term memory, the assistant has no way to recall this information. The conversation from days ago is gone.

Long-term memory solves this problem by storing important information persistently. Your assistant can save facts, preferences, and knowledge, then retrieve them when needed. This transforms your assistant from a helpful conversational tool into a personalized knowledge companion that truly knows you.

Let's see what this looks like in practice.

Storing Information for Later

The simplest form of long-term memory is a key-value store. You save information with a label (the key) and retrieve it later using that same label.

Here's a basic example using a Python dictionary that persists to a file:

import json
import os

class SimpleMemory:
    def __init__(self, memory_file="assistant_memory.json"):
        self.memory_file = memory_file
        self.memory = self._load_memory()

    def _load_memory(self):
        """Load memory from file if it exists"""
        if os.path.exists(self.memory_file):
            with open(self.memory_file, 'r') as f:
                return json.load(f)
        return {}

    def _save_memory(self):
        """Save memory to file"""
        with open(self.memory_file, 'w') as f:
            json.dump(self.memory, f, indent=2)

    def store(self, key, value):
        """Store a fact in long-term memory"""
        self.memory[key] = value
        self._save_memory()
        return f"Remembered: {key}"

    def retrieve(self, key):
        """Retrieve a fact from long-term memory"""
        return self.memory.get(key, "I don't have that information stored.")

# Example usage
memory = SimpleMemory()
memory.store("user_birthday", "July 20")
memory.store("favorite_restaurant", "Luigi's Italian Kitchen")

# Later, even after restarting the program
print(memory.retrieve("user_birthday"))  # Output: July 20

This works, but it has a limitation: you need to know the exact key to retrieve information. What if you want to ask, "What do you know about my food preferences?" The assistant would need to search through all stored information to find relevant facts.

Searching Your Knowledge Base

A more powerful approach is to store information in a way that allows searching by content, not just by exact keys. This is where the concept of a knowledge base comes in.

Let's expand our memory system to support searching:

from datetime import datetime
import json
import os

class SearchableMemory:
    def __init__(self, memory_file="assistant_knowledge.json"):
        self.memory_file = memory_file
        self.facts = self._load_facts()

    def _load_facts(self):
        """Load facts from file"""
        if os.path.exists(self.memory_file):
            with open(self.memory_file, 'r') as f:
                return json.load(f)
        return []

    def _save_facts(self):
        """Save facts to file"""
        with open(self.memory_file, 'w') as f:
            json.dump(self.facts, f, indent=2)

    def add_fact(self, fact, category=None):
        """Add a fact to the knowledge base"""
        entry = {
            "fact": fact,
            "category": category,
            "timestamp": str(datetime.now())
        }
        self.facts.append(entry)
        self._save_facts()
        return f"Stored: {fact}"

    def search(self, query):
        """Search for facts containing the query text"""
        results = []
        query_lower = query.lower()

        for entry in self.facts:
            # Simple keyword search
            if query_lower in entry["fact"].lower():
                results.append(entry["fact"])
            elif entry["category"] and query_lower in entry["category"].lower():
                results.append(entry["fact"])

        return results if results else ["No matching information found."]

# Example usage
kb = SearchableMemory()
kb.add_fact("User's birthday is July 20", category="personal")
kb.add_fact("User prefers Italian food", category="preferences")
kb.add_fact("User is allergic to peanuts", category="preferences")

# Search by keyword
print(kb.search("birthday"))
# Output: ["User's birthday is July 20"]

print(kb.search("food"))
# Output: ["User prefers Italian food"]

print(kb.search("preferences"))
# Output: ["User prefers Italian food", "User is allergic to peanuts"]

This is better. Now you can search for information without knowing the exact key. But there's still a problem: this only finds exact keyword matches. What if you ask, "What should I avoid eating?" The assistant needs to understand that this relates to allergies, even though the word "avoid" doesn't appear in the stored facts.

Understanding Meaning, Not Just Keywords

This is where vector stores and embeddings become useful. Instead of matching exact words, we can represent the meaning of text as numbers (vectors) and find information that's semantically similar.

Here's the concept: when you store a fact like "User is allergic to peanuts," the system converts this into a vector that represents its meaning. Later, when you ask "What should I avoid eating?", that question also gets converted to a vector. The system then finds stored facts whose vectors are close to the question's vector, meaning they're semantically related.

Let's see this in action using a simple example with sentence embeddings:

from sentence_transformers import SentenceTransformer
import numpy as np

class VectorMemory:
    def __init__(self):
        # Using Claude Sonnet 4.5 for the agent, but a local model for embeddings
        # to keep costs down for frequent similarity searches
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.facts = []
        self.vectors = []

    def add_fact(self, fact):
        """Add a fact and its vector representation"""
        vector = self.encoder.encode(fact)
        self.facts.append(fact)
        self.vectors.append(vector)
        return f"Stored: {fact}"

    def search(self, query, top_k=3):
        """Find the most relevant facts for a query"""
        if not self.facts:
            return ["No information stored yet."]

        # Convert query to vector
        query_vector = self.encoder.encode(query)

        # Calculate similarity with all stored facts
        similarities = []
        for i, fact_vector in enumerate(self.vectors):
            # Cosine similarity
            similarity = np.dot(query_vector, fact_vector) / (
                np.linalg.norm(query_vector) * np.linalg.norm(fact_vector)
            )
            similarities.append((similarity, self.facts[i]))

        # Sort by similarity and return top results
        similarities.sort(reverse=True, key=lambda x: x[0])
        return [fact for score, fact in similarities[:top_k] if score > 0.3]

# Example usage
memory = VectorMemory()
memory.add_fact("User's birthday is July 20")
memory.add_fact("User prefers Italian food")
memory.add_fact("User is allergic to peanuts")
memory.add_fact("User lives in San Francisco")
memory.add_fact("User works as a software engineer")

# Semantic search
print(memory.search("What should I avoid eating?"))
# Output: ["User is allergic to peanuts", "User prefers Italian food"]

print(memory.search("Where does the user live?"))
# Output: ["User lives in San Francisco"]

print(memory.search("What is the user's profession?"))
# Output: ["User works as a software engineer"]

Notice how the search understands meaning. When you ask "What should I avoid eating?", it finds the allergy information even though the words don't match exactly. The vector representation captures that avoiding food relates to allergies and food preferences.

Integrating Long-Term Memory with Our Assistant

Now let's connect this to our personal assistant. We want the assistant to:

  1. Recognize when you're telling it something to remember
  2. Store that information in long-term memory
  3. Retrieve relevant information when answering questions
  4. Combine retrieved knowledge with its language model capabilities

Here's how this works with Claude Sonnet 4.5:

import anthropic
import os

class AssistantWithMemory:
    def __init__(self):
        self.client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
        # Using Claude Sonnet 4.5 for its superior agent reasoning capabilities
        self.model = "claude-sonnet-4-5"
        self.memory = VectorMemory()
        self.conversation_history = []

    def process_message(self, user_message):
        """Process a user message, using memory when relevant"""
        # First, check if this is a request to remember something
        if self._is_memory_request(user_message):
            return self._store_memory(user_message)

        # Search for relevant facts
        relevant_facts = self.memory.search(user_message, top_k=3)

        # Build context with conversation history and retrieved facts
        context = self._build_context(relevant_facts)

        # Get response from Claude
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=context,
            messages=self.conversation_history + [
                {"role": "user", "content": user_message}
            ]
        )

        assistant_message = response.content[0].text

        # Update conversation history
        self.conversation_history.append({"role": "user", "content": user_message})
        self.conversation_history.append({"role": "assistant", "content": assistant_message})

        return assistant_message

    def _is_memory_request(self, message):
        """Detect if user wants to store information"""
        memory_keywords = ["remember", "store", "save", "keep in mind", "note that"]
        return any(keyword in message.lower() for keyword in memory_keywords)

    def _store_memory(self, message):
        """Extract and store information from user message"""
        # Use Claude to extract the fact to remember
        response = self.client.messages.create(
            model=self.model,
            max_tokens=256,
            system="Extract the key fact the user wants you to remember. Return only the fact as a clear statement.",
            messages=[{"role": "user", "content": message}]
        )

        fact = response.content[0].text
        self.memory.add_fact(fact)
        return f"Got it! I'll remember that {fact.lower()}"

    def _build_context(self, relevant_facts):
        """Build system context with retrieved facts"""
        if not relevant_facts or relevant_facts == ["No information stored yet."]:
            return "You are a helpful personal assistant."

        facts_text = "\n".join(f"- {fact}" for fact in relevant_facts)
        return f"""You are a helpful personal assistant. You have access to the following information about the user:

{facts_text}

Use this information when relevant to provide personalized responses."""

# Example conversation
assistant = AssistantWithMemory()

print(assistant.process_message("Remember that my birthday is July 20"))
# Output: Got it! I'll remember that your birthday is july 20

print(assistant.process_message("Remember that I'm allergic to peanuts"))
# Output: Got it! I'll remember that you're allergic to peanuts

print(assistant.process_message("What should I be careful about when eating out?"))
# Output: Based on what I know, you should be careful about peanuts since you're 
# allergic to them. When eating out, make sure to inform the restaurant staff about 
# your peanut allergy and ask about ingredients in dishes...

Let's trace through what happens when you ask "What should I be careful about when eating out?":

  1. The assistant searches its vector memory for relevant facts
  2. It finds the stored peanut-allergy fact as highly relevant
  3. It includes this fact in the system context when calling Claude
  4. Claude uses this information to provide a personalized, helpful response

With persistence in place, the assistant's memory survives across sessions. Note that the VectorMemory class above keeps its facts and vectors only in RAM, so on its own it forgets everything on restart; to get the behavior described here, the facts need to be written to disk the way SimpleMemory writes its dictionary. Once that's done, you can close the program, restart it days later, and it will still remember your allergy.
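As a concrete sketch of that persistence step, here is a hypothetical PersistentVectorMemory class that saves only the raw fact text to a JSON file and re-encodes it on startup. The encoder is passed in as a function, so this sketch stays independent of any particular embedding library; the class name, file name, and injection pattern are illustrative choices, not part of the earlier examples.

```python
import json
import math
import os

class PersistentVectorMemory:
    """Hypothetical sketch: VectorMemory plus a JSON file for the raw facts.

    Only the text is saved; vectors are recomputed on startup, which keeps
    the file human-readable and lets you swap the embedding model later.
    """

    def __init__(self, encode_fn, memory_file="vector_memory.json"):
        self.encode = encode_fn          # any text -> sequence-of-floats function
        self.memory_file = memory_file
        self.facts = []
        self.vectors = []
        # Reload and re-encode previously stored facts, if any
        if os.path.exists(memory_file):
            with open(memory_file) as f:
                for fact in json.load(f):
                    self.facts.append(fact)
                    self.vectors.append(self.encode(fact))

    def add_fact(self, fact):
        """Store a fact in memory and persist the text to disk"""
        self.facts.append(fact)
        self.vectors.append(self.encode(fact))
        with open(self.memory_file, "w") as f:
            json.dump(self.facts, f, indent=2)

    def search(self, query, top_k=3):
        """Return the top_k facts most similar to the query"""
        q = self.encode(query)

        def cos(a, b):
            # Plain cosine similarity, no external dependencies
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = sorted(
            ((cos(q, v), f) for v, f in zip(self.vectors, self.facts)),
            reverse=True,
        )
        return [f for _, f in scored[:top_k]]
```

With the setup from earlier, you would pass the model's encoder, for example `SentenceTransformer('all-MiniLM-L6-v2').encode`. Re-encoding on startup is cheap for a few hundred facts; beyond that, you'd persist the vectors too, or reach for a vector database as discussed later in this chapter.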

When to Use Long-Term Memory

Not everything needs to be stored in long-term memory. Here's a practical guide:

Store in long-term memory:

  • Personal facts (birthday, location, occupation)
  • Preferences (favorite foods, music, work style)
  • Important information (allergies, constraints, requirements)
  • Project details that span multiple sessions
  • Learned facts about recurring topics

Keep in short-term memory:

  • Current conversation context
  • Temporary working information
  • Details specific to this session only
  • Information that will become outdated quickly

You can even ask the assistant to decide what's worth remembering:

def should_remember(self, information):
    """Use Claude to decide if information is worth storing long-term"""
    response = self.client.messages.create(
        model=self.model,
        max_tokens=128,
        system="Decide if this information should be stored in long-term memory. Answer only 'yes' or 'no' with a brief reason.",
        messages=[{"role": "user", "content": f"Should I remember this: {information}"}]
    )

    decision = response.content[0].text.lower()
    return "yes" in decision

Practical Considerations

As you build long-term memory into your assistant, keep these points in mind:

Storage limits: Vector databases can grow large. Consider setting limits on how many facts to store, or implementing a way to archive or remove old information.
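As one illustration of such a limit, the timestamped entries from the SearchableMemory example make a crude cap-and-drop policy easy to sketch. The cap of 500 and the function name here are arbitrary choices for illustration:

```python
MAX_FACTS = 500  # arbitrary cap, tune for your use case

def prune_oldest(facts, max_facts=MAX_FACTS):
    """Keep only the most recent max_facts entries.

    Assumes each entry is a dict with a sortable "timestamp" field,
    as produced by the SearchableMemory example. The str(datetime.now())
    format sorts correctly as plain text.
    """
    if len(facts) <= max_facts:
        return facts
    # Oldest first, then keep the tail (the newest entries)
    facts = sorted(facts, key=lambda entry: entry["timestamp"])
    return facts[-max_facts:]
```

A real system might archive the dropped entries to a separate file, or merge near-duplicates, rather than discarding them outright.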

Privacy: Long-term memory means persistent data. Be thoughtful about what you store and how you protect it. Never store sensitive information like passwords or financial details without proper encryption.

Retrieval quality: The quality of your retrieval depends on your embedding model. Better embeddings lead to better semantic search. The example above uses a small, fast model, but you might want a more powerful one for production use.

Context window limits: Even with long-term memory, you can only include so many retrieved facts in each request to the language model. Prioritize the most relevant information.
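One rough way to enforce this is a budget: walk the retrieved facts in relevance order (the order VectorMemory.search already returns them) and stop once the budget is spent. This sketch uses characters as a stand-in for model tokens; the function name and budget are illustrative:

```python
def facts_within_budget(ranked_facts, max_chars=1000):
    """Greedily keep the highest-ranked facts that fit a character budget.

    ranked_facts is assumed to be sorted most-relevant-first, e.g. the
    output of VectorMemory.search. A production system would count model
    tokens instead of characters.
    """
    selected, used = [], 0
    for fact in ranked_facts:
        if used + len(fact) > max_chars:
            break  # everything after this is less relevant anyway
        selected.append(fact)
        used += len(fact)
    return selected
```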

Updating facts: What happens when information changes? You might need a way to update or delete facts. For example, if the user moves to a new city, you want to update that fact rather than having two conflicting locations stored.

Here's a simple update mechanism:

def update_fact(self, old_fact, new_fact):
    """Replace an old fact with updated information"""
    # Remove old fact
    if old_fact in self.memory.facts:
        idx = self.memory.facts.index(old_fact)
        self.memory.facts.pop(idx)
        self.memory.vectors.pop(idx)

    # Add new fact
    self.memory.add_fact(new_fact)
    return f"Updated: {new_fact}"

Combining Short-Term and Long-Term Memory

Your assistant now has both types of memory:

  • Short-term memory: Recent conversation history (from the previous subchapter)
  • Long-term memory: Persistent facts and knowledge (this subchapter)

The most effective assistants use both together. Short-term memory provides immediate context for the current conversation. Long-term memory provides background knowledge and personalization.

Here's how they work together:

User: "I'm planning a dinner party next week"
Assistant: [Uses short-term memory to track this conversation]
         "That sounds great! What kind of cuisine are you thinking?"

User: "Maybe Italian?"
Assistant: [Retrieves from long-term memory: "User prefers Italian food"]
         [Uses short-term memory: knows we're discussing a dinner party]
         "Perfect choice! I know you love Italian food. Are you thinking
         of going to Luigi's Italian Kitchen, your favorite restaurant,
         or cooking at home?"

User: "Cooking at home. Can you suggest a menu?"
Assistant: [Retrieves from long-term memory: "User is allergic to peanuts"]
         [Uses short-term memory: dinner party, Italian, cooking at home]
         "I'll suggest a menu that avoids peanuts since you're allergic.
         How about starting with a Caprese salad, followed by homemade
         pasta with marinara sauce, and tiramisu for dessert?"
Notice how the assistant seamlessly blends information from both memory systems. It remembers the ongoing conversation (short-term) and applies personal knowledge (long-term) to provide helpful, customized suggestions.

Building Your Own Knowledge Base

You now understand the core concepts of long-term memory for AI agents. The examples above use simple implementations to illustrate the ideas. In practice, you might use specialized tools:

Vector databases like Pinecone, Weaviate, or Chroma provide optimized storage and retrieval for embeddings. They handle large-scale data better than our simple list-based approach.

Document stores like Elasticsearch or MongoDB work well when you need to store structured information with multiple fields and complex queries.

Graph databases like Neo4j excel when relationships between facts matter (for example, "Alice is Bob's manager" and "Bob works on Project X" implies "Alice oversees Project X").

The choice depends on your needs. For a personal assistant with a few hundred facts, the simple approach we've shown works fine. For a system serving thousands of users with millions of facts, you'll want more robust infrastructure.

What We've Built

Your assistant can now:

  1. Store facts persistently across sessions
  2. Search for information by meaning, not just keywords
  3. Retrieve relevant knowledge when answering questions
  4. Combine retrieved facts with language model capabilities
  5. Decide what information is worth remembering long-term

This transforms your assistant from a stateless question-answering system into a personalized knowledge companion. It knows you, remembers what matters, and uses that knowledge to provide better, more relevant help.

In the next chapter, we'll explore how to organize all these pieces into a coherent agent architecture, showing how memory, reasoning, and tools work together in a unified system.

Glossary

Embedding: A numerical representation (vector) of text that captures its semantic meaning. Similar meanings produce similar vectors, enabling semantic search.

Key-Value Store: A simple storage system where data is saved with a label (key) and retrieved using that same label. Like a dictionary or hash map.

Knowledge Base: A structured collection of information that an agent can search and retrieve from. More sophisticated than simple key-value storage.

Semantic Search: Finding information based on meaning rather than exact keyword matches. Uses embeddings to understand that "What should I avoid eating?" relates to allergy information.

Vector Database: A specialized database optimized for storing and searching embeddings. Enables fast similarity search across large collections of data.

Vector Store: Another term for vector database. A system that stores embeddings and supports similarity-based retrieval.

Cosine Similarity: A mathematical measure of how similar two vectors are, ranging from -1 (opposite directions) through 0 (unrelated) to 1 (identical direction). Used to find relevant information in vector search.



About the author: Michael Brenndoerfer

All opinions expressed here are my own and do not reflect the views of my employer.

Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, where he drives AI and data initiatives across private capital investments.

With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.
