Implementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory

Michael Brenndoerfer · July 5, 2025 · 17 min read

Learn how to build a complete AI agent memory system combining conversation history and persistent knowledge storage. Includes semantic search, tool integration, and practical implementation patterns.


Implementing Memory in Our Agent

You've learned the concepts: short-term memory for conversations and long-term memory for persistent knowledge. Now let's build a complete personal assistant that combines both. We'll start with a minimal implementation and gradually add sophistication, showing you exactly how memory works in practice.

By the end of this chapter, you'll have a working assistant that remembers conversations, stores important facts, and retrieves relevant information when needed. More importantly, you'll understand the design decisions and trade-offs involved in building memory systems for AI agents.

Starting Point: A Complete Memory System

Let's build our assistant step by step. We'll start with the core components and then assemble them into a working system.

The Conversation Manager

First, we need something to handle short-term memory. This manages the ongoing conversation:

In[3]:
Code
class ConversationManager:
    """
    Manages short-term conversation memory with automatic windowing.
    """
    
    def __init__(self, max_messages: int = 20):
        self.messages = []
        self.max_messages = max_messages
    
    def add_user_message(self, content: str):
        """Add a user message to history."""
        self.messages.append({
            "role": "user",
            "content": content
        })
        self._trim_if_needed()
    
    def add_assistant_message(self, content: str):
        """Add an assistant message to history."""
        self.messages.append({
            "role": "assistant",
            "content": content
        })
        self._trim_if_needed()
    
    def _trim_if_needed(self):
        """Keep only the most recent messages."""
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]
    
    def get_messages(self) -> list:
        """Get all current messages."""
        return self.messages.copy()
    
    def clear(self):
        """Start a fresh conversation."""
        self.messages = []

This is straightforward. We maintain a list of messages and automatically trim it when it gets too long. The sliding window approach keeps costs predictable and prevents context overflow.
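A quick sanity check of the windowing behavior (a minimal sketch using the class above; the message strings are placeholders):

Code
# Minimal sketch: confirm the sliding window keeps only the newest messages
conv = ConversationManager(max_messages=4)

for i in range(1, 5):
    conv.add_user_message(f"question {i}")
    conv.add_assistant_message(f"answer {i}")

# Eight messages were added, but only the last four remain
print(len(conv.get_messages()))           # 4
print(conv.get_messages()[0]["content"])  # question 3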

The Knowledge Store

Next, we need long-term memory. This stores facts persistently:

In[4]:
Code
import json
import os
from datetime import datetime

class KnowledgeStore:
    """
    Stores and retrieves long-term facts about the user.
    """
    
    def __init__(self, storage_file: str = "knowledge.json"):
        self.storage_file = storage_file
        self.facts = self._load_facts()
    
    def _load_facts(self) -> list:
        """Load facts from disk."""
        if os.path.exists(self.storage_file):
            with open(self.storage_file, 'r') as f:
                return json.load(f)
        return []
    
    def _save_facts(self):
        """Save facts to disk."""
        with open(self.storage_file, 'w') as f:
            json.dump(self.facts, f, indent=2)
    
    def add_fact(self, fact: str, category: str = None):
        """Store a new fact."""
        entry = {
            "fact": fact,
            "category": category,
            "timestamp": datetime.now().isoformat()
        }
        self.facts.append(entry)
        self._save_facts()
    
    def search(self, query: str) -> list:
        """
        Search for relevant facts.
        Simple keyword matching for now.
        """
        query_lower = query.lower()
        results = []
        
        for entry in self.facts:
            fact_lower = entry["fact"].lower()
            if query_lower in fact_lower:
                results.append(entry["fact"])
            elif entry["category"] and query_lower in entry["category"].lower():
                results.append(entry["fact"])
        
        return results
    
    def get_all_facts(self) -> list:
        """Get all stored facts."""
        return [entry["fact"] for entry in self.facts]

This provides persistent storage with simple keyword search. We save facts to a JSON file so they survive between sessions. Each fact has a timestamp and optional category for organization.
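Here is how the store behaves on its own (a brief sketch; the file name and facts are just examples):

Code
# Brief sketch: store a few facts and query them with keyword search
store = KnowledgeStore(storage_file="demo_knowledge.json")

store.add_fact("The user is allergic to peanuts", category="health")
store.add_fact("The user prefers Italian food", category="preferences")

print(store.search("peanuts"))  # ['The user is allergic to peanuts']
print(store.search("health"))   # matches through the category field
print(store.search("avoid"))    # [] -- keyword matching misses this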

Putting It Together

Now let's combine these into a complete assistant. This example uses Claude Sonnet 4.5:

In[5]:
Code
import os
from anthropic import Anthropic

class PersonalAssistant:
    """
    A personal assistant with both short-term and long-term memory.
    """
    
    def __init__(self, api_key: str):
        # Using Claude Sonnet 4.5 for its excellent agent reasoning capabilities
        self.client = Anthropic(api_key=api_key)
        self.model = "claude-sonnet-4-5"
        
        # Memory systems
        self.conversation = ConversationManager(max_messages=20)
        self.knowledge = KnowledgeStore()
    
    def chat(self, user_message: str) -> str:
        """
        Process a user message and return a response.
        Handles both conversation and knowledge storage.
        """
        # Check if this is a memory command
        if self._is_memory_command(user_message):
            return self._handle_memory_command(user_message)
        
        # Search for relevant facts
        relevant_facts = self.knowledge.search(user_message)
        
        # Build system prompt with context
        system_prompt = self._build_system_prompt(relevant_facts)
        
        # Add user message to conversation
        self.conversation.add_user_message(user_message)
        
        # Get response from Claude
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=system_prompt,
            messages=self.conversation.get_messages()
        )
        
        # Extract and store response
        assistant_message = response.content[0].text
        self.conversation.add_assistant_message(assistant_message)
        
        return assistant_message
    
    def _is_memory_command(self, message: str) -> bool:
        """Detect if user wants to store something."""
        keywords = ["remember", "store", "save", "keep in mind", "note that"]
        message_lower = message.lower()
        return any(keyword in message_lower for keyword in keywords)
    
    def _handle_memory_command(self, message: str) -> str:
        """Extract and store information."""
        # Use Claude to extract the fact
        response = self.client.messages.create(
            model=self.model,
            max_tokens=256,
            system="Extract the key fact the user wants you to remember. Return only the fact as a clear, concise statement.",
            messages=[{"role": "user", "content": message}]
        )
        
        fact = response.content[0].text
        self.knowledge.add_fact(fact)
        
        return f"Got it! I'll remember that {fact.lower()}"
    
    def _build_system_prompt(self, relevant_facts: list) -> str:
        """Build system prompt with retrieved knowledge."""
        base_prompt = "You are a helpful personal assistant."
        
        if not relevant_facts:
            return base_prompt
        
        facts_text = "\n".join(f"- {fact}" for fact in relevant_facts)
        return f"""{base_prompt}

You have access to the following information about the user:

{facts_text}

Use this information when relevant to provide personalized responses."""
    
    def start_new_conversation(self):
        """Clear conversation history but keep long-term knowledge."""
        self.conversation.clear()

Let's see this in action:

In[6]:
Code
# Create the assistant
assistant = PersonalAssistant(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Store some facts
print(assistant.chat("Remember that my birthday is July 20"))
# Output: Got it! I'll remember that your birthday is july 20

print(assistant.chat("Remember that I'm allergic to peanuts"))
# Output: Got it! I'll remember that you're allergic to peanuts

print(assistant.chat("Remember that I prefer Italian food"))
# Output: Got it! I'll remember that you prefer italian food

# Now ask questions that should draw on this knowledge
print(assistant.chat("What should I be careful about when eating out?"))

print(assistant.chat("Suggest a restaurant for my birthday dinner"))
Out[6]:
Console
Got it! I'll remember that your birthday is july 20.
Got it! I'll remember that you are allergic to peanuts.
Got it! I'll remember that you prefer italian food.
When eating out, here are some key things to be careful about:

## Food Safety
- **Cleanliness**: Check if the restaurant looks clean (tables, floors, bathrooms)
- **Food temperature**: Hot foods should be hot, cold foods should be cold
- **Proper cooking**: Ensure meat, poultry, and seafood are cooked thoroughly
- **Fresh ingredients**: Be wary if food tastes or smells off

## Health Considerations
- **Allergies**: Always inform servers about food allergies
- **Portion sizes**: Restaurant servings are often 2-3x normal portions
- **Hidden calories**: Sauces, dressings, and cooking methods can add significant calories
- **Sodium content**: Restaurant food typically contains high sodium levels

## Financial
- **Menu prices**: Check prices before ordering; ask about specials
- **Additional charges**: Service fees, tips, drinks can add up quickly
- **Specials**: Sometimes "specials" aren't actually good deals

## Practical Tips
- **Cross-contamination**: Important if you have allergies or dietary restrictions
- **Buffets**: Food sitting out for long periods can be risky
- **Raw foods**: Be cautious with raw fish, eggs, or unpasteurized items
- **Tap water safety**: In unfamiliar locations, especially when traveling

Is there a specific concern you'd like me to elaborate on?

I'd love to help you find the perfect birthday restaurant! To give you the best suggestions, could you tell me:

1. **Location**: What city or neighborhood are you in?

2. **Cuisine preference**: Any particular type of food you're craving?
   - Italian, steakhouse, seafood, Asian fusion, Mexican, French, etc.

3. **Atmosphere**: What vibe are you looking for?
   - Romantic and intimate
   - Lively and fun
   - Upscale and elegant
   - Casual and relaxed

4. **Budget**: What price range works for you?
   - $ (casual)
   - $$ (moderate)
   - $$$ (upscale)
   - $$$$ (fine dining)

5. **Party size**: Just you and one other person, or a larger group?

6. **Special requirements**: Any dietary restrictions or preferences (vegetarian, gluten-free, etc.)?

Once I know more about what you're looking for, I can suggest some great options for your special day! 🎉

Notice how the assistant:

  1. Stores facts when you use memory keywords
  2. Searches stored knowledge before answering questions
  3. Combines retrieved knowledge with its language model capabilities
  4. Maintains conversation context across multiple turns

This is a complete, working memory system. But the console output above also exposes its main weakness: the generic eating-out advice and the long list of follow-up questions show that neither question actually retrieved any stored facts.

Upgrading to Semantic Search

The keyword search only matches when the query text literally appears in a stored fact or its category. A question like "What foods should I avoid?" won't find your peanut allergy, because neither "avoid" nor anything else in the question appears in the stored fact.

Let's upgrade to semantic search using embeddings:

In[7]:
Code
from sentence_transformers import SentenceTransformer
import numpy as np

class SemanticKnowledgeStore:
    """
    Knowledge store with semantic search using embeddings.
    """
    
    def __init__(self, storage_file: str = "knowledge.json"):
        self.storage_file = storage_file
        # Using a local embedding model to keep costs down
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.facts = []
        self.embeddings = []
        self._load_data()
    
    def _load_data(self):
        """Load facts and regenerate embeddings."""
        if os.path.exists(self.storage_file):
            with open(self.storage_file, 'r') as f:
                data = json.load(f)
                self.facts = data
                # Regenerate embeddings
                for entry in self.facts:
                    embedding = self.encoder.encode(entry["fact"])
                    self.embeddings.append(embedding)
    
    def _save_data(self):
        """Save facts to disk."""
        with open(self.storage_file, 'w') as f:
            json.dump(self.facts, f, indent=2)
    
    def add_fact(self, fact: str, category: str = None):
        """Store a fact with its embedding."""
        entry = {
            "fact": fact,
            "category": category,
            "timestamp": datetime.now().isoformat()
        }
        
        # Generate embedding
        embedding = self.encoder.encode(fact)
        
        self.facts.append(entry)
        self.embeddings.append(embedding)
        self._save_data()
    
    def search(self, query: str, top_k: int = 3, threshold: float = 0.3) -> list:
        """
        Search for facts semantically similar to the query.
        
        Args:
            query: The search query
            top_k: Maximum number of results to return
            threshold: Minimum similarity score (0-1)
        
        Returns:
            List of relevant facts
        """
        if not self.facts:
            return []
        
        # Encode query
        query_embedding = self.encoder.encode(query)
        
        # Calculate similarities
        similarities = []
        for i, fact_embedding in enumerate(self.embeddings):
            # Cosine similarity
            similarity = np.dot(query_embedding, fact_embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(fact_embedding)
            )
            similarities.append((similarity, self.facts[i]["fact"]))
        
        # Sort by similarity and filter by threshold
        similarities.sort(reverse=True, key=lambda x: x[0])
        results = [fact for score, fact in similarities[:top_k] if score >= threshold]
        
        return results

Now replace KnowledgeStore with SemanticKnowledgeStore in the PersonalAssistant class:

In[14]:
Code
class PersonalAssistant:
    def __init__(self, api_key: str):
        self.client = Anthropic(api_key=api_key)
        self.model = "claude-sonnet-4-5"
        self.conversation = ConversationManager(max_messages=20)
        # Use semantic search instead
        self.knowledge = SemanticKnowledgeStore()
    
    # ... rest of the methods stay the same

Let's test the improved search:

In[16]:
Code
assistant = PersonalAssistant(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Store facts
assistant.chat("Remember that I'm allergic to peanuts")
assistant.chat("Remember that I prefer Italian food")
assistant.chat("Remember that I live in San Francisco")

# Semantic queries that don't match keywords exactly
print(assistant.chat("What foods should I avoid?"))
# Finds: "You're allergic to peanuts"
# Output: You should avoid peanuts and any foods containing peanuts or 
# peanut oil, as you're allergic to them...

print(assistant.chat("Where am I located?"))
# Finds: "You live in San Francisco"
# Output: You're located in San Francisco.

print(assistant.chat("What cuisine do I enjoy?"))
# Finds: "You prefer Italian food"
# Output: You enjoy Italian cuisine.

The semantic search understands meaning, not just keywords. "What foods should I avoid?" finds your allergy information even though "avoid" doesn't appear in the stored fact.
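You can see why this works by printing the raw similarity scores (a small standalone sketch; the exact numbers will vary with the model version):

Code
# Small sketch: cosine similarity between a question and stored facts
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")

facts = [
    "You're allergic to peanuts",
    "You prefer Italian food",
    "You live in San Francisco",
]
query = "What foods should I avoid?"

fact_vecs = encoder.encode(facts)
query_vec = encoder.encode(query)

for fact, vec in zip(facts, fact_vecs):
    score = np.dot(query_vec, vec) / (np.linalg.norm(query_vec) * np.linalg.norm(vec))
    print(f"{score:.2f}  {fact}")

# The allergy fact typically scores highest, despite sharing no keywords with the query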

Adding Tool Use with Memory

Let's extend our assistant to use tools while maintaining memory. We'll add a calculator tool as an example:

In[18]:
Code
def calculate(expression: str) -> dict:
    """Calculator tool for mathematical operations."""
    try:
        # Note: eval is convenient for a demo, but never use it on untrusted
        # input in production -- prefer a safe expression parser.
        result = eval(expression)
        return {"success": True, "result": result}
    except Exception as e:
        return {"success": False, "error": str(e)}

calculator_tool = {
    "name": "calculate",
    "description": "Perform mathematical calculations. Input should be a valid Python expression.",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "Mathematical expression to evaluate (e.g., '2 + 2', '15 * 3.5')"
            }
        },
        "required": ["expression"]
    }
}

class PersonalAssistantWithTools:
    """Assistant with memory and tool use."""
    
    def __init__(self, api_key: str):
        self.client = Anthropic(api_key=api_key)
        self.model = "claude-sonnet-4-5"
        self.conversation = ConversationManager(max_messages=20)
        self.knowledge = SemanticKnowledgeStore()
        self.tools = {"calculate": calculate}
    
    def chat(self, user_message: str) -> str:
        """Process message with tool use support."""
        # Handle memory commands
        if self._is_memory_command(user_message):
            return self._handle_memory_command(user_message)
        
        # Search for relevant facts
        relevant_facts = self.knowledge.search(user_message)
        system_prompt = self._build_system_prompt(relevant_facts)
        
        # Add user message
        self.conversation.add_user_message(user_message)
        
        # Get response with tools
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=system_prompt,
            tools=[calculator_tool],
            messages=self.conversation.get_messages()
        )
        
        # Handle tool use if needed
        while response.stop_reason == "tool_use":
            # Extract tool call
            tool_use_block = next(
                block for block in response.content 
                if block.type == "tool_use"
            )
            
            # Execute tool
            tool_result = self.tools[tool_use_block.name](
                **tool_use_block.input
            )
            
            # Add tool use to conversation
            self.conversation.add_assistant_message(response.content)
            
            # Add tool result
            self.conversation.messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": str(tool_result)
                }]
            })
            
            # Get next response
            response = self.client.messages.create(
                model=self.model,
                max_tokens=1024,
                system=system_prompt,
                tools=[calculator_tool],
                messages=self.conversation.get_messages()
            )
        
        # Extract final response
        assistant_message = response.content[0].text
        self.conversation.add_assistant_message(assistant_message)
        
        return assistant_message
    
    # ... other methods same as before

Now watch memory and tools work together:

In[20]:
Code
assistant = PersonalAssistantWithTools(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Store a fact
print(assistant.chat("Remember that I need to save $500 per month"))
# Output: Got it! I'll remember that you need to save $500 per month

# Use tools with memory
print(assistant.chat("I earn $3000 per month. After my savings goal, how much do I have left?"))
# Agent retrieves: "You need to save $500 per month"
# Agent uses calculator: 3000 - 500
# Output: After setting aside your $500 monthly savings goal, you'll have 
# $2,500 left for other expenses.

print(assistant.chat("If I split the remaining amount across 4 weeks, how much is that per week?"))
# Agent remembers previous calculation: $2,500
# Agent uses calculator: 2500 / 4
# Output: Splitting your remaining $2,500 across 4 weeks gives you $625 per week.

The assistant combines three capabilities:

  1. Long-term memory: Retrieves your savings goal
  2. Short-term memory: Remembers the previous calculation
  3. Tool use: Performs accurate calculations

This is powerful. The agent can reference stored facts, maintain conversation context, and use tools to solve problems it couldn't handle with language alone.

Managing Memory Over Time

As your assistant accumulates facts, you'll need ways to manage them. Let's add some utilities:

In[22]:
Code
class PersonalAssistantWithManagement(PersonalAssistantWithTools):
    """Assistant with memory management capabilities."""
    
    def list_facts(self) -> str:
        """Show all stored facts."""
        facts = self.knowledge.get_all_facts()
        
        if not facts:
            return "I don't have any facts stored yet."
        
        facts_list = "\n".join(f"{i+1}. {fact}" for i, fact in enumerate(facts))
        return f"Here's what I know about you:\n\n{facts_list}"
    
    def forget_fact(self, fact_number: int) -> str:
        """Remove a specific fact."""
        facts = self.knowledge.facts
        
        if fact_number < 1 or fact_number > len(facts):
            return f"Invalid fact number. I have {len(facts)} facts stored."
        
        removed_fact = facts[fact_number - 1]["fact"]
        
        # Remove from both lists
        facts.pop(fact_number - 1)
        self.knowledge.embeddings.pop(fact_number - 1)
        self.knowledge._save_data()
        
        return f"Forgot: {removed_fact}"
    
    def update_fact(self, fact_number: int, new_fact: str) -> str:
        """Update an existing fact."""
        facts = self.knowledge.facts
        
        if fact_number < 1 or fact_number > len(facts):
            return f"Invalid fact number. I have {len(facts)} facts stored."
        
        old_fact = facts[fact_number - 1]["fact"]
        
        # Update fact and embedding
        facts[fact_number - 1]["fact"] = new_fact
        facts[fact_number - 1]["timestamp"] = datetime.now().isoformat()
        self.knowledge.embeddings[fact_number - 1] = self.knowledge.encoder.encode(new_fact)
        self.knowledge._save_data()
        
        return f"Updated: '{old_fact}' → '{new_fact}'"

Now you can manage stored knowledge:

In[24]:
Code
assistant = PersonalAssistantWithManagement(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Store some facts
assistant.chat("Remember that I live in San Francisco")
assistant.chat("Remember that I'm allergic to peanuts")
assistant.chat("Remember that I prefer Italian food")

# List all facts
print(assistant.list_facts())
# Output:
# Here's what I know about you:
# 
# 1. You live in San Francisco
# 2. You're allergic to peanuts
# 3. You prefer Italian food

# Update a fact
print(assistant.update_fact(1, "You live in Oakland"))
# Output: Updated: 'You live in San Francisco' → 'You live in Oakland'

# Remove a fact
print(assistant.forget_fact(2))
# Output: Forgot: You're allergic to peanuts

These management functions give users control over their data. This is important for privacy and accuracy.

Design Decisions and Trade-offs

Let's discuss the choices we made and their implications:

Sliding Window for Conversations

We limit conversation history to 20 messages. This:

Pros:

  • Keeps costs predictable
  • Prevents context overflow
  • Simple to implement

Cons:

  • Loses older conversation context
  • Might forget important details from earlier in the session

Alternative: Use summarization instead of hard truncation. Periodically summarize old messages and keep the summary.
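A rough sketch of that alternative, reusing the ConversationManager above and assuming the same Anthropic client and model as the assistant (the prompt wording is illustrative):

Code
# Rough sketch: summarize overflowing messages instead of discarding them
class SummarizingConversationManager(ConversationManager):
    def __init__(self, client, model: str, max_messages: int = 20):
        super().__init__(max_messages=max_messages)
        self.client = client
        self.model = model
        self.summary = ""  # rolling summary of everything trimmed so far

    def _trim_if_needed(self):
        if len(self.messages) <= self.max_messages:
            return
        # Split off the overflow and keep the recent window
        overflow = self.messages[:-self.max_messages]
        self.messages = self.messages[-self.max_messages:]
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in overflow)
        response = self.client.messages.create(
            model=self.model,
            max_tokens=256,
            system="Summarize this conversation excerpt in a few sentences, keeping any details worth remembering.",
            messages=[{"role": "user", "content": transcript}],
        )
        self.summary = (self.summary + "\n" + response.content[0].text).strip()

The assistant would then prepend self.summary to its system prompt, so trimmed context isn't lost entirely.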

Keyword Detection for Memory Commands

We detect "remember", "store", etc. to trigger memory storage. This:

Pros:

  • Simple and fast
  • No extra API calls
  • User has explicit control

Cons:

  • Might miss implicit memory requests
  • Requires specific keywords

Alternative: Use the LLM to classify every message as "store this" or "just chat". More flexible but costs more.
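A sketch of what that classifier could look like, assuming the same client and model (the prompt and the YES/NO convention are illustrative):

Code
# Sketch: let the model decide whether a message should be stored long-term
def should_store(client, model: str, message: str) -> bool:
    response = client.messages.create(
        model=model,
        max_tokens=5,
        system=(
            "Decide whether the user's message contains a personal fact or "
            "preference worth remembering long-term. Answer only YES or NO."
        ),
        messages=[{"role": "user", "content": message}],
    )
    return response.content[0].text.strip().upper().startswith("YES")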

Semantic Search with Local Embeddings

We use a local embedding model for semantic search. This:

Pros:

  • Fast and free
  • Good enough for most use cases
  • No API calls for every search

Cons:

  • Less powerful than API-based embeddings
  • Requires installing additional libraries

Alternative: Use a hosted embedding API, such as OpenAI's embeddings endpoint or Voyage AI (the provider Anthropic points to for embeddings). Better quality, but each search costs money.
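If you go that route, only the encoding step changes. A sketch using OpenAI's embeddings endpoint (the model name and environment-variable key handling are assumptions):

Code
# Sketch: swap the local encoder for a hosted embedding API
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding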

JSON File Storage

We store facts in a JSON file. This:

Pros:

  • Simple to implement
  • No database setup required
  • Easy to inspect and debug

Cons:

  • Doesn't scale to thousands of facts
  • No concurrent access support
  • Limited query capabilities

Alternative: Use a proper database (SQLite for local, PostgreSQL for production) or a vector database (Pinecone, Weaviate, Chroma).
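As one example of the database route, a minimal SQLite-backed store might look like this (a sketch; the schema and LIKE-based search are assumptions, and embeddings are omitted for brevity):

Code
# Sketch: persist facts in SQLite instead of a JSON file
import sqlite3

class SQLiteKnowledgeStore:
    def __init__(self, db_path: str = "knowledge.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "id INTEGER PRIMARY KEY, fact TEXT, category TEXT, timestamp TEXT)"
        )

    def add_fact(self, fact: str, category: str = None):
        self.conn.execute(
            "INSERT INTO facts (fact, category, timestamp) VALUES (?, ?, datetime('now'))",
            (fact, category),
        )
        self.conn.commit()

    def search(self, query: str) -> list:
        rows = self.conn.execute(
            "SELECT fact FROM facts WHERE fact LIKE ?", (f"%{query}%",)
        ).fetchall()
        return [row[0] for row in rows]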

When to Use What

Here's a practical guide for choosing memory strategies:

For personal projects or prototypes:

  • Use the simple JSON-based approach we've shown
  • Sliding window for conversation history
  • Local embeddings for semantic search

For production applications with <1000 users:

  • SQLite or PostgreSQL for knowledge storage
  • Consider Redis for conversation history (fast, ephemeral; see the Redis sketch below)
  • Still use local embeddings if cost is a concern

For large-scale applications:

  • Vector database (Pinecone, Weaviate, Chroma) for knowledge
  • Redis or similar for conversation state
  • API-based embeddings for best quality
  • Implement proper user isolation and security
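For illustration, a Redis-backed conversation history might look like the sketch below (assuming a local Redis instance and the redis-py client; the key naming and TTL are arbitrary):

Code
# Sketch: ephemeral per-session conversation history in Redis
import json
import redis

class RedisConversationManager:
    def __init__(self, session_id: str, max_messages: int = 20, ttl_seconds: int = 3600):
        self.r = redis.Redis(decode_responses=True)
        self.key = f"conversation:{session_id}"
        self.max_messages = max_messages
        self.ttl_seconds = ttl_seconds

    def add_message(self, role: str, content: str):
        self.r.rpush(self.key, json.dumps({"role": role, "content": content}))
        self.r.ltrim(self.key, -self.max_messages, -1)  # sliding window
        self.r.expire(self.key, self.ttl_seconds)       # sessions expire on their own

    def get_messages(self) -> list:
        return [json.loads(m) for m in self.r.lrange(self.key, 0, -1)]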

Testing Your Memory System

How do you know if your memory system works well? Here are some tests:

In[26]:
Code
def test_memory_system():
    """Test that memory works correctly."""
    assistant = PersonalAssistant(api_key=os.getenv("ANTHROPIC_API_KEY"))
    
    # Test 1: Store and retrieve
    assistant.chat("Remember that my birthday is July 20")
    response = assistant.chat("When is my birthday?")
    assert "july 20" in response.lower(), "Failed to retrieve birthday"
    
    # Test 2: Semantic search
    assistant.chat("Remember that I'm allergic to peanuts")
    response = assistant.chat("What should I avoid eating?")
    assert "peanut" in response.lower(), "Failed semantic search"
    
    # Test 3: Conversation context
    assistant.chat("I'm planning a trip to Japan")
    response = assistant.chat("What's the best time to visit?")
    assert "japan" in response.lower(), "Lost conversation context"
    
    # Test 4: Persistence
    assistant2 = PersonalAssistant(api_key=os.getenv("ANTHROPIC_API_KEY"))
    response = assistant2.chat("When is my birthday?")
    assert "july 20" in response.lower(), "Facts not persisted"
    
    print("All tests passed!")

test_memory_system()

These tests verify:

  • Facts are stored and retrieved correctly
  • Semantic search finds relevant information
  • Conversation context is maintained
  • Data persists across sessions

What You've Built

Let's appreciate what you now have:

A complete memory system with both short-term and long-term components working together.

Semantic search that understands meaning, not just keywords.

Tool integration where memory enhances tool use.

Memory management giving users control over their data.

Persistent storage that survives between sessions.

This is a real, working personal assistant. You can extend it with more tools, better storage, or additional capabilities. The foundation is solid.

Practical Considerations

As you deploy your memory system, keep these points in mind:

Privacy matters: You're storing personal information. Encrypt sensitive data, provide ways to export or delete it, and be transparent about what you store.
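For instance, the facts file could be encrypted at rest (a sketch using the cryptography package; real key management, such as a secrets manager, is out of scope here):

Code
# Sketch: encrypt stored facts before they touch disk
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice, load this from a secrets manager
fernet = Fernet(key)

def save_encrypted(facts: list, path: str = "knowledge.enc"):
    payload = json.dumps(facts).encode("utf-8")
    with open(path, "wb") as f:
        f.write(fernet.encrypt(payload))

def load_encrypted(path: str = "knowledge.enc") -> list:
    with open(path, "rb") as f:
        return json.loads(fernet.decrypt(f.read()))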

Memory can be wrong: Users might tell you incorrect information or change their minds. Provide ways to correct or update facts.

Not everything should be remembered: Some information is temporary or sensitive. Consider what truly needs long-term storage.

Test with real users: Your assumptions about what to remember might differ from what users actually need. Gather feedback.

Monitor costs: If using API-based embeddings or large context windows, track your spending. Optimize where needed.

Key Takeaways

Let's review what we've learned about implementing memory:

Memory has two layers: Short-term for conversations, long-term for persistent facts. Both are essential.

Start simple: A list for conversations and a JSON file for facts works fine for many applications.

Semantic search is powerful: Embeddings let you find information by meaning, making retrieval much more useful.

Memory enhances everything: When combined with tools and reasoning, memory makes your agent far more capable.

Design for users: Provide ways to view, update, and delete stored information. Users should control their data.

Your personal assistant now has a complete memory system. It remembers conversations, stores important facts, retrieves relevant information, and combines all of this with language model capabilities and tool use. This is a significant milestone in building truly useful AI agents.

In the next chapter, we'll explore how to organize all these components into a coherent agent architecture, showing how memory, reasoning, tools, and state management work together in a unified system.

Glossary

Conversation Manager: A component that handles short-term memory by storing and managing the recent message history in a conversation.

Knowledge Store: A persistent storage system for long-term facts and information that survives across sessions.

Semantic Search: Finding information based on meaning rather than exact keyword matches, typically using embeddings and similarity calculations.

Sliding Window: A memory management strategy that keeps only the most recent N messages, automatically discarding older ones.

Embedding: A numerical vector representation of text that captures its semantic meaning, enabling similarity-based search.

Tool Integration: The ability for an agent to use external functions or APIs while maintaining memory of both the conversation and stored knowledge.

Memory Management: Features that let users view, update, or delete stored information, giving them control over their data.

