Implementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory

Michael Brenndoerfer

AI Agent Handbook Machine Learning Software Engineering

Learn how to build a complete AI agent memory system combining conversation history and persistent knowledge storage. Includes semantic search, tool integration, and practical implementation patterns.

Part of AI Agent Handbook

This article is part of the free-to-read AI Agent Handbook

View full handbook

Implementing Memory in Our Agent

You've learned the concepts: short-term memory for conversations and long-term memory for persistent knowledge. Now let's build a complete personal assistant that combines both. We'll start with a minimal implementation and gradually add sophistication, showing you exactly how memory works in practice.

By the end of this chapter, you'll have a working assistant that remembers conversations, stores important facts, and retrieves relevant information when needed. More importantly, you'll understand the design decisions and trade-offs involved in building memory systems for AI agents.

Starting Point: A Complete Memory System

Let's build our assistant step by step. We'll start with the core components and then assemble them into a working system.

The Conversation Manager

First, we need something to handle short-term memory. This manages the ongoing conversation:

1class ConversationManager:
2    """
3    Manages short-term conversation memory with automatic windowing.
4    """
5    
6    def __init__(self, max_messages: int = 20):
7        self.messages = []
8        self.max_messages = max_messages
9    
10    def add_user_message(self, content: str):
11        """Add a user message to history."""
12        self.messages.append({
13            "role": "user",
14            "content": content
15        })
16        self._trim_if_needed()
17    
18    def add_assistant_message(self, content: str):
19        """Add an assistant message to history."""
20        self.messages.append({
21            "role": "assistant",
22            "content": content
23        })
24        self._trim_if_needed()
25    
26    def _trim_if_needed(self):
27        """Keep only the most recent messages."""
28        if len(self.messages) > self.max_messages:
29            self.messages = self.messages[-self.max_messages:]
30    
31    def get_messages(self) -> list:
32        """Get all current messages."""
33        return self.messages.copy()
34    
35    def clear(self):
36        """Start a fresh conversation."""
37        self.messages = []

1class ConversationManager:
2    """
3    Manages short-term conversation memory with automatic windowing.
4    """
5    
6    def __init__(self, max_messages: int = 20):
7        self.messages = []
8        self.max_messages = max_messages
9    
10    def add_user_message(self, content: str):
11        """Add a user message to history."""
12        self.messages.append({
13            "role": "user",
14            "content": content
15        })
16        self._trim_if_needed()
17    
18    def add_assistant_message(self, content: str):
19        """Add an assistant message to history."""
20        self.messages.append({
21            "role": "assistant",
22            "content": content
23        })
24        self._trim_if_needed()
25    
26    def _trim_if_needed(self):
27        """Keep only the most recent messages."""
28        if len(self.messages) > self.max_messages:
29            self.messages = self.messages[-self.max_messages:]
30    
31    def get_messages(self) -> list:
32        """Get all current messages."""
33        return self.messages.copy()
34    
35    def clear(self):
36        """Start a fresh conversation."""
37        self.messages = []

This is straightforward. We maintain a list of messages and automatically trim it when it gets too long. The sliding window approach keeps costs predictable and prevents context overflow.

The Knowledge Store

Next, we need long-term memory. This stores facts persistently:

1import json
2import os
3from datetime import datetime
4
5class KnowledgeStore:
6    """
7    Stores and retrieves long-term facts about the user.
8    """
9    
10    def __init__(self, storage_file: str = "knowledge.json"):
11        self.storage_file = storage_file
12        self.facts = self._load_facts()
13    
14    def _load_facts(self) -> list:
15        """Load facts from disk."""
16        if os.path.exists(self.storage_file):
17            with open(self.storage_file, 'r') as f:
18                return json.load(f)
19        return []
20    
21    def _save_facts(self):
22        """Save facts to disk."""
23        with open(self.storage_file, 'w') as f:
24            json.dump(self.facts, f, indent=2)
25    
26    def add_fact(self, fact: str, category: str = None):
27        """Store a new fact."""
28        entry = {
29            "fact": fact,
30            "category": category,
31            "timestamp": datetime.now().isoformat()
32        }
33        self.facts.append(entry)
34        self._save_facts()
35    
36    def search(self, query: str) -> list:
37        """
38        Search for relevant facts.
39        Simple keyword matching for now.
40        """
41        query_lower = query.lower()
42        results = []
43        
44        for entry in self.facts:
45            fact_lower = entry["fact"].lower()
46            if query_lower in fact_lower:
47                results.append(entry["fact"])
48            elif entry["category"] and query_lower in entry["category"].lower():
49                results.append(entry["fact"])
50        
51        return results
52    
53    def get_all_facts(self) -> list:
54        """Get all stored facts."""
55        return [entry["fact"] for entry in self.facts]

1import json
2import os
3from datetime import datetime
4
5class KnowledgeStore:
6    """
7    Stores and retrieves long-term facts about the user.
8    """
9    
10    def __init__(self, storage_file: str = "knowledge.json"):
11        self.storage_file = storage_file
12        self.facts = self._load_facts()
13    
14    def _load_facts(self) -> list:
15        """Load facts from disk."""
16        if os.path.exists(self.storage_file):
17            with open(self.storage_file, 'r') as f:
18                return json.load(f)
19        return []
20    
21    def _save_facts(self):
22        """Save facts to disk."""
23        with open(self.storage_file, 'w') as f:
24            json.dump(self.facts, f, indent=2)
25    
26    def add_fact(self, fact: str, category: str = None):
27        """Store a new fact."""
28        entry = {
29            "fact": fact,
30            "category": category,
31            "timestamp": datetime.now().isoformat()
32        }
33        self.facts.append(entry)
34        self._save_facts()
35    
36    def search(self, query: str) -> list:
37        """
38        Search for relevant facts.
39        Simple keyword matching for now.
40        """
41        query_lower = query.lower()
42        results = []
43        
44        for entry in self.facts:
45            fact_lower = entry["fact"].lower()
46            if query_lower in fact_lower:
47                results.append(entry["fact"])
48            elif entry["category"] and query_lower in entry["category"].lower():
49                results.append(entry["fact"])
50        
51        return results
52    
53    def get_all_facts(self) -> list:
54        """Get all stored facts."""
55        return [entry["fact"] for entry in self.facts]

This provides persistent storage with simple keyword search. We save facts to a JSON file so they survive between sessions. Each fact has a timestamp and optional category for organization.

Putting It Together

Now let's combine these into a complete assistant. Example (Claude Sonnet 4.5):

1import anthropic
2
3class PersonalAssistant:
4    """
5    A personal assistant with both short-term and long-term memory.
6    """
7    
8    def __init__(self, api_key: str):
9        # Using Claude Sonnet 4.5 for its excellent agent reasoning capabilities
10        self.client = anthropic.Anthropic(api_key=api_key)
11        self.model = "claude-sonnet-4.5"
12        
13        # Memory systems
14        self.conversation = ConversationManager(max_messages=20)
15        self.knowledge = KnowledgeStore()
16    
17    def chat(self, user_message: str) -> str:
18        """
19        Process a user message and return a response.
20        Handles both conversation and knowledge storage.
21        """
22        # Check if this is a memory command
23        if self._is_memory_command(user_message):
24            return self._handle_memory_command(user_message)
25        
26        # Search for relevant facts
27        relevant_facts = self.knowledge.search(user_message)
28        
29        # Build system prompt with context
30        system_prompt = self._build_system_prompt(relevant_facts)
31        
32        # Add user message to conversation
33        self.conversation.add_user_message(user_message)
34        
35        # Get response from Claude
36        response = self.client.messages.create(
37            model=self.model,
38            max_tokens=1024,
39            system=system_prompt,
40            messages=self.conversation.get_messages()
41        )
42        
43        # Extract and store response
44        assistant_message = response.content[0].text
45        self.conversation.add_assistant_message(assistant_message)
46        
47        return assistant_message
48    
49    def _is_memory_command(self, message: str) -> bool:
50        """Detect if user wants to store something."""
51        keywords = ["remember", "store", "save", "keep in mind", "note that"]
52        message_lower = message.lower()
53        return any(keyword in message_lower for keyword in keywords)
54    
55    def _handle_memory_command(self, message: str) -> str:
56        """Extract and store information."""
57        # Use Claude to extract the fact
58        response = self.client.messages.create(
59            model=self.model,
60            max_tokens=256,
61            system="Extract the key fact the user wants you to remember. Return only the fact as a clear, concise statement.",
62            messages=[{"role": "user", "content": message}]
63        )
64        
65        fact = response.content[0].text
66        self.knowledge.add_fact(fact)
67        
68        return f"Got it! I'll remember that {fact.lower()}"
69    
70    def _build_system_prompt(self, relevant_facts: list) -> str:
71        """Build system prompt with retrieved knowledge."""
72        base_prompt = "You are a helpful personal assistant."
73        
74        if not relevant_facts:
75            return base_prompt
76        
77        facts_text = "\n".join(f"- {fact}" for fact in relevant_facts)
78        return f"""{base_prompt}
79
80You have access to the following information about the user:
81
82{facts_text}
83
84Use this information when relevant to provide personalized responses."""
85    
86    def start_new_conversation(self):
87        """Clear conversation history but keep long-term knowledge."""
88        self.conversation.clear()

1import anthropic
2
3class PersonalAssistant:
4    """
5    A personal assistant with both short-term and long-term memory.
6    """
7    
8    def __init__(self, api_key: str):
9        # Using Claude Sonnet 4.5 for its excellent agent reasoning capabilities
10        self.client = anthropic.Anthropic(api_key=api_key)
11        self.model = "claude-sonnet-4.5"
12        
13        # Memory systems
14        self.conversation = ConversationManager(max_messages=20)
15        self.knowledge = KnowledgeStore()
16    
17    def chat(self, user_message: str) -> str:
18        """
19        Process a user message and return a response.
20        Handles both conversation and knowledge storage.
21        """
22        # Check if this is a memory command
23        if self._is_memory_command(user_message):
24            return self._handle_memory_command(user_message)
25        
26        # Search for relevant facts
27        relevant_facts = self.knowledge.search(user_message)
28        
29        # Build system prompt with context
30        system_prompt = self._build_system_prompt(relevant_facts)
31        
32        # Add user message to conversation
33        self.conversation.add_user_message(user_message)
34        
35        # Get response from Claude
36        response = self.client.messages.create(
37            model=self.model,
38            max_tokens=1024,
39            system=system_prompt,
40            messages=self.conversation.get_messages()
41        )
42        
43        # Extract and store response
44        assistant_message = response.content[0].text
45        self.conversation.add_assistant_message(assistant_message)
46        
47        return assistant_message
48    
49    def _is_memory_command(self, message: str) -> bool:
50        """Detect if user wants to store something."""
51        keywords = ["remember", "store", "save", "keep in mind", "note that"]
52        message_lower = message.lower()
53        return any(keyword in message_lower for keyword in keywords)
54    
55    def _handle_memory_command(self, message: str) -> str:
56        """Extract and store information."""
57        # Use Claude to extract the fact
58        response = self.client.messages.create(
59            model=self.model,
60            max_tokens=256,
61            system="Extract the key fact the user wants you to remember. Return only the fact as a clear, concise statement.",
62            messages=[{"role": "user", "content": message}]
63        )
64        
65        fact = response.content[0].text
66        self.knowledge.add_fact(fact)
67        
68        return f"Got it! I'll remember that {fact.lower()}"
69    
70    def _build_system_prompt(self, relevant_facts: list) -> str:
71        """Build system prompt with retrieved knowledge."""
72        base_prompt = "You are a helpful personal assistant."
73        
74        if not relevant_facts:
75            return base_prompt
76        
77        facts_text = "\n".join(f"- {fact}" for fact in relevant_facts)
78        return f"""{base_prompt}
79
80You have access to the following information about the user:
81
82{facts_text}
83
84Use this information when relevant to provide personalized responses."""
85    
86    def start_new_conversation(self):
87        """Clear conversation history but keep long-term knowledge."""
88        self.conversation.clear()

Let's see this in action:

1## Create the assistant
2assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
3
4## Store some facts
5print(assistant.chat("Remember that my birthday is July 20"))
6## Output: Got it! I'll remember that your birthday is july 20
7
8print(assistant.chat("Remember that I'm allergic to peanuts"))
9## Output: Got it! I'll remember that you're allergic to peanuts
10
11print(assistant.chat("Remember that I prefer Italian food"))
12## Output: Got it! I'll remember that you prefer italian food
13
14## Now ask questions that use this knowledge
15print(assistant.chat("What should I be careful about when eating out?"))
16## Output: Based on what I know, you should be careful about peanuts since 
17## you're allergic to them. Always inform restaurant staff about your allergy 
18## and ask about ingredients, especially in sauces and desserts where peanuts 
19## might be hidden...
20
21print(assistant.chat("Suggest a restaurant for my birthday dinner"))
22## Output: For your birthday on July 20, I'd suggest an Italian restaurant 
23## since you prefer Italian food. Make sure to mention your peanut allergy 
24## when making the reservation...

1## Create the assistant
2assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
3
4## Store some facts
5print(assistant.chat("Remember that my birthday is July 20"))
6## Output: Got it! I'll remember that your birthday is july 20
7
8print(assistant.chat("Remember that I'm allergic to peanuts"))
9## Output: Got it! I'll remember that you're allergic to peanuts
10
11print(assistant.chat("Remember that I prefer Italian food"))
12## Output: Got it! I'll remember that you prefer italian food
13
14## Now ask questions that use this knowledge
15print(assistant.chat("What should I be careful about when eating out?"))
16## Output: Based on what I know, you should be careful about peanuts since 
17## you're allergic to them. Always inform restaurant staff about your allergy 
18## and ask about ingredients, especially in sauces and desserts where peanuts 
19## might be hidden...
20
21print(assistant.chat("Suggest a restaurant for my birthday dinner"))
22## Output: For your birthday on July 20, I'd suggest an Italian restaurant 
23## since you prefer Italian food. Make sure to mention your peanut allergy 
24## when making the reservation...

Notice how the assistant:

Stores facts when you use memory keywords
Retrieves relevant facts when answering questions
Combines stored knowledge with its language model capabilities
Maintains conversation context across multiple turns

This is a complete, working memory system. But we can make it better.

Improving Retrieval with Semantic Search

The keyword search works, but it misses semantically related information. If you ask "What foods should I avoid?", it won't find your peanut allergy unless the word "avoid" appears in the stored fact.

Let's upgrade to semantic search using embeddings:

1from sentence_transformers import SentenceTransformer
2import numpy as np
3
4class SemanticKnowledgeStore:
5    """
6    Knowledge store with semantic search using embeddings.
7    """
8    
9    def __init__(self, storage_file: str = "knowledge.json"):
10        self.storage_file = storage_file
11        # Using a local embedding model to keep costs down
12        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
13        self.facts = []
14        self.embeddings = []
15        self._load_data()
16    
17    def _load_data(self):
18        """Load facts and regenerate embeddings."""
19        if os.path.exists(self.storage_file):
20            with open(self.storage_file, 'r') as f:
21                data = json.load(f)
22                self.facts = data
23                # Regenerate embeddings
24                for entry in self.facts:
25                    embedding = self.encoder.encode(entry["fact"])
26                    self.embeddings.append(embedding)
27    
28    def _save_data(self):
29        """Save facts to disk."""
30        with open(self.storage_file, 'w') as f:
31            json.dump(self.facts, f, indent=2)
32    
33    def add_fact(self, fact: str, category: str = None):
34        """Store a fact with its embedding."""
35        entry = {
36            "fact": fact,
37            "category": category,
38            "timestamp": datetime.now().isoformat()
39        }
40        
41        # Generate embedding
42        embedding = self.encoder.encode(fact)
43        
44        self.facts.append(entry)
45        self.embeddings.append(embedding)
46        self._save_data()
47    
48    def search(self, query: str, top_k: int = 3, threshold: float = 0.3) -> list:
49        """
50        Search for facts semantically similar to the query.
51        
52        Args:
53            query: The search query
54            top_k: Maximum number of results to return
55            threshold: Minimum similarity score (0-1)
56        
57        Returns:
58            List of relevant facts
59        """
60        if not self.facts:
61            return []
62        
63        # Encode query
64        query_embedding = self.encoder.encode(query)
65        
66        # Calculate similarities
67        similarities = []
68        for i, fact_embedding in enumerate(self.embeddings):
69            # Cosine similarity
70            similarity = np.dot(query_embedding, fact_embedding) / (
71                np.linalg.norm(query_embedding) * np.linalg.norm(fact_embedding)
72            )
73            similarities.append((similarity, self.facts[i]["fact"]))
74        
75        # Sort by similarity and filter by threshold
76        similarities.sort(reverse=True, key=lambda x: x[0])
77        results = [fact for score, fact in similarities[:top_k] if score >= threshold]
78        
79        return results

1from sentence_transformers import SentenceTransformer
2import numpy as np
3
4class SemanticKnowledgeStore:
5    """
6    Knowledge store with semantic search using embeddings.
7    """
8    
9    def __init__(self, storage_file: str = "knowledge.json"):
10        self.storage_file = storage_file
11        # Using a local embedding model to keep costs down
12        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
13        self.facts = []
14        self.embeddings = []
15        self._load_data()
16    
17    def _load_data(self):
18        """Load facts and regenerate embeddings."""
19        if os.path.exists(self.storage_file):
20            with open(self.storage_file, 'r') as f:
21                data = json.load(f)
22                self.facts = data
23                # Regenerate embeddings
24                for entry in self.facts:
25                    embedding = self.encoder.encode(entry["fact"])
26                    self.embeddings.append(embedding)
27    
28    def _save_data(self):
29        """Save facts to disk."""
30        with open(self.storage_file, 'w') as f:
31            json.dump(self.facts, f, indent=2)
32    
33    def add_fact(self, fact: str, category: str = None):
34        """Store a fact with its embedding."""
35        entry = {
36            "fact": fact,
37            "category": category,
38            "timestamp": datetime.now().isoformat()
39        }
40        
41        # Generate embedding
42        embedding = self.encoder.encode(fact)
43        
44        self.facts.append(entry)
45        self.embeddings.append(embedding)
46        self._save_data()
47    
48    def search(self, query: str, top_k: int = 3, threshold: float = 0.3) -> list:
49        """
50        Search for facts semantically similar to the query.
51        
52        Args:
53            query: The search query
54            top_k: Maximum number of results to return
55            threshold: Minimum similarity score (0-1)
56        
57        Returns:
58            List of relevant facts
59        """
60        if not self.facts:
61            return []
62        
63        # Encode query
64        query_embedding = self.encoder.encode(query)
65        
66        # Calculate similarities
67        similarities = []
68        for i, fact_embedding in enumerate(self.embeddings):
69            # Cosine similarity
70            similarity = np.dot(query_embedding, fact_embedding) / (
71                np.linalg.norm(query_embedding) * np.linalg.norm(fact_embedding)
72            )
73            similarities.append((similarity, self.facts[i]["fact"]))
74        
75        # Sort by similarity and filter by threshold
76        similarities.sort(reverse=True, key=lambda x: x[0])
77        results = [fact for score, fact in similarities[:top_k] if score >= threshold]
78        
79        return results

Now replace KnowledgeStore with SemanticKnowledgeStore in the PersonalAssistant class:

1class PersonalAssistant:
2    def __init__(self, api_key: str):
3        self.client = anthropic.Anthropic(api_key=api_key)
4        self.model = "claude-sonnet-4.5"
5        self.conversation = ConversationManager(max_messages=20)
6        # Use semantic search instead
7        self.knowledge = SemanticKnowledgeStore()
8    
9    # ... rest of the methods stay the same

1class PersonalAssistant:
2    def __init__(self, api_key: str):
3        self.client = anthropic.Anthropic(api_key=api_key)
4        self.model = "claude-sonnet-4.5"
5        self.conversation = ConversationManager(max_messages=20)
6        # Use semantic search instead
7        self.knowledge = SemanticKnowledgeStore()
8    
9    # ... rest of the methods stay the same

Let's test the improved search:

1assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
2
3## Store facts
4assistant.chat("Remember that I'm allergic to peanuts")
5assistant.chat("Remember that I prefer Italian food")
6assistant.chat("Remember that I live in San Francisco")
7
8## Semantic queries that don't match keywords exactly
9print(assistant.chat("What foods should I avoid?"))
10## Finds: "You're allergic to peanuts"
11## Output: You should avoid peanuts and any foods containing peanuts or 
12## peanut oil, as you're allergic to them...
13
14print(assistant.chat("Where am I located?"))
15## Finds: "You live in San Francisco"
16## Output: You're located in San Francisco.
17
18print(assistant.chat("What cuisine do I enjoy?"))
19## Finds: "You prefer Italian food"
20## Output: You enjoy Italian cuisine.

1assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
2
3## Store facts
4assistant.chat("Remember that I'm allergic to peanuts")
5assistant.chat("Remember that I prefer Italian food")
6assistant.chat("Remember that I live in San Francisco")
7
8## Semantic queries that don't match keywords exactly
9print(assistant.chat("What foods should I avoid?"))
10## Finds: "You're allergic to peanuts"
11## Output: You should avoid peanuts and any foods containing peanuts or 
12## peanut oil, as you're allergic to them...
13
14print(assistant.chat("Where am I located?"))
15## Finds: "You live in San Francisco"
16## Output: You're located in San Francisco.
17
18print(assistant.chat("What cuisine do I enjoy?"))
19## Finds: "You prefer Italian food"
20## Output: You enjoy Italian cuisine.

The semantic search understands meaning, not just keywords. "What foods should I avoid?" finds your allergy information even though "avoid" doesn't appear in the stored fact.

Adding Tool Use with Memory

Let's extend our assistant to use tools while maintaining memory. We'll add a calculator tool as an example:

1def calculate(expression: str) -> dict:
2    """Calculator tool for mathematical operations."""
3    try:
4        result = eval(expression)
5        return {"success": True, "result": result}
6    except Exception as e:
7        return {"success": False, "error": str(e)}
8
9calculator_tool = {
10    "name": "calculate",
11    "description": "Perform mathematical calculations. Input should be a valid Python expression.",
12    "input_schema": {
13        "type": "object",
14        "properties": {
15            "expression": {
16                "type": "string",
17                "description": "Mathematical expression to evaluate (e.g., '2 + 2', '15 * 3.5')"
18            }
19        },
20        "required": ["expression"]
21    }
22}
23
24class PersonalAssistantWithTools:
25    """Assistant with memory and tool use."""
26    
27    def __init__(self, api_key: str):
28        self.client = anthropic.Anthropic(api_key=api_key)
29        self.model = "claude-sonnet-4.5"
30        self.conversation = ConversationManager(max_messages=20)
31        self.knowledge = SemanticKnowledgeStore()
32        self.tools = {"calculate": calculate}
33    
34    def chat(self, user_message: str) -> str:
35        """Process message with tool use support."""
36        # Handle memory commands
37        if self._is_memory_command(user_message):
38            return self._handle_memory_command(user_message)
39        
40        # Search for relevant facts
41        relevant_facts = self.knowledge.search(user_message)
42        system_prompt = self._build_system_prompt(relevant_facts)
43        
44        # Add user message
45        self.conversation.add_user_message(user_message)
46        
47        # Get response with tools
48        response = self.client.messages.create(
49            model=self.model,
50            max_tokens=1024,
51            system=system_prompt,
52            tools=[calculator_tool],
53            messages=self.conversation.get_messages()
54        )
55        
56        # Handle tool use if needed
57        while response.stop_reason == "tool_use":
58            # Extract tool call
59            tool_use_block = next(
60                block for block in response.content 
61                if block.type == "tool_use"
62            )
63            
64            # Execute tool
65            tool_result = self.tools[tool_use_block.name](
66                **tool_use_block.input
67            )
68            
69            # Add tool use to conversation
70            self.conversation.add_assistant_message(response.content)
71            
72            # Add tool result
73            self.conversation.messages.append({
74                "role": "user",
75                "content": [{
76                    "type": "tool_result",
77                    "tool_use_id": tool_use_block.id,
78                    "content": str(tool_result)
79                }]
80            })
81            
82            # Get next response
83            response = self.client.messages.create(
84                model=self.model,
85                max_tokens=1024,
86                system=system_prompt,
87                tools=[calculator_tool],
88                messages=self.conversation.get_messages()
89            )
90        
91        # Extract final response
92        assistant_message = response.content[0].text
93        self.conversation.add_assistant_message(assistant_message)
94        
95        return assistant_message
96    
97    # ... other methods same as before

1def calculate(expression: str) -> dict:
2    """Calculator tool for mathematical operations."""
3    try:
4        result = eval(expression)
5        return {"success": True, "result": result}
6    except Exception as e:
7        return {"success": False, "error": str(e)}
8
9calculator_tool = {
10    "name": "calculate",
11    "description": "Perform mathematical calculations. Input should be a valid Python expression.",
12    "input_schema": {
13        "type": "object",
14        "properties": {
15            "expression": {
16                "type": "string",
17                "description": "Mathematical expression to evaluate (e.g., '2 + 2', '15 * 3.5')"
18            }
19        },
20        "required": ["expression"]
21    }
22}
23
24class PersonalAssistantWithTools:
25    """Assistant with memory and tool use."""
26    
27    def __init__(self, api_key: str):
28        self.client = anthropic.Anthropic(api_key=api_key)
29        self.model = "claude-sonnet-4.5"
30        self.conversation = ConversationManager(max_messages=20)
31        self.knowledge = SemanticKnowledgeStore()
32        self.tools = {"calculate": calculate}
33    
34    def chat(self, user_message: str) -> str:
35        """Process message with tool use support."""
36        # Handle memory commands
37        if self._is_memory_command(user_message):
38            return self._handle_memory_command(user_message)
39        
40        # Search for relevant facts
41        relevant_facts = self.knowledge.search(user_message)
42        system_prompt = self._build_system_prompt(relevant_facts)
43        
44        # Add user message
45        self.conversation.add_user_message(user_message)
46        
47        # Get response with tools
48        response = self.client.messages.create(
49            model=self.model,
50            max_tokens=1024,
51            system=system_prompt,
52            tools=[calculator_tool],
53            messages=self.conversation.get_messages()
54        )
55        
56        # Handle tool use if needed
57        while response.stop_reason == "tool_use":
58            # Extract tool call
59            tool_use_block = next(
60                block for block in response.content 
61                if block.type == "tool_use"
62            )
63            
64            # Execute tool
65            tool_result = self.tools[tool_use_block.name](
66                **tool_use_block.input
67            )
68            
69            # Add tool use to conversation
70            self.conversation.add_assistant_message(response.content)
71            
72            # Add tool result
73            self.conversation.messages.append({
74                "role": "user",
75                "content": [{
76                    "type": "tool_result",
77                    "tool_use_id": tool_use_block.id,
78                    "content": str(tool_result)
79                }]
80            })
81            
82            # Get next response
83            response = self.client.messages.create(
84                model=self.model,
85                max_tokens=1024,
86                system=system_prompt,
87                tools=[calculator_tool],
88                messages=self.conversation.get_messages()
89            )
90        
91        # Extract final response
92        assistant_message = response.content[0].text
93        self.conversation.add_assistant_message(assistant_message)
94        
95        return assistant_message
96    
97    # ... other methods same as before

Now watch memory and tools work together:

1assistant = PersonalAssistantWithTools(api_key="YOUR_ANTHROPIC_API_KEY")
2
3## Store a fact
4print(assistant.chat("Remember that I need to save \$500 per month"))
5## Output: Got it! I'll remember that you need to save $500 per month
6
7## Use tools with memory
8print(assistant.chat("I earn \$3000 per month. After my savings goal, how much do I have left?"))
9## Agent retrieves: "You need to save $500 per month"
10## Agent uses calculator: 3000 - 500
11## Output: After setting aside your $500 monthly savings goal, you'll have 
12## $2,500 left for other expenses.
13
14print(assistant.chat("If I split the remaining amount across 4 weeks, how much is that per week?"))
15## Agent remembers previous calculation: $2,500
16## Agent uses calculator: 2500 / 4
17## Output: Splitting your remaining $2,500 across 4 weeks gives you $625 per week.

1assistant = PersonalAssistantWithTools(api_key="YOUR_ANTHROPIC_API_KEY")
2
3## Store a fact
4print(assistant.chat("Remember that I need to save \$500 per month"))
5## Output: Got it! I'll remember that you need to save $500 per month
6
7## Use tools with memory
8print(assistant.chat("I earn \$3000 per month. After my savings goal, how much do I have left?"))
9## Agent retrieves: "You need to save $500 per month"
10## Agent uses calculator: 3000 - 500
11## Output: After setting aside your $500 monthly savings goal, you'll have 
12## $2,500 left for other expenses.
13
14print(assistant.chat("If I split the remaining amount across 4 weeks, how much is that per week?"))
15## Agent remembers previous calculation: $2,500
16## Agent uses calculator: 2500 / 4
17## Output: Splitting your remaining $2,500 across 4 weeks gives you $625 per week.

The assistant combines three capabilities:

Long-term memory: Retrieves your savings goal
Short-term memory: Remembers the previous calculation
Tool use: Performs accurate calculations

This is powerful. The agent can reference stored facts, maintain conversation context, and use tools to solve problems it couldn't handle with language alone.

Managing Memory Over Time

As your assistant accumulates facts, you'll need ways to manage them. Let's add some utilities:

1class PersonalAssistantWithManagement(PersonalAssistantWithTools):
2    """Assistant with memory management capabilities."""
3    
4    def list_facts(self) -> str:
5        """Show all stored facts."""
6        facts = self.knowledge.get_all_facts()
7        
8        if not facts:
9            return "I don't have any facts stored yet."
10        
11        facts_list = "\n".join(f"{i+1}. {fact}" for i, fact in enumerate(facts))
12        return f"Here's what I know about you:\n\n{facts_list}"
13    
14    def forget_fact(self, fact_number: int) -> str:
15        """Remove a specific fact."""
16        facts = self.knowledge.facts
17        
18        if fact_number < 1 or fact_number > len(facts):
19            return f"Invalid fact number. I have {len(facts)} facts stored."
20        
21        removed_fact = facts[fact_number - 1]["fact"]
22        
23        # Remove from both lists
24        facts.pop(fact_number - 1)
25        self.knowledge.embeddings.pop(fact_number - 1)
26        self.knowledge._save_data()
27        
28        return f"Forgot: {removed_fact}"
29    
30    def update_fact(self, fact_number: int, new_fact: str) -> str:
31        """Update an existing fact."""
32        facts = self.knowledge.facts
33        
34        if fact_number < 1 or fact_number > len(facts):
35            return f"Invalid fact number. I have {len(facts)} facts stored."
36        
37        old_fact = facts[fact_number - 1]["fact"]
38        
39        # Update fact and embedding
40        facts[fact_number - 1]["fact"] = new_fact
41        facts[fact_number - 1]["timestamp"] = datetime.now().isoformat()
42        self.knowledge.embeddings[fact_number - 1] = self.knowledge.encoder.encode(new_fact)
43        self.knowledge._save_data()
44        
45        return f"Updated: '{old_fact}' → '{new_fact}'"

1class PersonalAssistantWithManagement(PersonalAssistantWithTools):
2    """Assistant with memory management capabilities."""
3    
4    def list_facts(self) -> str:
5        """Show all stored facts."""
6        facts = self.knowledge.get_all_facts()
7        
8        if not facts:
9            return "I don't have any facts stored yet."
10        
11        facts_list = "\n".join(f"{i+1}. {fact}" for i, fact in enumerate(facts))
12        return f"Here's what I know about you:\n\n{facts_list}"
13    
14    def forget_fact(self, fact_number: int) -> str:
15        """Remove a specific fact."""
16        facts = self.knowledge.facts
17        
18        if fact_number < 1 or fact_number > len(facts):
19            return f"Invalid fact number. I have {len(facts)} facts stored."
20        
21        removed_fact = facts[fact_number - 1]["fact"]
22        
23        # Remove from both lists
24        facts.pop(fact_number - 1)
25        self.knowledge.embeddings.pop(fact_number - 1)
26        self.knowledge._save_data()
27        
28        return f"Forgot: {removed_fact}"
29    
30    def update_fact(self, fact_number: int, new_fact: str) -> str:
31        """Update an existing fact."""
32        facts = self.knowledge.facts
33        
34        if fact_number < 1 or fact_number > len(facts):
35            return f"Invalid fact number. I have {len(facts)} facts stored."
36        
37        old_fact = facts[fact_number - 1]["fact"]
38        
39        # Update fact and embedding
40        facts[fact_number - 1]["fact"] = new_fact
41        facts[fact_number - 1]["timestamp"] = datetime.now().isoformat()
42        self.knowledge.embeddings[fact_number - 1] = self.knowledge.encoder.encode(new_fact)
43        self.knowledge._save_data()
44        
45        return f"Updated: '{old_fact}' → '{new_fact}'"

Now you can manage stored knowledge:

1assistant = PersonalAssistantWithManagement(api_key="YOUR_ANTHROPIC_API_KEY")
2
3## Store some facts
4assistant.chat("Remember that I live in San Francisco")
5assistant.chat("Remember that I'm allergic to peanuts")
6assistant.chat("Remember that I prefer Italian food")
7
8## List all facts
9print(assistant.list_facts())
10## Output:
11## Here's what I know about you:
12## 
13## 1. You live in San Francisco
14## 2. You're allergic to peanuts
15## 3. You prefer Italian food
16
17## Update a fact
18print(assistant.update_fact(1, "You live in Oakland"))
19## Output: Updated: 'You live in San Francisco' → 'You live in Oakland'
20
21## Remove a fact
22print(assistant.forget_fact(2))
23## Output: Forgot: You're allergic to peanuts

1assistant = PersonalAssistantWithManagement(api_key="YOUR_ANTHROPIC_API_KEY")
2
3## Store some facts
4assistant.chat("Remember that I live in San Francisco")
5assistant.chat("Remember that I'm allergic to peanuts")
6assistant.chat("Remember that I prefer Italian food")
7
8## List all facts
9print(assistant.list_facts())
10## Output:
11## Here's what I know about you:
12## 
13## 1. You live in San Francisco
14## 2. You're allergic to peanuts
15## 3. You prefer Italian food
16
17## Update a fact
18print(assistant.update_fact(1, "You live in Oakland"))
19## Output: Updated: 'You live in San Francisco' → 'You live in Oakland'
20
21## Remove a fact
22print(assistant.forget_fact(2))
23## Output: Forgot: You're allergic to peanuts

These management functions give users control over their data. This is important for privacy and accuracy.

Design Decisions and Trade-offs

Let's discuss the choices we made and their implications:

Sliding Window for Conversations

We limit conversation history to 20 messages. This:

Pros:

Keeps costs predictable
Prevents context overflow
Simple to implement

Cons:

Loses older conversation context
Might forget important details from earlier in the session

Alternative: Use summarization instead of hard truncation. Periodically summarize old messages and keep the summary.

Keyword Detection for Memory Commands

We detect "remember", "store", etc. to trigger memory storage. This:

Pros:

Simple and fast
No extra API calls
User has explicit control

Cons:

Might miss implicit memory requests
Requires specific keywords

Alternative: Use the LLM to classify every message as "store this" or "just chat". More flexible but costs more.

Semantic Search with Local Embeddings

We use a local embedding model for semantic search. This:

Pros:

Fast and free
Good enough for most use cases
No API calls for every search

Cons:

Less powerful than API-based embeddings
Requires installing additional libraries

Alternative: Use OpenAI's embedding API or Anthropic's future embedding service. Better quality but costs money.

JSON File Storage

We store facts in a JSON file. This:

Pros:

Simple to implement
No database setup required
Easy to inspect and debug

Cons:

Doesn't scale to thousands of facts
No concurrent access support
Limited query capabilities

Alternative: Use a proper database (SQLite for local, PostgreSQL for production) or a vector database (Pinecone, Weaviate, Chroma).

When to Use What

Here's a practical guide for choosing memory strategies:

For personal projects or prototypes:

Use the simple JSON-based approach we've shown
Sliding window for conversation history
Local embeddings for semantic search

For production applications with <1000 users:

SQLite or PostgreSQL for knowledge storage
Consider Redis for conversation history (fast, ephemeral)
Still use local embeddings if cost is a concern

For large-scale applications:

Vector database (Pinecone, Weaviate, Chroma) for knowledge
Redis or similar for conversation state
API-based embeddings for best quality
Implement proper user isolation and security

Testing Your Memory System

How do you know if your memory system works well? Here are some tests:

1def test_memory_system():
2    """Test that memory works correctly."""
3    assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
4    
5    # Test 1: Store and retrieve
6    assistant.chat("Remember that my birthday is July 20")
7    response = assistant.chat("When is my birthday?")
8    assert "july 20" in response.lower(), "Failed to retrieve birthday"
9    
10    # Test 2: Semantic search
11    assistant.chat("Remember that I'm allergic to peanuts")
12    response = assistant.chat("What should I avoid eating?")
13    assert "peanut" in response.lower(), "Failed semantic search"
14    
15    # Test 3: Conversation context
16    assistant.chat("I'm planning a trip to Japan")
17    response = assistant.chat("What's the best time to visit?")
18    assert "japan" in response.lower(), "Lost conversation context"
19    
20    # Test 4: Persistence
21    assistant2 = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
22    response = assistant2.chat("When is my birthday?")
23    assert "july 20" in response.lower(), "Facts not persisted"
24    
25    print("All tests passed!")
26
27test_memory_system()

1def test_memory_system():
2    """Test that memory works correctly."""
3    assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
4    
5    # Test 1: Store and retrieve
6    assistant.chat("Remember that my birthday is July 20")
7    response = assistant.chat("When is my birthday?")
8    assert "july 20" in response.lower(), "Failed to retrieve birthday"
9    
10    # Test 2: Semantic search
11    assistant.chat("Remember that I'm allergic to peanuts")
12    response = assistant.chat("What should I avoid eating?")
13    assert "peanut" in response.lower(), "Failed semantic search"
14    
15    # Test 3: Conversation context
16    assistant.chat("I'm planning a trip to Japan")
17    response = assistant.chat("What's the best time to visit?")
18    assert "japan" in response.lower(), "Lost conversation context"
19    
20    # Test 4: Persistence
21    assistant2 = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
22    response = assistant2.chat("When is my birthday?")
23    assert "july 20" in response.lower(), "Facts not persisted"
24    
25    print("All tests passed!")
26
27test_memory_system()

These tests verify:

Facts are stored and retrieved correctly
Semantic search finds relevant information
Conversation context is maintained
Data persists across sessions

What You've Built

Let's appreciate what you now have:

A complete memory system with both short-term and long-term components working together.

Semantic search that understands meaning, not just keywords.

Tool integration where memory enhances tool use.

Memory management giving users control over their data.

Persistent storage that survives between sessions.

This is a real, working personal assistant. You can extend it with more tools, better storage, or additional capabilities. The foundation is solid.

Practical Considerations

As you deploy your memory system, keep these points in mind:

Privacy matters: You're storing personal information. Encrypt sensitive data, provide ways to export or delete it, and be transparent about what you store.

Memory can be wrong: Users might tell you incorrect information or change their minds. Provide ways to correct or update facts.

Not everything should be remembered: Some information is temporary or sensitive. Consider what truly needs long-term storage.

Test with real users: Your assumptions about what to remember might differ from what users actually need. Gather feedback.

Monitor costs: If using API-based embeddings or large context windows, track your spending. Optimize where needed.

Key Takeaways

Let's review what we've learned about implementing memory:

Memory has two layers: Short-term for conversations, long-term for persistent facts. Both are essential.

Start simple: A list for conversations and a JSON file for facts works fine for many applications.

Semantic search is powerful: Embeddings let you find information by meaning, making retrieval much more useful.

Memory enhances everything: When combined with tools and reasoning, memory makes your agent far more capable.

Design for users: Provide ways to view, update, and delete stored information. Users should control their data.

Your personal assistant now has a complete memory system. It remembers conversations, stores important facts, retrieves relevant information, and combines all of this with language model capabilities and tool use. This is a significant milestone in building truly useful AI agents.

In the next chapter, we'll explore how to organize all these components into a coherent agent architecture, showing how memory, reasoning, tools, and state management work together in a unified system.

Glossary

Conversation Manager: A component that handles short-term memory by storing and managing the recent message history in a conversation.

Knowledge Store: A persistent storage system for long-term facts and information that survives across sessions.

Semantic Search: Finding information based on meaning rather than exact keyword matches, typically using embeddings and similarity calculations.

Sliding Window: A memory management strategy that keeps only the most recent N messages, automatically discarding older ones.

Embedding: A numerical vector representation of text that captures its semantic meaning, enabling similarity-based search.

Tool Integration: The ability for an agent to use external functions or APIs while maintaining memory of both the conversation and stored knowledge.

Memory Management: Features that let users view, update, or delete stored information, giving them control over their data.

Quiz

Ready to test your understanding? Take this quick quiz to reinforce what you've learned about implementing memory in AI agents.

Loading component...

Back to AI Agent Handbook

Previous Chapter

Long-Term Knowledge Storage and Retrieval

Next Chapter

Understanding the Agent's State

Reference

BIBTEXAcademic

@misc{implementingmemoryinouragentbuildingacompletepersonalassistantwithshorttermandlongtermmemory, author = {Michael Brenndoerfer}, title = {Implementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory}, year = {2025}, url = {https://mbrenndoerfer.com/writing/implementing-memory-in-ai-agents}, organization = {mbrenndoerfer.com}, note = {Accessed: 2025-11-10} }

APAAcademic

Michael Brenndoerfer (2025). Implementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory. Retrieved from https://mbrenndoerfer.com/writing/implementing-memory-in-ai-agents

MLAAcademic

Michael Brenndoerfer. "Implementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory." 2025. Web. 11/10/2025. <https://mbrenndoerfer.com/writing/implementing-memory-in-ai-agents>.

CHICAGOAcademic

Michael Brenndoerfer. "Implementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory." Accessed 11/10/2025. https://mbrenndoerfer.com/writing/implementing-memory-in-ai-agents.

HARVARDAcademic

Michael Brenndoerfer (2025) 'Implementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory'. Available at: https://mbrenndoerfer.com/writing/implementing-memory-in-ai-agents (Accessed: 11/10/2025).

SimpleBasic

Michael Brenndoerfer (2025). Implementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory. https://mbrenndoerfer.com/writing/implementing-memory-in-ai-agents

Direct link:

https://mbrenndoerfer.com/writing/implementing-memory-in-ai-agents

Part of AI Agent Handbook

This article is part of the free-to-read AI Agent Handbook

View full handbook

About the author: Michael Brenndoerfer

All opinions expressed here are my own and do not reflect the views of my employer.

Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, where he drives AI and data initiatives across private capital investments.

With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.

View Full Resume Publications

InteractiveImplementing Memory in Our Agent: Building a Complete Personal Assistant with Short-Term and Long-Term Memory

Implementing Memory in Our Agent

Starting Point: A Complete Memory System

The Conversation Manager

The Knowledge Store

Putting It Together

Improving Retrieval with Semantic Search

Adding Tool Use with Memory

Managing Memory Over Time

Design Decisions and Trade-offs

Sliding Window for Conversations

Keyword Detection for Memory Commands

Semantic Search with Local Embeddings

JSON File Storage

When to Use What

Testing Your Memory System

What You've Built

Practical Considerations

Key Takeaways

Glossary

Quiz

Long-Term Knowledge Storage and Retrieval

Understanding the Agent's State

Reference

About the author: Michael Brenndoerfer

Related Content

Scaling Up without Breaking the Bank: AI Agent Performance & Cost Optimization at Scale

Managing and Reducing AI Agent Costs: Complete Guide to Cost Optimization Strategies

Speeding Up AI Agents: Performance Optimization Techniques for Faster Response Times

Stay updated