Learn how to build a complete AI agent memory system combining conversation history and persistent knowledge storage. Includes semantic search, tool integration, and practical implementation patterns.

This article is part of the free-to-read AI Agent Handbook
Implementing Memory in Our Agent
You've learned the concepts: short-term memory for conversations and long-term memory for persistent knowledge. Now let's build a complete personal assistant that combines both. We'll start with a minimal implementation and gradually add sophistication, showing you exactly how memory works in practice.
By the end of this chapter, you'll have a working assistant that remembers conversations, stores important facts, and retrieves relevant information when needed. More importantly, you'll understand the design decisions and trade-offs involved in building memory systems for AI agents.
Starting Point: A Complete Memory System
Let's build our assistant step by step. We'll start with the core components and then assemble them into a working system.
The Conversation Manager
First, we need something to handle short-term memory. This manages the ongoing conversation:
class ConversationManager:
    """
    Manages short-term conversation memory with automatic windowing.
    """

    def __init__(self, max_messages: int = 20):
        self.messages = []
        self.max_messages = max_messages

    def add_user_message(self, content: str):
        """Add a user message to history."""
        self.messages.append({
            "role": "user",
            "content": content
        })
        self._trim_if_needed()

    def add_assistant_message(self, content: str):
        """Add an assistant message to history."""
        self.messages.append({
            "role": "assistant",
            "content": content
        })
        self._trim_if_needed()

    def _trim_if_needed(self):
        """Keep only the most recent messages."""
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def get_messages(self) -> list:
        """Get all current messages."""
        return self.messages.copy()

    def clear(self):
        """Start a fresh conversation."""
        self.messages = []

This is straightforward. We maintain a list of messages and automatically trim it when it gets too long. The sliding window approach keeps costs predictable and prevents context overflow.
The Knowledge Store
Next, we need long-term memory. This stores facts persistently:
import json
import os
from datetime import datetime

class KnowledgeStore:
    """
    Stores and retrieves long-term facts about the user.
    """

    def __init__(self, storage_file: str = "knowledge.json"):
        self.storage_file = storage_file
        self.facts = self._load_facts()

    def _load_facts(self) -> list:
        """Load facts from disk."""
        if os.path.exists(self.storage_file):
            with open(self.storage_file, 'r') as f:
                return json.load(f)
        return []

    def _save_facts(self):
        """Save facts to disk."""
        with open(self.storage_file, 'w') as f:
            json.dump(self.facts, f, indent=2)

    def add_fact(self, fact: str, category: str = None):
        """Store a new fact."""
        entry = {
            "fact": fact,
            "category": category,
            "timestamp": datetime.now().isoformat()
        }
        self.facts.append(entry)
        self._save_facts()

    def search(self, query: str) -> list:
        """
        Search for relevant facts.
        Simple keyword matching for now.
        """
        query_lower = query.lower()
        results = []

        for entry in self.facts:
            fact_lower = entry["fact"].lower()
            if query_lower in fact_lower:
                results.append(entry["fact"])
            elif entry["category"] and query_lower in entry["category"].lower():
                results.append(entry["fact"])

        return results

    def get_all_facts(self) -> list:
        """Get all stored facts."""
        return [entry["fact"] for entry in self.facts]

This provides persistent storage with simple keyword search. We save facts to a JSON file so they survive between sessions. Each fact has a timestamp and optional category for organization.
Putting It Together
Now let's combine these into a complete assistant. Example (Claude Sonnet 4.5):
import anthropic

class PersonalAssistant:
    """
    A personal assistant with both short-term and long-term memory.
    """

    def __init__(self, api_key: str):
        # Using Claude Sonnet 4.5 for its excellent agent reasoning capabilities
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = "claude-sonnet-4-5"

        # Memory systems
        self.conversation = ConversationManager(max_messages=20)
        self.knowledge = KnowledgeStore()

    def chat(self, user_message: str) -> str:
        """
        Process a user message and return a response.
        Handles both conversation and knowledge storage.
        """
        # Check if this is a memory command
        if self._is_memory_command(user_message):
            return self._handle_memory_command(user_message)

        # Search for relevant facts
        relevant_facts = self.knowledge.search(user_message)

        # Build system prompt with context
        system_prompt = self._build_system_prompt(relevant_facts)

        # Add user message to conversation
        self.conversation.add_user_message(user_message)

        # Get response from Claude
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=system_prompt,
            messages=self.conversation.get_messages()
        )

        # Extract and store response
        assistant_message = response.content[0].text
        self.conversation.add_assistant_message(assistant_message)

        return assistant_message

    def _is_memory_command(self, message: str) -> bool:
        """Detect if user wants to store something."""
        keywords = ["remember", "store", "save", "keep in mind", "note that"]
        message_lower = message.lower()
        return any(keyword in message_lower for keyword in keywords)

    def _handle_memory_command(self, message: str) -> str:
        """Extract and store information."""
        # Use Claude to extract the fact
        response = self.client.messages.create(
            model=self.model,
            max_tokens=256,
            system="Extract the key fact the user wants you to remember. Return only the fact as a clear, concise statement.",
            messages=[{"role": "user", "content": message}]
        )

        fact = response.content[0].text
        self.knowledge.add_fact(fact)

        return f"Got it! I'll remember that {fact.lower()}"

    def _build_system_prompt(self, relevant_facts: list) -> str:
        """Build system prompt with retrieved knowledge."""
        base_prompt = "You are a helpful personal assistant."

        if not relevant_facts:
            return base_prompt

        facts_text = "\n".join(f"- {fact}" for fact in relevant_facts)
        return f"""{base_prompt}

You have access to the following information about the user:

{facts_text}

Use this information when relevant to provide personalized responses."""

    def start_new_conversation(self):
        """Clear conversation history but keep long-term knowledge."""
        self.conversation.clear()

Let's see this in action:
# Create the assistant
assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")

# Store some facts
print(assistant.chat("Remember that my birthday is July 20"))
# Output: Got it! I'll remember that your birthday is july 20

print(assistant.chat("Remember that I'm allergic to peanuts"))
# Output: Got it! I'll remember that you're allergic to peanuts

print(assistant.chat("Remember that I prefer Italian food"))
# Output: Got it! I'll remember that you prefer italian food

# Now ask questions that use this knowledge
print(assistant.chat("What should I be careful about when eating out?"))
# Output: Based on what I know, you should be careful about peanuts since
# you're allergic to them. Always inform restaurant staff about your allergy
# and ask about ingredients, especially in sauces and desserts where peanuts
# might be hidden...

print(assistant.chat("Suggest a restaurant for my birthday dinner"))
# Output: For your birthday on July 20, I'd suggest an Italian restaurant
# since you prefer Italian food. Make sure to mention your peanut allergy
# when making the reservation...

Notice how the assistant:
- Stores facts when you use memory keywords
- Retrieves relevant facts when answering questions
- Combines stored knowledge with its language model capabilities
- Maintains conversation context across multiple turns
This is a complete, working memory system. But we can make it better.
Improving Retrieval with Semantic Search
The keyword search works, but it misses semantically related information. If you ask "What foods should I avoid?", it won't find your peanut allergy unless the word "avoid" appears in the stored fact.
Let's upgrade to semantic search using embeddings:
import json
import os
from datetime import datetime

from sentence_transformers import SentenceTransformer
import numpy as np

class SemanticKnowledgeStore:
    """
    Knowledge store with semantic search using embeddings.
    """

    def __init__(self, storage_file: str = "knowledge.json"):
        self.storage_file = storage_file
        # Using a local embedding model to keep costs down
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.facts = []
        self.embeddings = []
        self._load_data()

    def _load_data(self):
        """Load facts and regenerate embeddings."""
        if os.path.exists(self.storage_file):
            with open(self.storage_file, 'r') as f:
                data = json.load(f)
            self.facts = data
            # Regenerate embeddings
            for entry in self.facts:
                embedding = self.encoder.encode(entry["fact"])
                self.embeddings.append(embedding)

    def _save_data(self):
        """Save facts to disk."""
        with open(self.storage_file, 'w') as f:
            json.dump(self.facts, f, indent=2)

    def add_fact(self, fact: str, category: str = None):
        """Store a fact with its embedding."""
        entry = {
            "fact": fact,
            "category": category,
            "timestamp": datetime.now().isoformat()
        }

        # Generate embedding
        embedding = self.encoder.encode(fact)

        self.facts.append(entry)
        self.embeddings.append(embedding)
        self._save_data()

    def search(self, query: str, top_k: int = 3, threshold: float = 0.3) -> list:
        """
        Search for facts semantically similar to the query.

        Args:
            query: The search query
            top_k: Maximum number of results to return
            threshold: Minimum similarity score (0-1)

        Returns:
            List of relevant facts
        """
        if not self.facts:
            return []

        # Encode query
        query_embedding = self.encoder.encode(query)

        # Calculate similarities
        similarities = []
        for i, fact_embedding in enumerate(self.embeddings):
            # Cosine similarity
            similarity = np.dot(query_embedding, fact_embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(fact_embedding)
            )
            similarities.append((similarity, self.facts[i]["fact"]))

        # Sort by similarity and filter by threshold
        similarities.sort(reverse=True, key=lambda x: x[0])
        results = [fact for score, fact in similarities[:top_k] if score >= threshold]

        return results

Now replace KnowledgeStore with SemanticKnowledgeStore in the PersonalAssistant class:
class PersonalAssistant:
    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = "claude-sonnet-4-5"
        self.conversation = ConversationManager(max_messages=20)
        # Use semantic search instead
        self.knowledge = SemanticKnowledgeStore()

    # ... rest of the methods stay the same

Let's test the improved search:
assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")

# Store facts
assistant.chat("Remember that I'm allergic to peanuts")
assistant.chat("Remember that I prefer Italian food")
assistant.chat("Remember that I live in San Francisco")

# Semantic queries that don't match keywords exactly
print(assistant.chat("What foods should I avoid?"))
# Finds: "You're allergic to peanuts"
# Output: You should avoid peanuts and any foods containing peanuts or
# peanut oil, as you're allergic to them...

print(assistant.chat("Where am I located?"))
# Finds: "You live in San Francisco"
# Output: You're located in San Francisco.

print(assistant.chat("What cuisine do I enjoy?"))
# Finds: "You prefer Italian food"
# Output: You enjoy Italian cuisine.

The semantic search understands meaning, not just keywords. "What foods should I avoid?" finds your allergy information even though "avoid" doesn't appear in the stored fact.
Adding Tool Use with Memory
Let's extend our assistant to use tools while maintaining memory. We'll add a calculator tool as an example:
def calculate(expression: str) -> dict:
    """Calculator tool for mathematical operations."""
    try:
        # Note: eval is fine for a demo, but unsafe for untrusted input;
        # use a proper expression parser in production.
        result = eval(expression)
        return {"success": True, "result": result}
    except Exception as e:
        return {"success": False, "error": str(e)}

calculator_tool = {
    "name": "calculate",
    "description": "Perform mathematical calculations. Input should be a valid Python expression.",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "Mathematical expression to evaluate (e.g., '2 + 2', '15 * 3.5')"
            }
        },
        "required": ["expression"]
    }
}

class PersonalAssistantWithTools:
    """Assistant with memory and tool use."""

    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = "claude-sonnet-4-5"
        self.conversation = ConversationManager(max_messages=20)
        self.knowledge = SemanticKnowledgeStore()
        self.tools = {"calculate": calculate}

    def chat(self, user_message: str) -> str:
        """Process message with tool use support."""
        # Handle memory commands
        if self._is_memory_command(user_message):
            return self._handle_memory_command(user_message)

        # Search for relevant facts
        relevant_facts = self.knowledge.search(user_message)
        system_prompt = self._build_system_prompt(relevant_facts)

        # Add user message
        self.conversation.add_user_message(user_message)

        # Get response with tools
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=system_prompt,
            tools=[calculator_tool],
            messages=self.conversation.get_messages()
        )

        # Handle tool use if needed
        while response.stop_reason == "tool_use":
            # Extract tool call
            tool_use_block = next(
                block for block in response.content
                if block.type == "tool_use"
            )

            # Execute tool
            tool_result = self.tools[tool_use_block.name](
                **tool_use_block.input
            )

            # Add tool use to conversation (the raw content blocks, so the
            # tool_result below can reference the tool_use id)
            self.conversation.add_assistant_message(response.content)

            # Add tool result
            self.conversation.messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": str(tool_result)
                }]
            })

            # Get next response
            response = self.client.messages.create(
                model=self.model,
                max_tokens=1024,
                system=system_prompt,
                tools=[calculator_tool],
                messages=self.conversation.get_messages()
            )

        # Extract final response
        assistant_message = response.content[0].text
        self.conversation.add_assistant_message(assistant_message)

        return assistant_message

    # ... other methods same as before

Now watch memory and tools work together:
assistant = PersonalAssistantWithTools(api_key="YOUR_ANTHROPIC_API_KEY")

# Store a fact
print(assistant.chat("Remember that I need to save $500 per month"))
# Output: Got it! I'll remember that you need to save $500 per month

# Use tools with memory
print(assistant.chat("I earn $3000 per month. After my savings goal, how much do I have left?"))
# Agent retrieves: "You need to save $500 per month"
# Agent uses calculator: 3000 - 500
# Output: After setting aside your $500 monthly savings goal, you'll have
# $2,500 left for other expenses.

print(assistant.chat("If I split the remaining amount across 4 weeks, how much is that per week?"))
# Agent remembers previous calculation: $2,500
# Agent uses calculator: 2500 / 4
# Output: Splitting your remaining $2,500 across 4 weeks gives you $625 per week.

The assistant combines three capabilities:
- Long-term memory: Retrieves your savings goal
- Short-term memory: Remembers the previous calculation
- Tool use: Performs accurate calculations
This is powerful. The agent can reference stored facts, maintain conversation context, and use tools to solve problems it couldn't handle with language alone.
Managing Memory Over Time
As your assistant accumulates facts, you'll need ways to manage them. Let's add some utilities:
class PersonalAssistantWithManagement(PersonalAssistantWithTools):
    """Assistant with memory management capabilities."""

    def list_facts(self) -> str:
        """Show all stored facts."""
        facts = self.knowledge.get_all_facts()

        if not facts:
            return "I don't have any facts stored yet."

        facts_list = "\n".join(f"{i+1}. {fact}" for i, fact in enumerate(facts))
        return f"Here's what I know about you:\n\n{facts_list}"

    def forget_fact(self, fact_number: int) -> str:
        """Remove a specific fact."""
        facts = self.knowledge.facts

        if fact_number < 1 or fact_number > len(facts):
            return f"Invalid fact number. I have {len(facts)} facts stored."

        removed_fact = facts[fact_number - 1]["fact"]

        # Remove from both lists
        facts.pop(fact_number - 1)
        self.knowledge.embeddings.pop(fact_number - 1)
        self.knowledge._save_data()

        return f"Forgot: {removed_fact}"

    def update_fact(self, fact_number: int, new_fact: str) -> str:
        """Update an existing fact."""
        facts = self.knowledge.facts

        if fact_number < 1 or fact_number > len(facts):
            return f"Invalid fact number. I have {len(facts)} facts stored."

        old_fact = facts[fact_number - 1]["fact"]

        # Update fact and embedding
        facts[fact_number - 1]["fact"] = new_fact
        facts[fact_number - 1]["timestamp"] = datetime.now().isoformat()
        self.knowledge.embeddings[fact_number - 1] = self.knowledge.encoder.encode(new_fact)
        self.knowledge._save_data()

        return f"Updated: '{old_fact}' → '{new_fact}'"

Now you can manage stored knowledge:
assistant = PersonalAssistantWithManagement(api_key="YOUR_ANTHROPIC_API_KEY")

# Store some facts
assistant.chat("Remember that I live in San Francisco")
assistant.chat("Remember that I'm allergic to peanuts")
assistant.chat("Remember that I prefer Italian food")

# List all facts
print(assistant.list_facts())
# Output:
# Here's what I know about you:
#
# 1. You live in San Francisco
# 2. You're allergic to peanuts
# 3. You prefer Italian food

# Update a fact
print(assistant.update_fact(1, "You live in Oakland"))
# Output: Updated: 'You live in San Francisco' → 'You live in Oakland'

# Remove a fact
print(assistant.forget_fact(2))
# Output: Forgot: You're allergic to peanuts

These management functions give users control over their data. This is important for privacy and accuracy.
Design Decisions and Trade-offs
Let's discuss the choices we made and their implications:
Sliding Window for Conversations
We limit conversation history to 20 messages. This:
Pros:
- Keeps costs predictable
- Prevents context overflow
- Simple to implement
Cons:
- Loses older conversation context
- Might forget important details from earlier in the session
Alternative: Use summarization instead of hard truncation. Periodically summarize old messages and keep the summary.
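To make that concrete, here's a minimal sketch of the summarization route. The class name, the running summary attribute, and the summarization prompt are my own assumptions, not part of the original implementation; it reuses the same anthropic client the assistant already holds:

class SummarizingConversationManager(ConversationManager):
    """Sketch: compress trimmed messages into a running summary instead of dropping them."""

    def __init__(self, client, model: str, max_messages: int = 20):
        super().__init__(max_messages=max_messages)
        self.client = client      # an anthropic.Anthropic instance
        self.model = model
        self.summary = ""         # running summary of everything trimmed so far

    def _trim_if_needed(self):
        if len(self.messages) <= self.max_messages:
            return
        # Summarize the overflow instead of discarding it
        overflow = self.messages[:-self.max_messages]
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in overflow)
        response = self.client.messages.create(
            model=self.model,
            max_tokens=256,
            system="Summarize this conversation excerpt in a few sentences, keeping names, dates, and decisions.",
            messages=[{"role": "user", "content": f"{self.summary}\n{transcript}"}]
        )
        self.summary = response.content[0].text
        self.messages = self.messages[-self.max_messages:]

The running summary can then be prepended to the system prompt alongside retrieved facts, so older context survives in compressed form.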
Keyword Detection for Memory Commands
We detect "remember", "store", etc. to trigger memory storage. This:
Pros:
- Simple and fast
- No extra API calls
- User has explicit control
Cons:
- Might miss implicit memory requests
- Requires specific keywords
Alternative: Use the LLM to classify every message as "store this" or "just chat". More flexible but costs more.
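A sketch of that alternative, with a hypothetical helper that asks the model itself whether a message is worth storing (the prompt and function name are assumptions, not the book's method):

def should_store(client, model: str, message: str) -> bool:
    """Sketch: let the LLM decide whether a message contains a fact worth remembering."""
    response = client.messages.create(
        model=model,
        max_tokens=5,
        system="Answer only YES or NO: does this message contain a personal fact "
               "or preference worth storing in long-term memory?",
        messages=[{"role": "user", "content": message}]
    )
    return response.content[0].text.strip().upper().startswith("YES")

# Inside chat(), this would replace the keyword check:
# if should_store(self.client, self.model, user_message):
#     return self._handle_memory_command(user_message)

The cost is one extra (small) API call per message; whether that trade is worth it depends on how often users store facts implicitly.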
Semantic Search with Local Embeddings
We use a local embedding model for semantic search. This:
Pros:
- Fast and free
- Good enough for most use cases
- No API calls for every search
Cons:
- Less powerful than API-based embeddings
- Requires installing additional libraries
Alternative: Use a hosted embedding API such as OpenAI's (Anthropic doesn't offer one itself and points users toward Voyage AI). Better quality, but every embedding call costs money.
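As a sketch, a drop-in replacement for the SentenceTransformer encoder might look like this, assuming the openai package and the text-embedding-3-small model (both my choices for illustration):

import numpy as np
from openai import OpenAI

class OpenAIEncoder:
    """Sketch: API-based replacement for the local SentenceTransformer encoder."""

    def __init__(self, model: str = "text-embedding-3-small"):
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def encode(self, text: str) -> np.ndarray:
        response = self.client.embeddings.create(model=self.model, input=text)
        return np.array(response.data[0].embedding)

# SemanticKnowledgeStore could then use OpenAIEncoder() in place of
# SentenceTransformer('all-MiniLM-L6-v2') with no other changes.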
JSON File Storage
We store facts in a JSON file. This:
Pros:
- Simple to implement
- No database setup required
- Easy to inspect and debug
Cons:
- Doesn't scale to thousands of facts
- No concurrent access support
- Limited query capabilities
Alternative: Use a proper database (SQLite for local, PostgreSQL for production) or a vector database (Pinecone, Weaviate, Chroma).
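For the SQLite route, a minimal sketch using only the standard library might look like this; table and column names are placeholders, and search stays keyword-based (LIKE) here:

import sqlite3
from datetime import datetime

class SQLiteKnowledgeStore:
    """Sketch: the same fact store backed by SQLite instead of a JSON file."""

    def __init__(self, db_file: str = "knowledge.db"):
        self.conn = sqlite3.connect(db_file)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "id INTEGER PRIMARY KEY, fact TEXT, category TEXT, timestamp TEXT)"
        )
        self.conn.commit()

    def add_fact(self, fact: str, category: str = None):
        self.conn.execute(
            "INSERT INTO facts (fact, category, timestamp) VALUES (?, ?, ?)",
            (fact, category, datetime.now().isoformat())
        )
        self.conn.commit()

    def search(self, query: str) -> list:
        rows = self.conn.execute(
            "SELECT fact FROM facts WHERE lower(fact) LIKE ?",
            (f"%{query.lower()}%",)
        ).fetchall()
        return [row[0] for row in rows]

Unlike the JSON file, this handles concurrent readers gracefully and scales to far more facts.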
When to Use What
Here's a practical guide for choosing memory strategies:
For personal projects or prototypes:
- Use the simple JSON-based approach we've shown
- Sliding window for conversation history
- Local embeddings for semantic search
For production applications with <1000 users:
- SQLite or PostgreSQL for knowledge storage
- Consider Redis for conversation history (fast, ephemeral)
- Still use local embeddings if cost is a concern
For large-scale applications:
- Vector database (Pinecone, Weaviate, Chroma) for knowledge; see the Chroma sketch after this list
- Redis or similar for conversation state
- API-based embeddings for best quality
- Implement proper user isolation and security
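For the vector-database option, here's a rough sketch using Chroma's local persistent client (assuming the chromadb package; the collection name and path are placeholders, and Chroma embeds documents with its default embedding function):

import chromadb

class ChromaKnowledgeStore:
    """Sketch: fact storage and semantic search backed by Chroma."""

    def __init__(self, path: str = "./chroma_data"):
        self.client = chromadb.PersistentClient(path=path)
        self.collection = self.client.get_or_create_collection("facts")

    def add_fact(self, fact: str):
        # Chroma generates the embedding for us on insert
        self.collection.add(documents=[fact], ids=[f"fact-{self.collection.count()}"])

    def search(self, query: str, top_k: int = 3) -> list:
        results = self.collection.query(query_texts=[query], n_results=top_k)
        return results["documents"][0] if results["documents"] else []

The interface matches our earlier stores, so swapping it into the assistant is a one-line change.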
Testing Your Memory System
How do you know if your memory system works well? Here are some tests:
def test_memory_system():
    """Test that memory works correctly."""
    assistant = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")

    # Test 1: Store and retrieve
    assistant.chat("Remember that my birthday is July 20")
    response = assistant.chat("When is my birthday?")
    assert "july 20" in response.lower(), "Failed to retrieve birthday"

    # Test 2: Semantic search
    assistant.chat("Remember that I'm allergic to peanuts")
    response = assistant.chat("What should I avoid eating?")
    assert "peanut" in response.lower(), "Failed semantic search"

    # Test 3: Conversation context
    assistant.chat("I'm planning a trip to Japan")
    response = assistant.chat("What's the best time to visit?")
    assert "japan" in response.lower(), "Lost conversation context"

    # Test 4: Persistence
    assistant2 = PersonalAssistant(api_key="YOUR_ANTHROPIC_API_KEY")
    response = assistant2.chat("When is my birthday?")
    assert "july 20" in response.lower(), "Facts not persisted"

    print("All tests passed!")

test_memory_system()

These tests verify:
- Facts are stored and retrieved correctly
- Semantic search finds relevant information
- Conversation context is maintained
- Data persists across sessions
What You've Built
Let's appreciate what you now have:
A complete memory system with both short-term and long-term components working together.
Semantic search that understands meaning, not just keywords.
Tool integration where memory enhances tool use.
Memory management giving users control over their data.
Persistent storage that survives between sessions.
This is a real, working personal assistant. You can extend it with more tools, better storage, or additional capabilities. The foundation is solid.
Practical Considerations
As you deploy your memory system, keep these points in mind:
Privacy matters: You're storing personal information. Encrypt sensitive data, provide ways to export or delete it, and be transparent about what you store.
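Export and delete can be as simple as this sketch (hypothetical helpers built on the semantic store from earlier):

import json

def export_facts(assistant, path: str = "my_data.json"):
    """Sketch: let users take their stored facts with them."""
    with open(path, "w") as f:
        json.dump(assistant.knowledge.facts, f, indent=2)
    return f"Exported {len(assistant.knowledge.facts)} facts to {path}"

def delete_all_facts(assistant):
    """Sketch: erase everything the assistant has stored."""
    assistant.knowledge.facts = []
    assistant.knowledge.embeddings = []
    assistant.knowledge._save_data()  # persists the now-empty store
    return "All stored facts deleted."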
Memory can be wrong: Users might tell you incorrect information or change their minds. Provide ways to correct or update facts.
Not everything should be remembered: Some information is temporary or sensitive. Consider what truly needs long-term storage.
Test with real users: Your assumptions about what to remember might differ from what users actually need. Gather feedback.
Monitor costs: If using API-based embeddings or large context windows, track your spending. Optimize where needed.
Key Takeaways
Let's review what we've learned about implementing memory:
Memory has two layers: Short-term for conversations, long-term for persistent facts. Both are essential.
Start simple: A list for conversations and a JSON file for facts works fine for many applications.
Semantic search is powerful: Embeddings let you find information by meaning, making retrieval much more useful.
Memory enhances everything: When combined with tools and reasoning, memory makes your agent far more capable.
Design for users: Provide ways to view, update, and delete stored information. Users should control their data.
Your personal assistant now has a complete memory system. It remembers conversations, stores important facts, retrieves relevant information, and combines all of this with language model capabilities and tool use. This is a significant milestone in building truly useful AI agents.
In the next chapter, we'll explore how to organize all these components into a coherent agent architecture, showing how memory, reasoning, tools, and state management work together in a unified system.
Glossary
Conversation Manager: A component that handles short-term memory by storing and managing the recent message history in a conversation.
Knowledge Store: A persistent storage system for long-term facts and information that survives across sessions.
Semantic Search: Finding information based on meaning rather than exact keyword matches, typically using embeddings and similarity calculations.
Sliding Window: A memory management strategy that keeps only the most recent N messages, automatically discarding older ones.
Embedding: A numerical vector representation of text that captures its semantic meaning, enabling similarity-based search.
Tool Integration: The ability for an agent to use external functions or APIs while maintaining memory of both the conversation and stored knowledge.
Memory Management: Features that let users view, update, or delete stored information, giving them control over their data.