Learn how to add logging to AI agents to debug behavior, track decisions, and monitor tool usage. Includes practical Python examples with structured logging patterns and best practices.

This article is part of the free-to-read AI Agent Handbook
Adding Logs to the Agent
When you build an AI agent, you're creating something that makes decisions on its own. It decides when to use tools, how to reason through problems, and what information to retrieve from memory. This autonomy is powerful, but it also means you can't always predict what the agent will do. Sometimes it works perfectly. Other times, it gives an unexpected answer or calls the wrong tool. When that happens, you need a way to understand what went wrong.
This is where logging comes in. By adding logs at key decision points in your agent's code, you create a trail of breadcrumbs showing exactly what the agent did and why. Think of it like keeping a lab notebook during an experiment. Without notes, you won't remember what worked and what didn't. With good notes, you can trace back through every step and spot the problem.
Let's see how to add logging to our personal assistant so we can peer inside its decision-making process.
Why Agents Need Logs
Our assistant has grown considerably since we started building it. It now uses tools, maintains memory, plans multi-step tasks, and reasons through complex problems. Each of these capabilities involves the agent making choices:
- Should I use the calculator tool or try to answer this math question directly?
- Which pieces of conversation history are relevant to include in my context?
- What's the first step in this multi-step plan?
- Do I have enough information to answer, or do I need to call another tool?
When your agent makes the right choice, everything works smoothly. But when it makes the wrong choice, you need to know which decision went wrong and why. Without logs, debugging feels like detective work with no clues. You see the final wrong answer, but you don't know whether the agent:
- Used the wrong tool
- Failed to retrieve relevant information from memory
- Misunderstood the user's request
- Made an error in its reasoning chain
Logs solve this problem by recording what happens at each step. They turn your agent from a black box into something you can observe and understand.
What to Log
You don't need to log everything your agent does. Too many logs become noise, making it harder to find useful information. Instead, focus on logging the key decision points and transitions in your agent's flow.
Here are the most valuable things to log:
User Input: Record what the user asked for. This gives you the starting point for tracing through what the agent did.
```python
# Using Claude Sonnet 4.5 for its superior agent reasoning capabilities
import anthropic
import logging

# Set up basic logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def process_user_query(user_input):
    logger.info(f"Received user query: {user_input}")
    # Process the query...
```
Tool Decisions: Log when the agent decides to use a tool and which tool it chose. This helps you verify the agent is selecting the right tool for each task.
```python
def decide_tool_usage(query, available_tools):
    # Agent decides which tool to use
    selected_tool = agent_select_tool(query, available_tools)
    logger.info(f"Agent selected tool: {selected_tool} for query: {query}")
    return selected_tool
```
Tool Calls and Results: Record both the input to each tool and what the tool returned. If a tool fails or returns unexpected data, you'll want to know.
```python
def call_calculator(expression):
    logger.info(f"Calling calculator with expression: {expression}")
    try:
        result = eval(expression)  # Simplified for example
        logger.info(f"Calculator returned: {result}")
        return result
    except Exception as e:
        logger.error(f"Calculator failed: {e}")
        return None
```
Reasoning Steps: When your agent uses chain-of-thought or other reasoning techniques, log the intermediate thoughts. This lets you follow the agent's logic.
```python
def reason_through_problem(problem):
    logger.info(f"Starting reasoning for problem: {problem}")
    # Agent generates reasoning steps
    steps = generate_reasoning_steps(problem)
    for i, step in enumerate(steps):
        logger.info(f"Reasoning step {i+1}: {step}")
    return steps
```
Memory Operations: Log when the agent retrieves information from memory or stores something new. This helps you verify the agent is using its memory correctly.
```python
def retrieve_from_memory(query):
    logger.info(f"Searching memory for: {query}")
    results = memory_search(query)
    logger.info(f"Found {len(results)} relevant items in memory")
    return results
```
Final Response: Log the agent's final answer to the user. Combined with the initial query log, this gives you the complete input-output pair.
```python
def send_response(response):
    logger.info(f"Sending response to user: {response}")
    return response
```
Adding Logs to Our Assistant
Let's take a simplified version of our personal assistant and add logging at the key points. We'll start by setting up the logging configuration.
```python
# Example (Claude Sonnet 4.5)
import anthropic
import logging

# Configure logging with a clear format
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
```
Now let's look at the main query processing function with logging at each decision point:
```python
def process_query(user_input):
    # Log the incoming query
    logger.info(f"User query: {user_input}")

    # Check if we need tools
    needs_tool = check_tool_requirement(user_input)
    logger.info(f"Tool required: {needs_tool}")

    if needs_tool:
        tool_name = select_tool(user_input)
        logger.info(f"Selected tool: {tool_name}")

        tool_result = use_tool(tool_name, user_input)
        logger.info(f"Tool result: {tool_result}")

        response = generate_response_with_tool(user_input, tool_result)
    else:
        logger.info("Generating direct response (no tool)")
        response = generate_direct_response(user_input)

    # Log final response
    logger.info(f"Assistant response: {response}")
    return response
```
Notice how each major decision point has a log statement. When you run this agent and something goes wrong, you can trace through the logs to see exactly what path the agent took.
Log Levels for Different Information
Python's logging module provides different levels for different types of information. Using the right level helps you filter logs based on what you're looking for.
INFO: Normal operation milestones. Use this for standard agent actions like "received query", "selected tool", "generated response".
DEBUG: Detailed information useful during development. Use this for verbose details like the full content of prompts or intermediate data structures.
WARNING: Something unexpected happened, but the agent recovered. Use this for cases like "tool call failed, retrying" or "no relevant memory found".
ERROR: Something went wrong that prevented normal operation. Use this when a tool crashes or the agent can't complete a request.
Here's how you might use different levels:
```python
def use_calculator(expression):
    logger.debug(f"Calculator input (full): {expression}")

    try:
        result = eval(expression)  # Simplified for example
        logger.info(f"Calculator computed: {result}")
        return result
    except SyntaxError as e:
        logger.warning(f"Invalid expression: {expression}, error: {e}")
        return "I couldn't understand that mathematical expression"
    except Exception as e:
        logger.error(f"Calculator failed unexpectedly: {e}")
        return None
```
With log levels, you can control how much detail you see. During development, set the level to DEBUG to see everything. In production, set it to INFO or WARNING to reduce noise.
Making Logs Useful
Raw print statements can work for quick debugging, but structured logging makes your logs much more useful, especially when you're handling many requests or running the agent in production.
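To make the idea concrete before looking at individual techniques, here's a minimal sketch of structured output using only the standard library: each record becomes one JSON object per line, which log tools can filter and search. Production systems often reach for a dedicated library such as structlog instead.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("assistant")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Selected tool: calculator")
# Emits something like:
# {"time": "2025-11-10 14:32:15,123", "level": "INFO", "logger": "assistant", "message": "Selected tool: calculator"}
```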
Include Timestamps: Knowing when each action happened helps you understand timing issues and correlate logs with user reports. The logging module adds timestamps automatically when you configure the format:
```python
logging.basicConfig(
    format='%(asctime)s - %(levelname)s - %(message)s'
)
```
Add Context: Include relevant identifiers in your logs so you can track a single request through the entire system. If your agent handles multiple users, log the user ID. If it processes multiple requests, log a request ID.
```python
class PersonalAssistant:
    def __init__(self, api_key, user_id):
        self.user_id = user_id
        self.logger = logging.getLogger(f"assistant.{user_id}")

    def process_query(self, user_input, request_id):
        self.logger.info(
            f"[Request {request_id}] Processing: {user_input}"
        )
```
Structure Your Messages: Use a consistent format for logs of the same type. This makes it easier to search through logs or process them automatically.
```python
# Good: Consistent structure
logger.info(f"Tool called: {tool_name}, input: {input_data}")
logger.info(f"Tool completed: {tool_name}, result: {result}")

# Less useful: Inconsistent structure
logger.info(f"Using {tool_name}")
logger.info(f"Got back: {result}")
```
Be Selective About Sensitive Data: Don't log sensitive information like passwords, API keys, or personal user data. If you need to log something that might contain sensitive data, redact it first.
```python
import re

def log_user_query(query):
    # Redact emails from logs (str.replace doesn't understand regex,
    # so we use re.sub here)
    safe_query = re.sub(
        r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
        '[EMAIL]',
        query
    )
    logger.info(f"User query: {safe_query}")
```
Seeing Logs in Action
Let's look at what logs from our assistant might look like during a typical interaction. This gives you a sense of how logs help you understand what's happening.
The user asks: "What's 234 times 567?"
```
2025-11-10 14:32:15 - assistant - INFO - User query: What's 234 times 567?
2025-11-10 14:32:15 - assistant - INFO - Tool required: True
2025-11-10 14:32:15 - assistant - INFO - Selected tool: calculator
2025-11-10 14:32:15 - assistant - INFO - Calling calculator with: 234 * 567
2025-11-10 14:32:15 - assistant - INFO - Calculator returned: 132678
2025-11-10 14:32:16 - assistant - INFO - Assistant response: The answer is 132,678.
```
These logs tell a clear story. The agent received a math question, recognized it needed the calculator tool, called the calculator with the correct expression, got the result, and sent the answer to the user. Everything worked as expected.
Now imagine the user asks: "What's the weather like?"
```
2025-11-10 14:35:22 - assistant - INFO - User query: What's the weather like?
2025-11-10 14:35:22 - assistant - INFO - Tool required: True
2025-11-10 14:35:22 - assistant - INFO - Selected tool: weather_api
2025-11-10 14:35:22 - assistant - WARNING - No location specified, using default
2025-11-10 14:35:23 - assistant - INFO - Weather API returned: Sunny, 72°F
2025-11-10 14:35:23 - assistant - INFO - Assistant response: It's sunny and 72°F.
```
Here the logs reveal something interesting. The agent correctly identified the need for the weather tool, but it had to use a default location because the user didn't specify one. The WARNING level log highlights this, and it explains why the response might not match what the user expected if they're in a different location.
These examples show how logs turn your agent's internal process into something visible and understandable.
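To keep trails like these around after the terminal scrolls away, you can send the same records to a file alongside the console. A small sketch (the filename `agent.log` is just an illustration):

```python
import logging

logger = logging.getLogger("assistant")
logger.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# Console handler for watching the agent live
console = logging.StreamHandler()
console.setFormatter(formatter)
logger.addHandler(console)

# File handler so every run leaves a trail you can search later
file_handler = logging.FileHandler("agent.log")
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

logger.info("User query: What's 234 times 567?")
```

After a few sessions, you can grep through `agent.log` to reconstruct what the agent did, even for interactions you didn't watch live.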
Practical Logging Patterns
As you add logging to your agent, you'll develop patterns that work well for different situations. Here are a few patterns that prove useful in practice.
Entry and Exit Logging: Log when important functions start and when they complete. This helps you verify the agent is executing the right sequence of operations.
```python
def plan_multi_step_task(goal):
    logger.info(f"Planning started for goal: {goal}")

    # Generate plan
    plan = create_plan(goal)
    logger.info(f"Generated plan with {len(plan)} steps")

    # Execute each step
    for i, step in enumerate(plan):
        logger.info(f"Executing step {i+1}/{len(plan)}: {step}")
        result = execute_step(step)
        logger.info(f"Step {i+1} completed with result: {result}")

    logger.info("Planning completed successfully")
    return result
```
State Change Logging: Log whenever your agent's state changes in an important way. This helps you track how the agent's understanding evolves.
```python
def update_conversation_state(new_info):
    logger.info(f"State before: {current_state}")
    current_state.update(new_info)
    logger.info(f"State after: {current_state}")
```
Decision Point Logging: When the agent makes a decision based on some logic, log both what it decided and why.
```python
def should_ask_for_clarification(query, confidence):
    if confidence < 0.7:
        logger.info(
            f"Requesting clarification (confidence: {confidence} < 0.7)"
        )
        return True
    else:
        logger.info(
            f"Proceeding without clarification (confidence: {confidence})"
        )
        return False
```
These patterns give structure to your logging, making it easier to find the information you need when debugging or analyzing your agent's behavior.
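If you find yourself writing the same entry-and-exit pairs around many functions, a small decorator can apply the pattern automatically. This is a sketch, not part of the assistant above; the `select_tool` body here is placeholder logic for illustration:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_calls(func):
    """Log entry, exit, and exceptions for any agent function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info(f"Entering {func.__name__} with args={args}, kwargs={kwargs}")
        try:
            result = func(*args, **kwargs)
            logger.info(f"Exiting {func.__name__} with result: {result}")
            return result
        except Exception as e:
            logger.error(f"{func.__name__} raised: {e}")
            raise
    return wrapper

@log_calls
def select_tool(query):
    # Placeholder decision logic for the sketch
    return "calculator" if any(ch.isdigit() for ch in query) else "none"
```

Calling `select_tool("What's 2 + 2?")` now logs the entry and exit around the call with no extra code inside the function itself.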
Glossary
Log Level: A category that indicates the importance or severity of a log message (DEBUG, INFO, WARNING, ERROR). Log levels let you filter messages to see only the detail you need.
Logging: The practice of recording what happens during program execution. For agents, logging captures decisions, tool calls, and reasoning steps to make the agent's behavior observable.
Structured Logging: A logging approach that uses consistent formats and includes contextual information like timestamps and request IDs. Structured logs are easier to search, filter, and analyze than simple print statements.
Redaction: The process of removing or masking sensitive information before logging it. Redaction prevents passwords, API keys, or personal data from appearing in log files.
About the author: Michael Brenndoerfer
All opinions expressed here are my own and do not reflect the views of my employer.
Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, where he drives AI and data initiatives across private capital investments.
With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.