Adding Logs to AI Agents: Complete Guide to Observability & Debugging

Michael Brenndoerfer · August 6, 2025 · 10 min read

Learn how to add logging to AI agents to debug behavior, track decisions, and monitor tool usage. Includes practical Python examples with structured logging patterns and best practices.

Adding Logs to the Agent

When you build an AI agent, you're creating something that makes decisions on its own. It decides when to use tools, how to reason through problems, and what information to retrieve from memory. This autonomy is powerful, but it also means you can't always predict what the agent will do. Sometimes it works perfectly. Other times, it gives an unexpected answer or calls the wrong tool. When that happens, you need a way to understand what went wrong.

This is where logging comes in. By adding logs at key decision points in your agent's code, you create a trail of breadcrumbs showing exactly what the agent did and why. Think of it like keeping a lab notebook during an experiment. Without notes, you won't remember what worked and what didn't. With good notes, you can trace back through every step and spot the problem.

Let's see how to add logging to our personal assistant so we can peer inside its decision-making process.

Why Agents Need Logs

Our assistant has grown considerably since we started building it. It now uses tools, maintains memory, plans multi-step tasks, and reasons through complex problems. Each of these capabilities involves the agent making choices:

  • Should I use the calculator tool or try to answer this math question directly?
  • Which pieces of conversation history are relevant to include in my context?
  • What's the first step in this multi-step plan?
  • Do I have enough information to answer, or do I need to call another tool?

When your agent makes the right choice, everything works smoothly. But when it makes the wrong choice, you need to know which decision went wrong and why. Without logs, debugging feels like detective work with no clues. You see the final wrong answer, but you don't know whether the agent:

  • Used the wrong tool
  • Failed to retrieve relevant information from memory
  • Misunderstood the user's request
  • Made an error in its reasoning chain

Logs solve this problem by recording what happens at each step. They turn your agent from a black box into something you can observe and understand.

What to Log

You don't need to log everything your agent does. Too many logs become noise, making it harder to find useful information. Instead, focus on logging the key decision points and transitions in your agent's flow.

Here are the most valuable things to log:

User Input: Record what the user asked for. This gives you the starting point for tracing through what the agent did.

In[3]:
Code
import logging

# Set up basic logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def process_user_query(user_input):
    logger.info(f"Received user query: {user_input}")
    # Process the query...

Tool Decisions: Log when the agent decides to use a tool and which tool it chose. This helps you verify the agent is selecting the right tool for each task.

In[4]:
Code
def decide_tool_usage(query, available_tools):
    # Agent decides which tool to use
    selected_tool = agent_select_tool(query, available_tools)
    logger.info(f"Agent selected tool: {selected_tool} for query: {query}")
    return selected_tool

Tool Calls and Results: Record both the input to each tool and what the tool returned. If a tool fails or returns unexpected data, you'll want to know.

In[5]:
Code
def call_calculator(expression):
    logger.info(f"Calling calculator with expression: {expression}")
    try:
        result = eval(expression)  # Simplified for the example; eval is unsafe on untrusted input
        logger.info(f"Calculator returned: {result}")
        return result
    except Exception as e:
        logger.error(f"Calculator failed: {e}")
        return None

Reasoning Steps: When your agent uses chain-of-thought or other reasoning techniques, log the intermediate thoughts. This lets you follow the agent's logic.

In[6]:
Code
def reason_through_problem(problem):
    logger.info(f"Starting reasoning for problem: {problem}")
    # Agent generates reasoning steps
    steps = generate_reasoning_steps(problem)
    for i, step in enumerate(steps):
        logger.info(f"Reasoning step {i+1}: {step}")
    return steps

Memory Operations: Log when the agent retrieves information from memory or stores something new. This helps you verify the agent is using its memory correctly.

In[7]:
Code
def retrieve_from_memory(query):
    logger.info(f"Searching memory for: {query}")
    results = memory_search(query)
    logger.info(f"Found {len(results)} relevant items in memory")
    return results

Final Response: Log the agent's final answer to the user. Combined with the initial query log, this gives you the complete input-output pair.

In[8]:
Code
def send_response(response):
    logger.info(f"Sending response to user: {response}")
    return response

Adding Logs to Our Assistant

Let's take a simplified version of our personal assistant and add logging at the key points. We'll start by setting up the logging configuration.

In[9]:
Code
import logging

# Configure logging with a clear format
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

Now let's look at the main query processing function with logging at each decision point:

In[10]:
Code
def process_query(user_input):
    # Log the incoming query
    logger.info(f"User query: {user_input}")
    
    # Check if we need tools
    needs_tool = check_tool_requirement(user_input)
    logger.info(f"Tool required: {needs_tool}")
    
    if needs_tool:
        tool_name = select_tool(user_input)
        logger.info(f"Selected tool: {tool_name}")
        
        tool_result = use_tool(tool_name, user_input)
        logger.info(f"Tool result: {tool_result}")
        
        response = generate_response_with_tool(user_input, tool_result)
    else:
        logger.info("Generating direct response (no tool)")
        response = generate_direct_response(user_input)
    
    # Log final response
    logger.info(f"Assistant response: {response}")
    return response

Notice how each major decision point has a log statement. When you run this agent and something goes wrong, you can trace through the logs to see exactly what path the agent took.

Log Levels for Different Information

Python's logging module provides different levels for different types of information. Using the right level helps you filter logs based on what you're looking for.

INFO: Normal operation milestones. Use this for standard agent actions like "received query", "selected tool", "generated response".

DEBUG: Detailed information useful during development. Use this for verbose details like the full content of prompts or intermediate data structures.

WARNING: Something unexpected happened, but the agent recovered. Use this for cases like "tool call failed, retrying" or "no relevant memory found".

ERROR: Something went wrong that prevented normal operation. Use this when a tool crashes or the agent can't complete a request.

Here's how you might use different levels:

In[11]:
Code
def use_calculator(expression):
    logger.debug(f"Calculator input (full): {expression}")
    
    try:
        result = eval(expression)  # Unsafe on untrusted input; fine for a trusted demo
        logger.info(f"Calculator computed: {result}")
        return result
    except SyntaxError as e:
        logger.warning(f"Invalid expression: {expression}, error: {e}")
        return "I couldn't understand that mathematical expression"
    except Exception as e:
        logger.error(f"Calculator failed unexpectedly: {e}")
        return None

With log levels, you can control how much detail you see. During development, set the level to DEBUG to see everything. In production, set it to INFO or WARNING to reduce noise.
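One convenient way to switch levels without touching code is to read the level from an environment variable at startup. Here's a minimal sketch; the variable name `LOG_LEVEL` is our own convention, not something the logging module defines:

```python
import logging
import os

# Read the desired level name from the environment, defaulting to INFO.
# getattr maps the name ("DEBUG", "INFO", ...) to the logging constant;
# unknown names fall back to INFO.
level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)

logging.basicConfig(level=level)
logger = logging.getLogger(__name__)

logger.debug("Only shown when LOG_LEVEL=DEBUG")
logger.info("Shown at INFO and more verbose levels")
```

Now `LOG_LEVEL=DEBUG python assistant.py` gives you the verbose view during development, while production runs stay at the quieter default.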

Making Logs Useful

Raw print statements can work for quick debugging, but structured logging makes your logs much more useful, especially when you're handling many requests or running the agent in production.

Include Timestamps: Knowing when each action happened helps you understand timing issues and correlate logs with user reports. The logging module adds timestamps automatically when you configure the format:

In[12]:
Code
logging.basicConfig(
    format='%(asctime)s - %(levelname)s - %(message)s'
)

Add Context: Include relevant identifiers in your logs so you can track a single request through the entire system. If your agent handles multiple users, log the user ID. If it processes multiple requests, log a request ID.

In[13]:
Code
class PersonalAssistant:
    def __init__(self, api_key, user_id):
        self.api_key = api_key
        self.user_id = user_id
        self.logger = logging.getLogger(f"assistant.{user_id}")
    
    def process_query(self, user_input, request_id):
        self.logger.info(
            f"[Request {request_id}] Processing: {user_input}"
        )

Structure Your Messages: Use a consistent format for logs of the same type. This makes it easier to search through logs or process them automatically.

In[26]:
Code
# Good: Consistent structure
logger.info(f"Tool called: {tool_name}, input: {input_data}")
logger.info(f"Tool completed: {tool_name}, result: {result}")

# Less useful: Inconsistent structure
logger.info(f"Using {tool_name}")
logger.info(f"Got back: {result}")
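If other tools will consume your logs, you can take consistency one step further and emit each record as a JSON object. A minimal sketch using only the standard library (the field names are our own choice):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("assistant")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("Tool called: calculator, input: 234 * 567")
```

Each line is now machine-parseable, so a question like "all ERROR records for the calculator" becomes a simple filter instead of a regex over free-form text.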

Be Selective About Sensitive Data: Don't log sensitive information like passwords, API keys, or personal user data. If you need to log something that might contain sensitive data, redact it first.

In[14]:
Code
import re

def log_user_query(query):
    # Redact email addresses before logging (str.replace doesn't
    # understand regular expressions, so use re.sub)
    safe_query = re.sub(
        r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
        '[EMAIL]',
        query
    )
    logger.info(f"User query: {safe_query}")

Seeing Logs in Action

Let's look at what logs from our assistant might look like during a typical interaction. This gives you a sense of how logs help you understand what's happening.

The user asks: "What's 234 times 567?"

2025-11-10 14:32:15 - assistant - INFO - User query: What's 234 times 567?
2025-11-10 14:32:15 - assistant - INFO - Tool required: True
2025-11-10 14:32:15 - assistant - INFO - Selected tool: calculator
2025-11-10 14:32:15 - assistant - INFO - Calling calculator with: 234 * 567
2025-11-10 14:32:15 - assistant - INFO - Calculator returned: 132678
2025-11-10 14:32:16 - assistant - INFO - Assistant response: The answer is 132,678.

These logs tell a clear story. The agent received a math question, recognized it needed the calculator tool, called the calculator with the correct expression, got the result, and sent the answer to the user. Everything worked as expected.

Now imagine the user asks: "What's the weather like?"

2025-11-10 14:35:22 - assistant - INFO - User query: What's the weather like?
2025-11-10 14:35:22 - assistant - INFO - Tool required: True
2025-11-10 14:35:22 - assistant - INFO - Selected tool: weather_api
2025-11-10 14:35:22 - assistant - WARNING - No location specified, using default
2025-11-10 14:35:23 - assistant - INFO - Weather API returned: Sunny, 72°F
2025-11-10 14:35:23 - assistant - INFO - Assistant response: It's sunny and 72°F.

Here the logs reveal something interesting. The agent correctly identified the need for the weather tool, but it had to use a default location because the user didn't specify one. The WARNING level log highlights this, and it explains why the response might not match what the user expected if they're in a different location.

These examples show how logs turn your agent's internal process into something visible and understandable.

Practical Logging Patterns

As you add logging to your agent, you'll develop patterns that work well for different situations. Here are a few patterns that prove useful in practice.

Entry and Exit Logging: Log when important functions start and when they complete. This helps you verify the agent is executing the right sequence of operations.

In[15]:
Code
def plan_multi_step_task(goal):
    logger.info(f"Planning started for goal: {goal}")
    
    # Generate plan
    plan = create_plan(goal)
    logger.info(f"Generated plan with {len(plan)} steps")
    
    # Execute each step, collecting results as we go
    results = []
    for i, step in enumerate(plan):
        logger.info(f"Executing step {i+1}/{len(plan)}: {step}")
        result = execute_step(step)
        logger.info(f"Step {i+1} completed with result: {result}")
        results.append(result)
    
    logger.info("Plan executed successfully")
    return results

State Change Logging: Log whenever your agent's state changes in an important way. This helps you track how the agent's understanding evolves.

In[16]:
Code
def update_conversation_state(current_state, new_info):
    logger.info(f"State before: {current_state}")
    current_state.update(new_info)
    logger.info(f"State after: {current_state}")

Decision Point Logging: When the agent makes a decision based on some logic, log both what it decided and why.

In[17]:
Code
def should_ask_for_clarification(query, confidence):
    if confidence < 0.7:
        logger.info(
            f"Requesting clarification (confidence: {confidence} < 0.7)"
        )
        return True
    else:
        logger.info(
            f"Proceeding without clarification (confidence: {confidence})"
        )
        return False

These patterns give structure to your logging, making it easier to find the information you need when debugging or analyzing your agent's behavior.
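When entry and exit logging starts appearing in many functions, the boilerplate can be factored into a decorator. Here's a sketch of that idea; the `logged` decorator is our own helper, not part of the logging module:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def logged(func):
    """Log entry, exit, and failures of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info(f"Entering {func.__name__} with args={args}, kwargs={kwargs}")
        try:
            result = func(*args, **kwargs)
            logger.info(f"Exiting {func.__name__} with result={result!r}")
            return result
        except Exception as e:
            logger.error(f"{func.__name__} raised {e!r}")
            raise
    return wrapper

@logged
def multiply(a, b):
    return a * b

multiply(234, 567)  # logs entry and exit automatically
```

The decorator keeps the pattern consistent everywhere it's applied, and `functools.wraps` preserves each function's name so the log lines stay attributable.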

Glossary

Log Level: A category that indicates the importance or severity of a log message (DEBUG, INFO, WARNING, ERROR). Log levels let you filter messages to see only the detail you need.

Logging: The practice of recording what happens during program execution. For agents, logging captures decisions, tool calls, and reasoning steps to make the agent's behavior observable.

Structured Logging: A logging approach that uses consistent formats and includes contextual information like timestamps and request IDs. Structured logs are easier to search, filter, and analyze than simple print statements.

Redaction: The process of removing or masking sensitive information before logging it. Redaction prevents passwords, API keys, or personal data from appearing in log files.

