
Step-by-Step Problem Solving: Chain-of-Thought Reasoning for AI Agents

Michael Brenndoerfer • November 8, 2025 • 12 min read • 1,569 words

Learn how to teach AI agents to think through problems step by step using chain-of-thought reasoning. Discover practical techniques for improving accuracy and transparency in complex tasks.

This article is part of the free-to-read AI Agent Handbook.

Step-by-Step Problem Solving (Chain-of-Thought)

You've learned how to write clear prompts and use strategies like roles and examples to guide your AI agent. But what happens when you ask your agent a question that requires real thinking? Not just recalling facts, but working through a problem step by step?

Try this experiment. Ask a language model: "If a train leaves Chicago at 2 PM traveling 60 mph, and another train leaves St. Louis (300 miles away) at 3 PM traveling 75 mph toward Chicago, when do they meet?"

You might get an answer. But is it right? The model might jump straight to a conclusion without showing its work. And when the answer is wrong, you have no idea where the reasoning broke down.

Now try adding one simple phrase: "Let's think this through step by step."

Suddenly, the model shows its reasoning. It breaks down the problem, considers each piece, and works toward the answer methodically. This simple technique, called chain-of-thought reasoning, transforms how AI agents handle complex problems.
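The exact wording varies by model and run, but for the train problem the step-by-step reasoning might look something like this (the arithmetic below is worked out here for reference, not taken from any particular model's output):

Step 1: Account for the head start
- The Chicago train leaves at 2 PM at 60 mph, so by 3 PM it has covered 60 miles.
- Remaining gap at 3 PM: 300 - 60 = 240 miles.

Step 2: Find the closing speed
- The trains approach each other at 60 + 75 = 135 mph.

Step 3: Compute the time to meet
- 240 miles ÷ 135 mph ≈ 1.78 hours, or about 1 hour 47 minutes.

Step 4: Convert to clock time
- 3 PM + 1 hour 47 minutes ≈ 4:47 PM, so the trains meet at roughly 4:47 PM.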

Why Reasoning Matters

Language models are excellent at pattern matching and generating text. They can recall facts, write coherently, and follow instructions. But complex problems require more than pattern matching. They require reasoning: breaking down a problem, considering relationships, and building toward a solution.

Without explicit guidance to reason, models often take shortcuts. They might pattern-match to similar problems they've seen in training and output an answer that looks plausible but is actually wrong. This is especially common with:

  • Math problems: Where each step depends on the previous one
  • Logic puzzles: Where you need to track multiple constraints
  • Multi-step tasks: Where you must plan a sequence of actions
  • Analytical questions: Where you need to weigh evidence and draw conclusions

The solution isn't a more powerful model (though that can help). The solution is teaching the model to think through problems explicitly, showing its work as it goes.

What Is Chain-of-Thought Reasoning?

Chain-of-thought (CoT) reasoning is simple: instead of asking the model to jump straight to an answer, you prompt it to explain its thinking step by step. You're essentially asking it to "show its work," just like a math teacher would require.

When you use chain-of-thought prompting, the model generates intermediate reasoning steps before arriving at a final answer. These steps serve two purposes:

  1. They improve accuracy: By working through the problem explicitly, the model is less likely to make logical errors or skip important considerations.

  2. They provide transparency: You can see how the model arrived at its answer, which helps you trust the result and debug when something goes wrong.

Think of it like the difference between asking someone "What's 17 × 23?" versus "What's 17 × 23? Show me how you calculated it." The second request produces not just an answer, but a process you can verify.

The Magic Phrase: "Let's Think Step by Step"

The simplest way to trigger chain-of-thought reasoning is to add a phrase like "Let's think through this step by step" or "Let's solve this step by step" to your prompt. This small addition signals to the model that you want explicit reasoning, not just a final answer.

Example (OpenAI)

Let's see this in action with a simple word problem:

from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

# Without chain-of-thought
prompt_simple = """A restaurant has 23 tables. Each table has 4 chairs.
If 12 chairs are broken and removed, how many chairs are left?"""

# With chain-of-thought
prompt_cot = """A restaurant has 23 tables. Each table has 4 chairs.
If 12 chairs are broken and removed, how many chairs are left?

Let's think through this step by step."""

# Get both responses
response_simple = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt_simple}]
)

response_cot = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt_cot}]
)

print("Without CoT:")
print(response_simple.choices[0].message.content)
print("\nWith CoT:")
print(response_cot.choices[0].message.content)

The first response might just say "80 chairs" with no explanation. The second response will show the reasoning:

Let's think through this step by step.

Step 1: Calculate the total number of chairs
- 23 tables × 4 chairs per table = 92 chairs

Step 2: Subtract the broken chairs
- 92 chairs - 12 broken chairs = 80 chairs

Therefore, there are 80 chairs left in the restaurant.

In this case both responses land on the same correct answer (23 × 4 = 92, and 92 - 12 = 80). The point is that with chain-of-thought, you can verify each step. If there were an error, you'd see exactly where it occurred.

When Chain-of-Thought Helps Most

Chain-of-thought reasoning isn't necessary for every task. If you're asking for a fact ("What's the capital of France?") or a simple transformation ("Translate this to Spanish"), you don't need explicit reasoning steps. The model can answer directly.

But chain-of-thought becomes valuable when:

The problem has multiple steps: Each step builds on the previous one, and skipping a step leads to errors.

The answer isn't obvious: The problem requires analysis, not just recall.

You need to verify the logic: Being able to check the reasoning is as important as getting the right answer.

The model tends to make mistakes: If you notice the model getting certain types of questions wrong, adding chain-of-thought often helps.

Example: A Tricky Riddle

Riddles are a great test case because they require careful thinking. Let's try one:

Example (OpenAI)

from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

riddle = """I speak without a mouth and hear without ears.
I have no body, but I come alive with wind. What am I?

Let's think through this step by step."""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": riddle}]
)

print(response.choices[0].message.content)

The model might respond:

Let's think through this step by step.

Step 1: Analyze the first clue - "I speak without a mouth"
This suggests something that makes sound but isn't alive or doesn't have a physical mouth.

Step 2: Consider the second clue - "hear without ears"
This means it can respond to or carry sound without having ears. This reinforces that it's not a living thing.

Step 3: Examine the third clue - "I have no body"
This confirms it's not a physical creature or person.

Step 4: Look at the final clue - "I come alive with wind"
This is the key clue. Something that needs wind to function and creates sound.

Step 5: Put it together
Something without a body that makes sound and requires wind... this describes an echo. Sound travels through air (wind), bounces off surfaces, and returns as an echo. An echo "speaks" (produces sound) without a mouth, "hears" (responds to sound) without ears, has no physical body, and exists because of sound waves traveling through air.

Answer: An echo

By working through each clue systematically, the model arrives at the correct answer with clear reasoning you can follow.

Teaching Through Examples: Few-Shot Chain-of-Thought

Remember few-shot prompting from the previous chapter? You can combine that technique with chain-of-thought by showing the model examples of step-by-step reasoning. This is especially powerful for tasks where you want consistent reasoning patterns.

Instead of just showing input-output pairs, you show input-reasoning-output triplets. The model learns not just what to answer, but how to think about the problem.

Example (OpenAI)

Let's say you're building a feature where your assistant helps evaluate whether claims are supported by evidence:

from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

prompt = """Determine if the claim is supported by the evidence. Show your reasoning.

Example 1:
Claim: "Exercise improves mental health"
Evidence: "A study of 1,000 participants found that those who exercised 30 minutes daily reported 25% lower anxiety levels than those who didn't exercise."
Reasoning:
- The evidence comes from a study with a large sample size (1,000 participants)
- It shows a specific, measurable benefit (25% lower anxiety)
- Anxiety is a component of mental health
- The evidence directly relates to the claim
Conclusion: Supported

Example 2:
Claim: "Coffee causes heart disease"
Evidence: "Some people who drink coffee have reported heart palpitations."
Reasoning:
- The evidence is anecdotal ("some people reported")
- Heart palpitations are not the same as heart disease
- No causal relationship is established (correlation vs causation)
- The evidence is too weak to support the strong claim
Conclusion: Not supported

Now evaluate this:
Claim: "Reading before bed improves sleep quality"
Evidence: "A survey found that 60% of people who read before bed felt they slept better."
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)

The model will follow the reasoning pattern you demonstrated:

Reasoning:
- The evidence comes from a survey, which captures self-reported data
- 60% is a majority, suggesting a notable correlation
- "Felt they slept better" is subjective, not an objective measure of sleep quality
- The evidence shows correlation but doesn't prove causation (other factors could be involved)
- The sample size and methodology aren't specified, which limits confidence
Conclusion: Partially supported (shows correlation but not causation)

By providing examples of good reasoning, you've taught the model how to approach this type of analysis.

Practical Applications for Your Personal Assistant

Let's apply chain-of-thought reasoning to make your personal assistant more capable. Here are some scenarios where it helps:

Planning a Multi-Step Task

from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

prompt = """I need to prepare for a presentation next Tuesday. I need to:
- Research the topic (3 hours)
- Create slides (4 hours)
- Practice presenting (2 hours)
- Get feedback from a colleague (1 hour)

Today is Thursday. I have 2 hours available each evening (Thu, Fri, Mon).
I have 6 hours available on Saturday.

Create a schedule for completing these tasks. Think through this step by step."""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)

The model will break down the problem:

Let's think through this step by step.

Step 1: Calculate total time needed
- Research: 3 hours
- Create slides: 4 hours
- Practice: 2 hours
- Feedback: 1 hour
Total: 10 hours

Step 2: Calculate available time
- Thursday evening: 2 hours
- Friday evening: 2 hours
- Saturday: 6 hours
- Monday evening: 2 hours
Total: 12 hours (enough time with 2 hours buffer)

Step 3: Consider dependencies
- Must research before creating slides
- Must create slides before practicing
- Should get feedback after practicing

Step 4: Create schedule
Thursday evening (2 hours): Research (part 1)
Friday evening (1 of 2 hours): Finish research (final hour)
Saturday (6 hours): Create slides (4 hours) + Practice (2 hours)
Monday evening (2 hours): Get colleague feedback (1 hour) + final practice (1 hour)

This schedule respects the dependencies and uses 11 of your 12 available hours.

Debugging Logic

Chain-of-thought is also valuable for helping users debug their own thinking:

from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

prompt = """I'm trying to decide if I should buy a new laptop. Help me think through this decision step by step.

Current laptop: 4 years old, still works but slow
New laptop cost: $1,200
My budget: $2,000 saved for tech upgrades
Other needs: Might need a new phone in 6 months ($800)

Should I buy the laptop now?"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)

The model will help structure the decision:

Let's think through this step by step.

Step 1: Assess the urgency
- Current laptop still works, just slow
- This suggests want, not urgent need
- You can likely wait if needed

Step 2: Check the budget
- Total budget: $2,000
- Laptop cost: $1,200
- Phone cost (in 6 months): $800
- Total needed: $2,000
- Remaining after both: $0

Step 3: Consider timing
- If you buy the laptop now: $800 left for phone
- That's exactly enough for the phone
- But no buffer for unexpected expenses

Step 4: Evaluate alternatives
- Wait on laptop, buy phone when needed: keeps more flexibility
- Buy laptop now: commits most of your budget
- Fix/upgrade current laptop: might be cheaper option

Step 5: Recommendation
Given that your current laptop works (just slow) and you have a more predictable phone need in 6 months, consider:
- Option A: Wait on the laptop purchase until after you assess the phone situation
- Option B: Look into cheaper laptop upgrades (RAM, SSD) to extend current laptop life
- Option C: If the laptop significantly impacts your work/productivity, buy it now but plan carefully for the phone purchase

The decision depends on how much the slow laptop affects your daily life versus the importance of maintaining budget flexibility.

This kind of structured thinking helps users make better decisions by explicitly considering all factors.

Common Patterns in Chain-of-Thought Reasoning

As you use chain-of-thought prompting, you'll notice certain patterns that work well for different types of problems:

For math problems: Break into calculation steps, show each intermediate result

For logic puzzles: List constraints, test possibilities, eliminate what doesn't work

For planning tasks: Identify requirements, check resources, sequence actions

For analytical questions: State the question, gather relevant facts, weigh evidence, draw conclusions

For decision-making: Define options, list pros and cons for each, compare, recommend

You don't need to specify these patterns in your prompt. Just asking for step-by-step thinking often triggers the appropriate pattern. But if the model isn't structuring its reasoning the way you want, you can provide an example that demonstrates the pattern you prefer.
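For instance, if you want logic puzzles handled with a "list constraints, test, eliminate" pattern, you can demonstrate that pattern once in the prompt and let the model follow it. The sketch below is illustrative: the puzzles and wording are made up for this example, and any capable chat model could stand in for gpt-4.

from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

# Hypothetical example: demonstrate the preferred reasoning pattern once,
# then ask the model to apply the same structure to a new puzzle.
prompt = """Solve the puzzle. Follow this reasoning pattern: list the constraints,
test each possibility against them, eliminate what doesn't fit, then state the answer.

Example:
Puzzle: Ana, Ben, and Cal each own one pet: a cat, a dog, or a fish.
Ana is allergic to fur. Ben's pet barks.
Constraints: Ana has no furry pet; Ben's pet barks.
Elimination: Ben's pet barks, so Ben has the dog. Ana can't have a furry pet, so she has the fish. Cal has the cat.
Answer: Ana - fish, Ben - dog, Cal - cat.

Now solve:
Puzzle: Dee, Eli, and Flo each drink one beverage: tea, coffee, or juice.
Dee avoids caffeine. Eli's drink is served hot and isn't tea.
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)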

Limitations and When Not to Use Chain-of-Thought

Chain-of-thought reasoning is powerful, but it's not always the right tool:

It's slower: Generating reasoning steps takes more time than jumping to an answer. For simple questions, this overhead isn't worth it.

It uses more tokens: More generated text means higher API costs. Use chain-of-thought when accuracy matters more than speed or cost.

It can be verbose: Sometimes you just want a quick answer, not a detailed explanation. Match the technique to your needs.

It doesn't guarantee correctness: Chain-of-thought improves accuracy, but the model can still make errors in its reasoning. Always verify critical results.

The key is knowing when the benefits (better accuracy, transparency, debuggability) outweigh the costs (time, tokens, verbosity).
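One practical way to balance these tradeoffs is to add the step-by-step cue only when a request is worth the extra tokens. The helper below is a minimal sketch, assuming you (or some upstream logic) can flag which requests are complex; the function name and flag are illustrative, not a standard API.

from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

COT_SUFFIX = "\n\nLet's think through this step by step."

def ask(question: str, complex_task: bool = False) -> str:
    """Append the chain-of-thought cue only when the extra tokens are worth it."""
    prompt = question + COT_SUFFIX if complex_task else question
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Quick fact: no reasoning steps needed
print(ask("What's the capital of France?"))

# Multi-step scheduling problem: worth the extra tokens
print(ask("I have 10 hours of prep work and 12 hours free before Tuesday. "
          "Plan my schedule.", complex_task=True))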

Building Intuition

Start by applying chain-of-thought to problems where the model makes mistakes. If a simple prompt produces wrong answers, try adding "Let's think step by step." You'll quickly notice which types of problems benefit most from explicit reasoning.

Keep track of what works. When you find a prompt pattern that produces good reasoning for a particular type of problem, save it. Over time, you'll build a library of effective chain-of-thought prompts you can reuse and adapt.
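That library can be as simple as a dictionary of templates keyed by problem type. The sketch below shows one way to organize it; the template text and keys are examples to adapt, not a fixed convention.

# A minimal sketch of a reusable chain-of-thought prompt library.
COT_TEMPLATES = {
    "math": "{problem}\n\nLet's solve this step by step, showing each calculation.",
    "planning": "{problem}\n\nThink through this step by step: list requirements, "
                "check available time, then sequence the tasks.",
    "decision": "{problem}\n\nThink through this step by step: list the options, "
                "weigh pros and cons for each, then recommend one.",
}

def build_prompt(kind: str, problem: str) -> str:
    """Look up a saved chain-of-thought pattern and fill in the problem."""
    return COT_TEMPLATES[kind].format(problem=problem)

print(build_prompt("decision", "Should I buy a new laptop now or wait six months?"))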

Pay attention to how the model structures its reasoning. You'll start recognizing good reasoning patterns versus sloppy ones. This helps you craft better prompts and evaluate the model's outputs more effectively.

Looking Ahead

Chain-of-thought reasoning is your first tool for teaching agents to think, not just respond. By prompting the model to show its work, you get more accurate answers and insight into how it arrived at them.

But we can go further. In the next chapter, you'll learn how to make your agent check its own work and refine its answers. You'll discover techniques for getting the agent to review its reasoning, consider alternatives, and improve its responses through self-reflection. These approaches build on chain-of-thought to create even more reliable agents.

The key takeaway: when you need your agent to handle complex problems, don't just ask for an answer. Ask it to think through the problem step by step. That simple change transforms a pattern-matching system into something that can reason.

Key Takeaways

  • Chain-of-thought reasoning improves accuracy by making the model show its work instead of jumping to conclusions
  • The phrase "Let's think step by step" is often all you need to trigger explicit reasoning
  • Use chain-of-thought for complex problems where accuracy matters more than speed
  • Combine with few-shot prompting to teach specific reasoning patterns
  • Verify the reasoning, not just the answer, to catch errors and build trust
  • Save effective patterns to build a library of proven chain-of-thought prompts

With chain-of-thought reasoning in your toolkit, your AI agent can handle problems that require real thinking. The next chapter builds on this foundation by teaching your agent to check and refine its own reasoning.

