Learn how to guide AI agents to verify and refine their reasoning through self-checking techniques. Discover practical methods for catching errors, improving accuracy, and building more reliable AI systems.

This article is part of the free-to-read AI Agent Handbook
Checking and Refining the Agent's Reasoning
In the previous chapter, you learned how chain-of-thought prompting helps agents break down complex problems step by step. But thinking through a problem once isn't always enough. Sometimes the agent makes a mistake in its reasoning, overlooks an important detail, or jumps to a conclusion too quickly.
What if you could teach your agent to double-check its own work? To pause and ask itself, "Wait, does this actually make sense?" This chapter explores how to guide agents to verify and refine their reasoning, catching errors before they become wrong answers.
Why Agents Need to Check Their Work
Imagine asking your personal assistant to calculate the best time to leave for the airport. It reasons through the problem: "Flight is at 3 PM, need to arrive 2 hours early, drive takes 30 minutes, so leave at 12:30 PM." Sounds good, right?
But what if it's Friday afternoon and traffic will be terrible? What if you need to return a rental car first? What if the airport is known for long security lines? A single pass through the reasoning might miss these factors.
Language models, despite their impressive capabilities, can make similar oversights. They might:
- Make arithmetic errors in multi-step calculations
- Misinterpret a key detail in the problem
- Apply the wrong formula or approach
- Forget to consider edge cases or constraints
- Reach a conclusion that contradicts their own reasoning
The good news? You can prompt the agent to review its thinking, just like you might ask a colleague, "Are you sure about that?" or "Can you walk me through your reasoning again?"
The Power of Self-Verification
Let's start with a simple example. Here's an agent solving a word problem:
Problem: "A store has 15 apples. They sell 8 apples in the morning and 6 apples in the afternoon. How many apples are left?"
First attempt:
```
Let me solve this step by step:
1. Start with 15 apples
2. Sell 8 in the morning: 15 - 8 = 7 apples left
3. Sell 6 in the afternoon: 7 - 6 = 1 apple left

Answer: 1 apple remains.
```
This looks reasonable. But what if we prompt the agent to verify its answer?
Example (OpenAI)
```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

# Initial problem solving
problem = """A store has 15 apples. They sell 8 apples in the morning
and 6 apples in the afternoon. How many apples are left?"""

initial_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": f"{problem}\n\nSolve this step by step."}
    ]
)

initial_answer = initial_response.choices[0].message.content
print("Initial answer:", initial_answer)

# Now ask the agent to verify
verification_prompt = f"""You solved this problem:
{problem}

Your answer was:
{initial_answer}

Please verify your answer by:
1. Checking each calculation
2. Confirming the logic makes sense
3. Considering if you missed anything

Are you confident in this answer? If you find any issues, provide a corrected solution."""

verification_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": verification_prompt}
    ]
)

print("\nVerification:", verification_response.choices[0].message.content)
```
The verification step might reveal:
```
Let me verify my calculations:
1. Start: 15 apples ✓
2. After morning sales: 15 - 8 = 7 ✓
3. After afternoon sales: 7 - 6 = 1 ✓

Wait, let me also check if the problem is even possible:
- Total sold: 8 + 6 = 14 apples
- Started with: 15 apples
- 15 - 14 = 1 apple

Yes, the answer checks out. We have 1 apple remaining.
```
In this case, the answer was correct. But the verification process adds confidence. The agent didn't just solve the problem; it confirmed the solution makes sense.
Techniques for Verification
You can guide agents to check their work in several ways. Each technique serves a different purpose.
Ask for Confirmation
The simplest approach: explicitly ask the agent if it's sure.
```
You answered X. Are you confident in this answer?
Double-check your reasoning before confirming.
```
This prompt nudges the agent to review its work without dictating how to do it. Sometimes that's all you need.
Request Alternative Approaches
Ask the agent to solve the problem a different way, then compare results.
```
You solved this using method A. Can you solve it using method B
and see if you get the same answer?
```
If both approaches yield the same result, you can be more confident. If they differ, something went wrong.
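You can even compare the two results in code instead of by eye. The helper below is a rough sketch that assumes each answer ends with a final number; `answers_agree` and its regex heuristic are illustrative, not a standard API (real systems often ask the model to end with a line like `ANSWER: 42` and parse that instead):

```python
import re

def answers_agree(answer_a: str, answer_b: str) -> bool:
    """Compare two model answers by extracting the last number in each.

    A crude heuristic: if either answer contains no number, we treat
    the comparison as a disagreement and flag it for review.
    """
    nums_a = re.findall(r"-?\d+(?:\.\d+)?", answer_a)
    nums_b = re.findall(r"-?\d+(?:\.\d+)?", answer_b)
    if not nums_a or not nums_b:
        return False
    return float(nums_a[-1]) == float(nums_b[-1])

# A subtraction-based solution vs. a totals-based solution to the apple problem
method_a = "15 - 8 = 7, then 7 - 6 = 1. Answer: 1"
method_b = "Total sold is 8 + 6 = 14, and 15 - 14 = 1"
print(answers_agree(method_a, method_b))  # True: both end with 1
```

If the two methods disagree, you can feed both answers back to the agent and ask it to reconcile them.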
Prompt for Explanation
Ask the agent to explain its reasoning in more detail.
```
Explain why you chose this approach and why your answer makes sense.
```
When the agent has to justify its reasoning, it often catches its own mistakes. This is similar to how explaining a problem to someone else helps you spot errors in your own thinking.
Check Against Constraints
Remind the agent of any constraints or requirements, then ask if its answer satisfies them.
```
Your answer is X. Does this satisfy all the requirements:
- Must be less than 100
- Must be a whole number
- Must be positive

Verify each constraint.
```
This structured check helps catch violations the agent might have overlooked.
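When the constraints are this concrete, you don't have to rely on the model to check them. A few lines of ordinary Python can verify them deterministically; `check_constraints` below is a hypothetical helper mirroring the three rules in the prompt (in a real agent you would first parse the numeric answer out of the model's text):

```python
def check_constraints(answer: float) -> list[str]:
    """Return a list of violated constraints (empty means all pass)."""
    violations = []
    if not answer < 100:
        violations.append("must be less than 100")
    if answer != int(answer):
        violations.append("must be a whole number")
    if not answer > 0:
        violations.append("must be positive")
    return violations

print(check_constraints(42.0))   # [] -- all constraints satisfied
print(check_constraints(150.5))  # violates "less than 100" and "whole number"
```

If the list comes back non-empty, you can include the violations in a follow-up prompt and ask the agent to revise its answer.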
A More Complex Example
Let's see verification in action with a trickier problem.
Problem: "You have a 3-gallon jug and a 5-gallon jug. How can you measure exactly 4 gallons of water?"
This is a classic puzzle that requires creative thinking. Let's see how an agent might solve it, then verify its solution.
Example (Claude Agent SDK)
```python
from anthropic import Anthropic

client = Anthropic(api_key="your-api-key-here")

problem = """You have a 3-gallon jug and a 5-gallon jug.
How can you measure exactly 4 gallons of water?"""

# Initial solution
initial_response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"{problem}\n\nThink through this step by step."}
    ]
)

solution = initial_response.content[0].text
print("Initial solution:", solution)

# Verification with detailed checking
verification = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"""You proposed this solution to the jug problem:
{solution}

Please verify by:
1. Walking through each step and tracking the water in each jug
2. Confirming you end with exactly 4 gallons
3. Checking if there's a simpler solution

Show your verification work clearly."""}
    ]
)

print("\nVerification:", verification.content[0].text)
```
The agent might initially propose:
```
Here's one way to measure 4 gallons:
1. Fill the 5-gallon jug completely
2. Pour from the 5-gallon jug into the 3-gallon jug (leaving 2 gallons in the 5-gallon jug)
3. Empty the 3-gallon jug
4. Pour the 2 gallons from the 5-gallon jug into the 3-gallon jug
5. Fill the 5-gallon jug again
6. Pour from the 5-gallon jug into the 3-gallon jug until it's full (this adds 1 gallon)
7. The 5-gallon jug now contains exactly 4 gallons
```
During verification, the agent walks through each step:
```
Let me verify step by step:

Start: 5-gal = 0, 3-gal = 0

Step 1: Fill 5-gallon jug
→ 5-gal = 5, 3-gal = 0 ✓

Step 2: Pour into 3-gallon jug
→ 5-gal = 2, 3-gal = 3 ✓

Step 3: Empty 3-gallon jug
→ 5-gal = 2, 3-gal = 0 ✓

Step 4: Pour 2 gallons into 3-gallon jug
→ 5-gal = 0, 3-gal = 2 ✓

Step 5: Fill 5-gallon jug
→ 5-gal = 5, 3-gal = 2 ✓

Step 6: Pour into 3-gallon jug (which has 2 gallons, so can take 1 more)
→ 5-gal = 4, 3-gal = 3 ✓

Result: 4 gallons in the 5-gallon jug. Solution verified!
```
This detailed verification confirms the solution works. The agent caught no errors here, but the systematic check builds confidence.
When Verification Catches Mistakes
Verification really shines when the initial reasoning has flaws. Consider this scenario:
Problem: "A train travels 60 miles in 1 hour. At this rate, how far will it travel in 90 minutes?"
Initial answer: "60 miles in 60 minutes, so 90 miles in 90 minutes."
Is this right, or did the agent just pattern-match the numbers? A 1-to-1 ratio between minutes and miles only holds when the speed is exactly 60 mph. Let's see how verification helps:
```python
# Using the verification pattern from earlier
verification_prompt = """You said the train travels 90 miles in 90 minutes.

Let's verify this:
1. What is the train's speed in miles per minute?
2. How many miles would it travel in 90 minutes at that speed?
3. Does your answer match your calculation?

Check your work carefully."""
```
The agent might respond:
```
Let me recalculate:

1. Speed: 60 miles / 60 minutes = 1 mile per minute
2. Distance in 90 minutes: 1 mile/minute × 90 minutes = 90 miles

Wait, that gives me 90 miles, which matches my answer. Let me verify once more:
- 60 miles in 60 minutes = 1 mile per minute ✓
- 90 minutes × 1 mile per minute = 90 miles ✓

Actually, my answer was correct. The train travels 90 miles in 90 minutes.
```
In this case, the agent's initial answer held up: at 60 mph the train really does cover 1 mile per minute, so the 1:1 ratio works out. But the verification process forced it to show its work clearly, making the reasoning transparent.
Let's try a problem where the initial answer is genuinely wrong:
Problem: "If 5 machines can produce 5 widgets in 5 minutes, how many machines are needed to produce 100 widgets in 100 minutes?"
Initial answer: "20 machines (scaling up proportionally)."
This is a classic trick question. Let's verify:
```python
verification_prompt = """You said 20 machines are needed.

Before confirming, let's think about this carefully:
1. How many widgets does ONE machine produce in 5 minutes?
2. How many widgets does ONE machine produce in 100 minutes?
3. How many machines do you need to produce 100 widgets in 100 minutes?

Work through this step by step."""
```
The verification reveals:
```
Let me reconsider:

1. If 5 machines produce 5 widgets in 5 minutes, then each machine produces 1 widget in 5 minutes.

2. In 100 minutes, one machine produces: 100 ÷ 5 = 20 widgets

3. To produce 100 widgets in 100 minutes: 100 widgets ÷ 20 widgets per machine = 5 machines

Wait, I made an error. The answer is 5 machines, not 20.

Each machine produces at a rate of 1 widget per 5 minutes. In 100 minutes, each machine makes 20 widgets. So 5 machines make 100 widgets in 100 minutes.

Corrected answer: 5 machines.
```
The verification caught the mistake. By breaking down the problem into smaller questions, the agent reconsidered its initial (incorrect) assumption.
Iterative Refinement
Sometimes one verification pass isn't enough. You can create a refinement loop where the agent repeatedly improves its answer.
Here's a pattern for iterative refinement:
```python
def refine_answer(client, problem, max_iterations=3):
    """Iteratively refine an answer through multiple verification passes."""

    # Initial solution
    messages = [
        {"role": "user", "content": f"{problem}\n\nSolve this step by step."}
    ]

    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )

    current_answer = response.choices[0].message.content
    print(f"Initial answer:\n{current_answer}\n")

    # Refinement loop
    for i in range(max_iterations):
        messages.append({"role": "assistant", "content": current_answer})
        messages.append({
            "role": "user",
            "content": """Review your answer. Are there any errors or improvements you can make?
If your answer is correct and complete, say 'VERIFIED'.
Otherwise, provide an improved version."""
        })

        response = client.chat.completions.create(
            model="gpt-4",
            messages=messages
        )

        refinement = response.choices[0].message.content
        print(f"Refinement {i+1}:\n{refinement}\n")

        if "VERIFIED" in refinement.upper():
            print("Answer verified!")
            break

        current_answer = refinement

    return current_answer
```
This pattern lets the agent improve its answer over multiple passes, catching progressively subtler issues.
For intermediate readers: This iterative refinement pattern is related to several advanced techniques in AI research. Self-consistency checking (running the same problem multiple times and comparing results) and self-critique (having the model evaluate its own outputs) are active research areas. The key insight is that language models can often recognize errors in reasoning when prompted appropriately, even if they made those errors initially. This works because the verification task is different from the generation task. During generation, the model is sampling from a probability distribution. During verification, it's evaluating a concrete proposal, which can activate different reasoning patterns. However, this isn't foolproof. Models can still miss errors or even introduce new ones during refinement. In production systems, you might combine self-verification with external checks (like running code, querying databases, or using specialized verification models).
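One of those ideas, self-consistency, is easy to sketch: sample several answers to the same problem and take a majority vote. In the hypothetical helper below, each string stands in for the final answer from a separate model call (in practice you would sample with temperature above zero and normalize the answers before voting):

```python
from collections import Counter

def majority_answer(answers: list[str]) -> tuple[str, float]:
    """Pick the most common answer and report its agreement rate."""
    counts = Counter(answers)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(answers)

# Hypothetical samples from five runs of the widget problem
samples = ["5 machines", "5 machines", "20 machines", "5 machines", "5 machines"]
print(majority_answer(samples))  # ('5 machines', 0.8)
```

A low agreement rate is itself a useful signal: it tells you the problem deserves a more careful verification pass or human review.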
Practical Applications
Let's apply these verification techniques to our personal assistant.
Scenario: Planning a Budget
Your assistant helps you plan monthly expenses. You want it to check its own calculations.
Example (OpenAI)
```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")

budget_problem = """I earn $4,000 per month after taxes. I want to:
- Save 20% for retirement
- Spend no more than 30% on rent
- Allocate $400 for groceries
- Set aside $200 for entertainment
- Keep $150 for utilities

How much will I have left for other expenses? Create a budget breakdown."""

# Initial budget calculation
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": budget_problem}
    ]
)

initial_budget = response.choices[0].message.content
print("Initial budget:\n", initial_budget)

# Verification with specific checks
verification = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": budget_problem},
        {"role": "assistant", "content": initial_budget},
        {"role": "user", "content": """Please verify your budget by:
1. Adding up all expenses to confirm they don't exceed $4,000
2. Checking that percentages are calculated correctly
3. Confirming the remaining amount is accurate
4. Noting if any category seems unrealistic

Show your verification calculations."""}
    ]
)

print("\nVerification:\n", verification.choices[0].message.content)
```
The verification might reveal:
```
Let me verify the budget:

Income: $4,000

Expenses:
- Retirement (20%): $4,000 × 0.20 = $800 ✓
- Rent (30%): $4,000 × 0.30 = $1,200 ✓
- Groceries: $400 ✓
- Entertainment: $200 ✓
- Utilities: $150 ✓

Total expenses: $800 + $1,200 + $400 + $200 + $150 = $2,750

Remaining: $4,000 - $2,750 = $1,250 ✓

Verification complete. The budget is mathematically correct, and you'll have $1,250
remaining for other expenses. All calculations check out.
```
This systematic verification ensures the budget is accurate before you rely on it.
Limitations and Considerations
While verification is powerful, it's not magic. Keep these limitations in mind:
Verification isn't perfect: The agent can still miss errors, especially subtle ones. It's checking its own work with the same reasoning capabilities that produced the initial answer.
It adds cost and latency: Each verification pass means another API call, which takes time and costs money. Use verification judiciously for important decisions, not every trivial query.
Over-verification can confuse: Asking the agent to verify too many times might lead it to second-guess correct answers or introduce new errors.
Some errors are hard to catch: If the agent fundamentally misunderstands the problem, verification might not help. It will just verify the wrong approach more confidently.
Think of verification as a safety net, not a guarantee. It significantly improves reliability, but it doesn't eliminate the need for human oversight on important decisions.
When to Use Verification
Use verification strategically:
High-stakes decisions: When the cost of an error is high (financial calculations, medical information, legal advice), always verify.
Complex reasoning: Multi-step problems with many opportunities for errors benefit from verification.
Unfamiliar domains: When the agent is working in an area where it might lack knowledge, verification helps catch knowledge gaps.
User-facing outputs: Before presenting an answer to a user, especially in professional contexts, verification adds polish.
Skip verification for: Simple queries, creative tasks where there's no "right" answer, or when speed matters more than perfect accuracy.
Combining Verification with Chain-of-Thought
Verification works even better when combined with chain-of-thought reasoning from the previous chapter. Here's the pattern:
1. Think step by step (chain-of-thought): Break down the problem
2. Solve: Work through each step
3. Verify: Check the reasoning and calculations
4. Refine: Correct any errors found

This four-stage process (think, solve, verify, refine) creates a robust reasoning pipeline for your agent.
Example (Claude Agent SDK)
```python
from anthropic import Anthropic

client = Anthropic(api_key="your-api-key-here")

problem = """A rectangular garden is 12 meters long and 8 meters wide.
You want to build a path 1 meter wide around the entire garden.
What is the area of the path?"""

# Stage 1 & 2: Chain-of-thought solving
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"""{problem}

Think through this step by step:
1. What are the dimensions of the garden?
2. What will be the dimensions including the path?
3. How can you calculate the path area?

Solve the problem showing your work."""}
    ]
)

solution = response.content[0].text
print("Solution:", solution)

# Stage 3: Verification
verification = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"""{problem}

Your solution:
{solution}

Verify by:
1. Checking dimensions are correct
2. Confirming area calculations
3. Ensuring you calculated the path area, not the total area

If you find errors, provide a corrected solution."""}
    ]
)

print("\nVerification:", verification.content[0].text)
```
This combined approach gives you both the benefits of structured thinking and the safety of verification.
Building Verification Into Your Agent
As you develop your personal assistant, consider building verification into its core workflow for critical tasks. Here's a simple pattern:
```python
class PersonalAssistant:
    def __init__(self, client):
        self.client = client

    def solve_with_verification(self, problem, verify=True):
        """Solve a problem with optional verification."""

        # Initial solution
        solution = self._solve(problem)

        if not verify:
            return solution

        # Verification step
        verified = self._verify(problem, solution)

        return verified

    def _solve(self, problem):
        """Generate initial solution."""
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "user", "content": f"{problem}\n\nSolve step by step."}
            ]
        )
        return response.choices[0].message.content

    def _verify(self, problem, solution):
        """Verify and potentially refine the solution."""
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "user", "content": f"""Problem: {problem}

Solution: {solution}

Verify this solution. If correct, return it as-is.
If you find errors, return a corrected version."""}
            ]
        )
        return response.choices[0].message.content
```
This pattern makes verification easy to enable or disable based on the task's importance.
Key Takeaways
- Verification improves accuracy: Prompting agents to check their work catches many errors
- Multiple techniques exist: Confirmation, alternative approaches, explanation, and constraint checking all help
- Iterative refinement: Multiple verification passes can progressively improve answers
- Combine with chain-of-thought: Verification works best alongside structured reasoning
- Use strategically: Apply verification to high-stakes or complex problems, not every query
- Not foolproof: Verification helps but doesn't guarantee correctness
With verification techniques in your toolkit, your agent becomes more reliable and trustworthy. It doesn't just solve problems; it double-checks its work, catching errors before they reach you.
Quiz
Ready to test your understanding? Take this quick quiz to reinforce what you've learned about checking and refining agent reasoning.
About the author: Michael Brenndoerfer
All opinions expressed here are my own and do not reflect the views of my employer.
Michael currently works as an Associate Director of Data Science at EQT Partners in Singapore, where he drives AI and data initiatives across private capital investments.
With over a decade of experience spanning private equity, management consulting, and software engineering, he specializes in building and scaling analytics capabilities from the ground up. He has published research in leading AI conferences and holds expertise in machine learning, natural language processing, and value creation through data.