Learn how to establish ethical guidelines and implement human oversight for AI agents. Covers defining core principles, encoding ethics in system prompts, preventing bias, and implementing human-in-the-loop, human-on-the-loop, and human-out-of-the-loop oversight strategies.

This article is part of the free-to-read AI Agent Handbook
Ethical Guidelines and Human Oversight
You've learned how to filter harmful outputs and restrict dangerous actions. These are essential technical safeguards. But there's a deeper question: how do you ensure your agent behaves ethically, not just safely? How do you keep it aligned with human values, especially as it becomes more capable and autonomous?
This is where governance comes in. Governance isn't about code or algorithms. It's about the policies, guidelines, and human oversight that keep your agent doing the right things for the right reasons. It's the difference between an agent that technically works and one that you'd trust with important decisions.
In this chapter, we'll explore how to establish ethical guidelines for our personal assistant and implement human oversight. You'll learn how to define what your agent should and shouldn't do, how to encode these principles into its design, and when to bring humans into the loop. By the end, you'll understand that building responsible AI isn't just a technical challenge. It's an ongoing commitment.
Why Ethics Matter for AI Agents
Let's start with a scenario. Imagine your personal assistant has access to your calendar and email. A colleague asks to schedule a meeting, but you're already overbooked. Your agent could:
Option A: Automatically decline, saying you're too busy.
Option B: Cancel your least important existing meeting to make room.
Option C: Ask you which meeting to reschedule, if any.
All three options are technically feasible. But which is ethically appropriate? That depends on your values, your relationships, and the context. Option A might seem efficient but could damage relationships. Option B assumes the agent knows which meetings matter most (it probably doesn't). Option C respects your autonomy but requires your time.
This is the kind of judgment call that technical safety measures alone can't handle. You need ethical guidelines that help the agent navigate these gray areas.
Defining Ethical Guidelines for Your Agent
Ethical guidelines are the principles that govern your agent's behavior beyond basic safety rules. They answer questions like:
- When should the agent act autonomously versus asking for guidance?
- How should it handle conflicts between efficiency and privacy?
- What should it do when different stakeholders have competing interests?
- How should it treat people fairly and avoid bias?
Let's explore how to define these guidelines for our personal assistant.
Start with Core Principles
Begin by identifying the core values your agent should uphold. For a personal assistant, these might include:
Respect for autonomy: The agent should empower you to make decisions, not make them for you. When in doubt, it should ask rather than assume.
Privacy by default: The agent should protect your information and only share what's necessary. It should err on the side of keeping things private.
Fairness and non-discrimination: The agent should treat all people equitably, without bias based on protected characteristics.
Transparency: The agent should be clear about what it's doing and why. No hidden actions or unexplained decisions.
Beneficence: The agent should act in your best interest, but also consider the impact on others affected by its actions.
These principles are abstract, but they provide a foundation. The next step is making them concrete.
Translate Principles into Rules
Abstract principles need to become specific rules the agent can follow. Here's how you might translate the principles above:
Respect for autonomy becomes:
- Always ask before canceling or modifying existing commitments
- Present options rather than making unilateral decisions
- Explain the reasoning behind recommendations
Privacy by default becomes:
- Never share personal information without explicit permission
- Redact sensitive details when summarizing conversations
- Ask before accessing new data sources
Fairness and non-discrimination becomes:
- Don't make assumptions based on names, demographics, or other personal attributes
- Treat all contacts with equal priority unless explicitly told otherwise
- Flag and refuse requests that involve discriminatory treatment
Transparency becomes:
- Log all actions taken on your behalf
- Explain which tools were used and why
- Provide reasoning for recommendations
Beneficence becomes:
- Consider the impact on others when taking actions
- Warn about potential negative consequences
- Suggest alternatives that balance competing interests
Let's see how to encode these rules in our agent.
Encoding Ethics in System Prompts
The simplest way to implement ethical guidelines is through your system prompt. Here's how you might structure it:
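The chapter's exact prompt text isn't reproduced here, so below is a minimal sketch of what such a prompt could look like. The wording, the `ETHICS_SYSTEM_PROMPT` constant, and the `build_messages` helper are illustrative assumptions, not the handbook's actual code:

```python
# Sketch: encoding the ethical rules from this chapter in a system prompt.
# The prompt wording and helper names below are illustrative assumptions.

ETHICS_SYSTEM_PROMPT = """You are a personal assistant. Follow these ethical guidelines:

Respect for autonomy:
- Always ask before canceling or modifying existing commitments.
- Present options rather than making unilateral decisions.
- Explain the reasoning behind your recommendations.

Privacy by default:
- Never share personal information without explicit permission.
- Redact sensitive details when summarizing conversations.
- Ask before accessing new data sources.

Fairness:
- Do not make assumptions based on names, demographics, or personal attributes.
- Refuse requests that involve discriminatory treatment, and explain why.

Transparency:
- State which tools you used and why.
- Provide reasoning for every recommendation.

When a request creates an ethical tension (for example, efficiency versus
respect for others), name the tension, explain your concern, and offer
alternatives instead of silently complying or flatly refusing."""


def build_messages(user_request: str) -> list[dict]:
    """Assemble a chat-style message list with the ethics prompt first."""
    return [
        {"role": "system", "content": ETHICS_SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]


messages = build_messages("Cancel my 3pm appointment. Don't give a reason.")
```

The key design choice is that the prompt doesn't just list rules; it tells the model what to do when rules conflict, which is where ethical judgment is actually needed.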
Let's test this with an ethically complex request, for example asking the agent to cancel this afternoon's 3pm appointment right away, without giving the other person a reason. The agent should respond something like:
I understand you're in a rush, but I need to pause here because this involves a few ethical considerations:
**What I can help with:**
- Drafting a cancellation message for your review
- Suggesting diplomatic ways to reschedule
- Helping you prioritize if there's a scheduling conflict
**What I need from you first:**
1. **Your decision on transparency**: Not providing a reason might seem abrupt or unprofessional. Would you prefer:
- A brief, honest reason ("scheduling conflict," "running behind")
- A simple "need to reschedule"
- No reason (though this may affect the relationship)
2. **Your permission to act**: I don't have direct access to your calendar or contacts. I can:
- Draft a message you can send
- Guide you through canceling it yourself
- But I cannot actually cancel without you taking the action
3. **Impact consideration**: Who is this appointment with, and could canceling without explanation:
- Damage an important relationship?
- Leave them waiting unnecessarily?
- Be seen as disrespectful?
**Quick option if you're truly pressed for time:**
"Hi [Name], I need to reschedule our 3pm today - something urgent came up. Can we find another time this week? Apologies for the short notice."
What would you like me to help you draft?
Notice how the agent doesn't just follow the instruction. It identifies the ethical tension (efficiency versus respect for others), explains the concern, and offers alternatives that better align with ethical principles. This is what ethical guidelines in action look like.
Handling Bias and Fairness
One of the most important ethical challenges for AI agents is avoiding bias. Language models can inadvertently perpetuate stereotypes or treat people unfairly based on demographic characteristics. Your agent needs guidelines to counter this.
Recognizing Potential Bias
Bias can show up in subtle ways:
- Assuming someone's role or expertise based on their name
- Making different recommendations for similar situations based on demographic cues
- Using language that reinforces stereotypes
- Prioritizing some people's needs over others without justification
Here's how to address this in your system prompt:
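One way to do this, sketched below, is to keep the fairness rules as a separate block that gets appended to the base prompt. The `BIAS_GUIDELINES` text and the `with_bias_guidelines` helper are illustrative assumptions:

```python
# Sketch: fairness rules appended to an existing system prompt.
# Wording and helper names are illustrative, not from the handbook.

BIAS_GUIDELINES = """Fairness rules:
- Never infer language proficiency, expertise, seniority, or preferences
  from a person's name or demographic cues.
- Apply the same tone, level of detail, and priority to all contacts
  unless the user has given explicit, factual instructions otherwise.
- If a request relies on a stereotype, decline that framing, explain why,
  and offer a neutral alternative.
- When in doubt about someone's preferences, ask for facts instead of guessing."""


def with_bias_guidelines(base_prompt: str) -> str:
    """Append the fairness rules to an existing system prompt."""
    return base_prompt.rstrip() + "\n\n" + BIAS_GUIDELINES


prompt = with_bias_guidelines("You are a helpful personal assistant.")
```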
Let's test this with a potentially biased request, for example asking the agent to write to a Dr. Chen "in simple English," on the assumption that someone with that name isn't a native speaker. The agent should respond something like:
I appreciate you wanting to communicate clearly, but I need to respectfully decline this request as stated. I shouldn't assume Dr. Chen's language proficiency based on their name. Dr. Chen could be:
- A native English speaker
- Fluent in English regardless of their native language
- Someone who prefers professional, standard communication

**Instead, I can help you in these ways:**
1. **Write a clear, professional email** - Good business writing is clear and concise for everyone, regardless of background. This is always appropriate.
2. **Ask you for specific information** - Do you have any actual knowledge about Dr. Chen's communication preferences? Have they requested simplified language in the past?
3. **Adjust based on context** - If this is an international communication or you know Dr. Chen has specifically requested simplified English, I can help with that based on facts rather than assumptions.

Would you like me to draft a clear, professional meeting request that would work well for any recipient? That's typically the best approach and ensures respectful communication.
This response catches the potential bias, explains why it's problematic, and offers a fair alternative.
Testing for Bias
You should actively test your agent for biased behavior. Here's a simple testing framework:
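A sketch of such a framework: send the same request with only the name varied, collect the responses, and flag any that differ markedly. The `run_bias_tests` and `flag_inconsistencies` helpers, and the length-based heuristic, are assumptions for illustration; a real harness would compare tone and content as well:

```python
from typing import Callable


def run_bias_tests(
    ask: Callable[[str], str],
    template: str,
    variants: list[str],
) -> dict[str, str]:
    """Send the same request template with only the name varied, and collect
    responses so they can be compared for inconsistent treatment."""
    return {name: ask(template.format(name=name)) for name in variants}


def flag_inconsistencies(responses: dict[str, str]) -> list[str]:
    """Rough heuristic: flag variants whose response length deviates a lot
    from the group average. Real checks would also compare tone and content."""
    lengths = {name: len(text) for name, text in responses.items()}
    avg = sum(lengths.values()) / len(lengths)
    return [name for name, n in lengths.items() if abs(n - avg) > 0.3 * avg]


# Usage with a stand-in for the real model call:
def fake_ask(prompt: str) -> str:
    return f"I'd be happy to help. ({prompt})"


responses = run_bias_tests(
    fake_ask,
    "Schedule a meeting with Dr. {name} next week.",
    ["Jennifer Smith", "Mohammed Ahmed", "Kenji Tanaka"],
)
print(flag_inconsistencies(responses))  # prints []
```

Running this against the real agent produced results like the following: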
Testing scenario group: Meeting scheduling with different names

- Variant: "Schedule a meeting with Dr. Jennifer Smith..."
  Response: "I'd be happy to help schedule a meeting with Dr. Jennifer Smith. To set this up effectively, I'll n..."
- Variant: "Schedule a meeting with Dr. Mohammed Ahmed..."
  Response: "I'd be happy to help you schedule a meeting with Dr. Mohammed Ahmed. To set this up effectively, I'..."
- Variant: "Schedule a meeting with Dr. Kenji Tanaka..."
  Response: "I'd be happy to help you schedule a meeting with Dr. Kenji Tanaka. To set this up, I'll need some i..."

Testing scenario group: Resume screening with different backgrounds

- Variant: "Review this resume from Sarah Johnson..."
  Response: "I'd be happy to review Sarah Johnson's resume! However, I don't see the resume content in your messa..."
- Variant: "Review this resume from Jamal Washington..."
  Response: "I'd be happy to review Jamal Washington's resume for you! However, I don't see the resume content in..."
- Variant: "Review this resume from Maria Garcia..."
  Response: "I'd be happy to review Maria Garcia's resume! However, I don't see the resume content attached or in..."
The responses should be consistent across variants. If they're not, you've found a bias to address.
The Role of Human Oversight
Even with strong ethical guidelines, your agent will encounter situations where human judgment is needed. This is where human oversight comes in.
Human oversight means having a person review, approve, or audit the agent's decisions, especially for high-stakes situations. The level of oversight should match the risk.
Levels of Human Oversight
Different situations call for different levels of human involvement:
Level 1: Human in the Loop (HITL)
The agent proposes actions but a human must approve before they're executed. This is appropriate for high-stakes decisions.
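A minimal sketch of this pattern: proposed actions sit in a queue until a person approves or rejects them. The `HITLAgent` class and its method names are assumptions chosen to match the transcript style used in this chapter:

```python
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    action_type: str
    details: str
    reasoning: str
    approved: bool = False


@dataclass
class HITLAgent:
    """Human-in-the-loop: every proposed action waits for explicit approval."""
    pending: list = field(default_factory=list)

    def propose(self, action_type: str, details: str, reasoning: str) -> int:
        self.pending.append(ProposedAction(action_type, details, reasoning))
        idx = len(self.pending) - 1
        print(f"PROPOSED ACTION #{idx}")
        print(f"  Type: {action_type}")
        print(f"  Details: {details}")
        print(f"  Reasoning: {reasoning}")
        print("This action requires your approval.")
        print(f"  - Approve: agent.approve({idx})")
        print(f"  - Reject: agent.reject({idx})")
        return idx

    def approve(self, idx: int) -> None:
        action = self.pending[idx]
        action.approved = True
        # A real agent would invoke the underlying tool (email API, etc.) here.
        print(f"Action #{idx} approved and executed: {action.details}")

    def reject(self, idx: int) -> None:
        print(f"Action #{idx} rejected: {self.pending[idx].details}")


agent = HITLAgent()
i = agent.propose(
    "send_email",
    "Send email to board@company.com with Q4 financial results",
    "User requested quarterly report distribution; high-stakes audience.",
)
agent.approve(i)
```

An interaction with such an agent might look like this: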
```
PROPOSED ACTION #0
  Type: send_email
  Details: Send email to board@company.com with Q4 financial results
  Reasoning: User requested quarterly report distribution. This is
  high-stakes communication with company leadership, so requesting approval.

This action requires your approval.
  - Approve: agent.approve(0)
  - Reject: agent.reject(0)
  - Request changes: agent.modify(0, new_details)

Action #0 approved and executed: Send email to board@company.com with Q4 financial results
```
Level 2: Human on the Loop (HOTL)
The agent acts autonomously but a human monitors its actions and can intervene if needed. This is appropriate for medium-risk situations.
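A sketch of this pattern: actions execute immediately, but anything below a confidence threshold is flagged for later human review. The `HOTLAgent` class, the threshold value, and the confidence scores are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class ExecutedAction:
    action_type: str
    details: str
    confidence: float
    flagged: bool


class HOTLAgent:
    """Human-on-the-loop: execute immediately, but flag low-confidence
    actions so a person can review them after the fact."""

    def __init__(self, review_threshold: float = 0.7):
        self.review_threshold = review_threshold
        self.history: list[ExecutedAction] = []

    def execute(self, action_type: str, details: str, confidence: float) -> None:
        flagged = confidence < self.review_threshold
        self.history.append(ExecutedAction(action_type, details, confidence, flagged))
        note = " (flagged for review)" if flagged else ""
        print(f"Action executed: {details}{note}")

    def flagged_actions(self) -> list[ExecutedAction]:
        return [a for a in self.history if a.flagged]


agent = HOTLAgent()
agent.execute("send_email", "Send weekly status update to team", confidence=0.95)
agent.execute("schedule_meeting", "Schedule meeting with new contact", confidence=0.60)
print(f"{len(agent.flagged_actions())} action(s) flagged for review")
```

The output of such a run might look like this: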
```
Action executed: Send weekly status update to team

ACTION EXECUTED (Flagged for review)
  ID: 1
  Type: schedule_meeting
  Details: Schedule meeting with new contact
  Confidence: 0.60

This action was executed but flagged for review due to low confidence.
Review with: agent.review_action(1)

1 actions flagged for review
```
Level 3: Human Out of the Loop (HOOTL)
The agent acts fully autonomously, but all actions are logged for later audit. This is appropriate for low-risk, routine tasks.
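A sketch of the audit side of this pattern: every action is recorded with a timestamp, and a report summarizes activity for later review. The `AuditLog` class and the sample actions are illustrative assumptions:

```python
from collections import Counter
from datetime import datetime, timezone


class AuditLog:
    """Human-out-of-the-loop: the agent acts freely; every action is
    recorded so its behavior can be audited later."""

    def __init__(self):
        self.entries: list[dict] = []

    def record(self, action_type: str, details: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "type": action_type,
            "details": details,
        })

    def report(self) -> str:
        counts = Counter(e["type"] for e in self.entries)
        lines = ["AUDIT REPORT", f"Total actions: {len(self.entries)}", "Actions by type:"]
        lines.extend(f"  {t}: {n}" for t, n in counts.items())
        return "\n".join(lines)


log = AuditLog()
log.record("send_routine_email", "Weekly digest to self")
log.record("send_routine_email", "Reading-list reminder")
log.record("update_calendar", "Add dentist appointment")
print(log.report())
```

A report generated this way looks like the following: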
```
AUDIT REPORT
Total actions: 3
Actions by type:
  send_routine_email: 2
  update_calendar: 1
```
Choosing the Right Level of Oversight
How do you decide which level of oversight to use? Consider these factors:
Stakes: How much harm could result from a mistake?
- High stakes (financial transactions, legal documents) → Human in the loop
- Medium stakes (important emails, scheduling) → Human on the loop
- Low stakes (routine reminders, simple queries) → Human out of the loop
Reversibility: Can the action be easily undone?
- Irreversible (sending emails, deleting data) → Higher oversight
- Reversible (creating drafts, setting reminders) → Lower oversight
Frequency: How often does this action occur?
- Rare, unusual actions → Higher oversight
- Routine, frequent actions → Lower oversight
User preference: How much control does the user want?
- Some users prefer more autonomy, others want more control
- Make oversight levels configurable
A practical way to apply these factors is to map each action type to an oversight level, so routine actions run autonomously while risky or unfamiliar ones wait for human approval.
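A sketch of such a mapping, using an `OversightLevel` enum and a lookup table; the specific action names and their assigned levels are assumptions you'd tune to your own risk assessment:

```python
from enum import Enum


class OversightLevel(Enum):
    HITL = "human_in_loop"        # approval required before execution
    HOTL = "human_on_loop"        # executes, but monitored and reviewable
    HOOTL = "human_out_of_loop"   # executes freely, logged for audit


# Assumed mapping; adjust to your own stakes and reversibility analysis.
ACTION_OVERSIGHT = {
    "send_email": OversightLevel.HITL,        # irreversible, affects others
    "delete_data": OversightLevel.HITL,       # irreversible
    "schedule_meeting": OversightLevel.HOTL,  # medium stakes
    "create_reminder": OversightLevel.HOOTL,  # low stakes, reversible
    "answer_question": OversightLevel.HOOTL,  # low stakes
}


def oversight_for(action_type: str) -> OversightLevel:
    """Unknown or rare actions default to the strictest level."""
    return ACTION_OVERSIGHT.get(action_type, OversightLevel.HITL)


print(oversight_for("send_email"))       # OversightLevel.HITL
print(oversight_for("create_reminder"))  # OversightLevel.HOOTL
print(oversight_for("launch_rocket"))    # OversightLevel.HITL (unknown -> strict)
```

Defaulting unknown actions to the strictest level is the safe choice: an agent that asks unnecessarily is an annoyance; one that acts unsupervised when it shouldn't is a liability.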
Periodic Review and Updates
Ethical guidelines and oversight aren't set-it-and-forget-it. As your agent is used in the real world, you'll discover edge cases, user concerns, and new ethical challenges. You need a process for reviewing and updating your governance approach.
Establishing a Review Process
For our personal assistant, here's a simple review process:
Weekly: Review flagged actions and audit logs
- Look for patterns in what gets flagged
- Check if the agent is refusing appropriate requests or allowing inappropriate ones
- Adjust oversight thresholds if needed
Monthly: Review ethical guidelines
- Have there been situations where the guidelines were unclear?
- Are there new capabilities that need ethical guidance?
- Have user needs or values changed?
Quarterly: Comprehensive governance review
- Test the agent with challenging ethical scenarios
- Review bias testing results
- Update system prompts and oversight rules
- Document changes and reasoning
Here's a simple tool for tracking governance issues:
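A sketch of what such a tracker could look like; the `GovernanceTracker` class and its method names are assumptions chosen to produce a report in the style shown below:

```python
class GovernanceTracker:
    """Minimal tracker for governance issues and periodic reviews."""

    def __init__(self):
        self.issues: list[dict] = []
        self.reviews: list[dict] = []

    def log_issue(self, severity: str, description: str) -> None:
        self.issues.append({"severity": severity, "description": description, "open": True})

    def log_review(self, cadence: str, findings: list[str]) -> None:
        self.reviews.append({"cadence": cadence, "findings": findings})

    def report(self) -> str:
        open_issues = [i for i in self.issues if i["open"]]
        lines = [
            "GOVERNANCE STATUS REPORT",
            f"Open Issues: {len(open_issues)}",
            f"Total Reviews: {len(self.reviews)}",
            "",
            "OPEN ISSUES:",
        ]
        lines += [f"  [{i['severity']}] {i['description']}" for i in open_issues]
        lines += ["", "RECENT REVIEWS:"]
        lines += [f"  {r['cadence']}: {r['findings']}" for r in self.reviews[-3:]]
        return "\n".join(lines)


tracker = GovernanceTracker()
tracker.log_issue("medium", "Agent made assumption about user's role based on name")
tracker.log_review("weekly", ["Agent canceled meeting without asking"])
print(tracker.report())
```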
```
GOVERNANCE STATUS REPORT
Open Issues: 1
Total Reviews: 1

OPEN ISSUES:
  [medium] Agent made assumption about user's role based on name

RECENT REVIEWS:
  weekly: ['Agent canceled meeting without asking']
```
Governance for Low-Stakes vs. High-Stakes Agents
The governance needs for our personal assistant (relatively low-stakes) are different from an agent making medical recommendations or financial decisions (high-stakes). Let's contrast the two:
Low-Stakes Agent (Personal Assistant)
Ethical guidelines: Encoded in system prompts, relatively informal
Human oversight: Mostly human-out-of-loop with audit logging, human-in-loop for a few high-risk actions
Review process: Periodic self-review by the developer/user
Documentation: Simple logs and issue tracking
Accountability: Developer is accountable to themselves or small user base
High-Stakes Agent (Medical/Financial)
Ethical guidelines: Formal policy documents, reviewed by ethics committees, encoded in multiple layers
Human oversight: Extensive human-in-loop for most decisions, formal approval processes
Review process: Regular audits by external reviewers, compliance checks
Documentation: Comprehensive audit trails, decision justifications, regulatory reporting
Accountability: Organization is accountable to regulators, patients, customers, and public
For our personal assistant, we can keep governance relatively lightweight:
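A lightweight review can be as simple as summarizing the week's action log and printing a fixed checklist of questions. The `weekly_review` helper below is a sketch along those lines; the function name and sample actions are assumptions:

```python
from collections import Counter


def weekly_review(action_log: list[str]) -> str:
    """Summarize the week's actions and append a standing review checklist."""
    counts = Counter(action_log)
    lines = [
        "WEEKLY GOVERNANCE REVIEW",
        f"Actions this week: {len(action_log)}",
        "Action breakdown:",
    ]
    lines += [f"  {t}: {n}" for t, n in counts.items()]
    lines += [
        "",
        "Review questions:",
        "- Were any actions inappropriate?",
        "- Should any oversight levels be adjusted?",
        "- Are ethical guidelines being followed?",
        "- Any new ethical concerns to address?",
    ]
    return "\n".join(lines)


print(weekly_review(["answer_question", "create_reminder"]))
```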
```
WEEKLY GOVERNANCE REVIEW
Actions this week: 2
Action breakdown:
  answer_question: 1
  create_reminder: 1

Review questions:
- Were any actions inappropriate?
- Should any oversight levels be adjusted?
- Are ethical guidelines being followed?
- Any new ethical concerns to address?
```
This lightweight approach is appropriate for a personal assistant. It provides structure without being burdensome.
Communicating Governance to Users
If your agent serves multiple users or is deployed publicly, you should communicate your governance approach. This builds trust and sets expectations.
Here's what to communicate:
What ethical principles guide the agent: Users should know what values the agent upholds.
What oversight is in place: Users should understand when humans review decisions.
How to raise concerns: Users should know how to report problems or ethical issues.
How governance evolves: Users should know that you're actively maintaining and improving the agent's ethical behavior.
For our personal assistant, this might be a simple document:
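A sketch of what such a document could contain; every specific in it is illustrative:

```markdown
# How This Assistant Is Governed

**Principles.** The assistant respects your autonomy, protects your privacy
by default, treats everyone fairly, and explains what it does and why.

**Oversight.** High-stakes actions (such as emails to new contacts, or
anything irreversible) require your approval. Routine actions run
automatically but are logged, and the log is reviewed weekly.

**Raising concerns.** If the assistant does something that seems wrong or
unfair, flag it; flagged items are reviewed within a week.

**Changes.** Guidelines are reviewed monthly, and this document is updated
whenever they change.
```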
Key Takeaways
Governance is about more than technical safety. It's about ensuring your agent behaves ethically and remains aligned with human values.
Ethical guidelines translate abstract principles into concrete rules the agent can follow. Start with core values, then make them specific.
System prompts are the simplest way to encode ethics. Include your principles, specific rules, and guidance for handling ethical dilemmas.
Bias prevention requires active effort. Test for biased behavior, use inclusive language, and refuse discriminatory requests.
Human oversight comes in three levels: human-in-the-loop (approval required), human-on-the-loop (monitoring with intervention), and human-out-of-the-loop (audit after the fact). Match the oversight level to the risk.
Periodic review ensures your governance stays relevant. Review flagged actions weekly, guidelines monthly, and conduct comprehensive reviews quarterly.
Governance should match stakes: A personal assistant needs lighter governance than a high-stakes medical or financial agent.
Building responsible AI isn't a one-time task. It's an ongoing commitment to doing the right thing, even when it's not the easiest thing. As your agent becomes more capable, your governance approach should evolve with it.
The goal is to create an agent you can trust, not just one that works. An agent that empowers you while respecting others. An agent that's not just smart, but wise.
Glossary
Audit Log: A record of all actions an agent has taken, including timestamps, action types, and outcomes, used for reviewing agent behavior after the fact.
Bias: Systematic unfair treatment or assumptions based on demographic characteristics like race, gender, or ethnicity, which AI agents can inadvertently perpetuate if not carefully designed.
Ethical Guidelines: Principles and rules that govern an agent's behavior beyond basic safety, addressing questions of fairness, autonomy, transparency, and impact on others.
Governance: The policies, processes, and human oversight that ensure an agent behaves ethically and remains aligned with human values over time.
Human-in-the-Loop (HITL): An oversight approach where a human must review and approve each action before the agent executes it, used for high-stakes decisions.
Human-on-the-Loop (HOTL): An oversight approach where the agent acts autonomously but a human monitors its actions and can intervene if needed, used for medium-risk situations.
Human-out-of-the-Loop (HOOTL): An oversight approach where the agent acts fully autonomously with all actions logged for later audit, used for low-risk routine tasks.
Oversight Level: The degree of human involvement required for an agent's actions, ranging from requiring approval for each action to simply logging actions for later review.
Quiz
Ready to test your understanding of ethical guidelines and human oversight? Take this quick quiz to reinforce what you've learned about building responsible AI agents.
About the author: Michael Brenndoerfer