Learn how to implement action restrictions and permissions for AI agents using the principle of least privilege, confirmation steps, and sandboxing to keep your agent powerful but safe.

This article is part of the free-to-read AI Agent Handbook
Action Restrictions and Permissions
In the previous chapter, you learned how to keep your agent's outputs safe through content moderation. But what about its actions? When your agent can send emails, modify files, or make API calls, filtering text isn't enough. You need to control what it's allowed to do in the first place.
Think about how permissions work on your computer. When you install an app, it asks for specific permissions: access to your camera, your files, your location. The app doesn't get unlimited power. It gets exactly what it needs to do its job, and no more. Your AI agent should work the same way.
This chapter explores how to implement action restrictions and permissions for our personal assistant. You'll learn the principle of least privilege (giving the agent only the access it needs), how to add confirmation steps for risky actions, and how to sandbox the agent's environment. By the end, you'll have an agent that's powerful but constrained, capable but safe.
The Problem with Unrestricted Actions
Let's start by understanding what can go wrong. Imagine you've given your assistant the ability to send emails on your behalf. Here's a simple tool implementation:
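The sketch below uses Python's standard smtplib; the server address and sender are placeholders, and in a real deployment you'd use your email provider's client instead.

```python
import smtplib
from email.message import EmailMessage

def send_email(to: str, subject: str, body: str) -> str:
    """Send an email on the user's behalf -- with no restrictions at all."""
    msg = EmailMessage()
    msg["From"] = "assistant@example.com"  # placeholder sender address
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    # "smtp.example.com" is a placeholder for your real mail server.
    with smtplib.SMTP("smtp.example.com") as server:
        server.send_message(msg)
    return f"Email sent to {to}"
```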
This works, but it's dangerous. The agent can now send any email to anyone, at any time, without asking. What if it misunderstands a request? What if a user tries to trick it into spamming someone? What if there's a bug in your code that causes it to send the same email repeatedly?
These aren't hypothetical concerns. When you give an agent the power to take actions in the real world, you need safeguards. Let's explore how to add them.
Principle of Least Privilege
The first rule of action safety is simple: give your agent the minimum permissions it needs to do its job. This is called the principle of least privilege, and it's a fundamental concept in security.
Let's apply this to our email example. Instead of letting the agent email anyone, we might restrict it to only emailing people in your contacts list, or only emailing specific domains. Here's how:
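This sketch layers an allowlist check on top of the unrestricted tool; the contacts and domains are example values you'd replace with your own.

```python
ALLOWED_CONTACTS = {"alice@example.com", "bob@example.com"}  # example values
ALLOWED_DOMAINS = {"mycompany.com"}                          # example values

def send_email_restricted(to: str, subject: str, body: str) -> str:
    """Refuse any recipient outside the approved contacts or domains."""
    domain = to.rsplit("@", 1)[-1].lower()
    if to.lower() not in ALLOWED_CONTACTS and domain not in ALLOWED_DOMAINS:
        return f"Error: {to} is not an approved recipient."
    return send_email(to, subject, body)  # the unrestricted tool from above
```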
Now the agent can only email approved contacts or addresses at approved domains. If it tries to email anyone else, the function returns an error instead of sending. This is a hard constraint that the agent can't bypass, no matter what the user asks for.
You can apply the same principle to other tools:
File access: Instead of giving the agent access to your entire filesystem, restrict it to a specific directory:
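The sketch below assumes a designated SAFE_DIR; the path is an example, and the key trick is resolving the path before checking it.

```python
from pathlib import Path

SAFE_DIR = Path("/home/user/assistant_files").resolve()  # example directory

def read_file(relative_path: str) -> str:
    """Read a file, but only from inside SAFE_DIR."""
    target = (SAFE_DIR / relative_path).resolve()
    # resolve() collapses "../" tricks; then verify we're still inside SAFE_DIR.
    if not target.is_relative_to(SAFE_DIR):
        return "Error: access outside the allowed directory is not permitted."
    return target.read_text()
```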
API calls: Limit which APIs the agent can call and what operations it can perform:
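Here the allowlist maps hosts to permitted HTTP methods; the hosts are placeholders, and the sketch assumes the third-party requests library.

```python
from urllib.parse import urlparse
import requests  # third-party: pip install requests

# Which hosts the agent may call, and with which methods (example values).
ALLOWED_APIS = {
    "api.weather.example.com": {"GET"},
    "api.calendar.example.com": {"GET", "POST"},
}

def call_api(method: str, url: str, **kwargs):
    """Perform an HTTP request only if the host and method are allowlisted."""
    host = urlparse(url).hostname
    if host not in ALLOWED_APIS or method.upper() not in ALLOWED_APIS[host]:
        return {"error": f"{method} requests to {host} are not permitted."}
    response = requests.request(method, url, **kwargs)
    return response.json()
```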
The pattern is consistent: before the tool does anything, it checks whether the action is allowed. If not, it returns an error. The agent sees the error and can explain to the user why the action wasn't possible.
Confirmation Steps for Risky Actions
Some actions are too risky to perform automatically, even if they're technically allowed. For these, you want the agent to ask for confirmation first.
Think about how your smartphone handles this. When an app wants to access your camera, it doesn't just do it. It asks: "Allow this app to access your camera?" You have to explicitly approve.
Your agent should do the same for high-stakes actions. Here's how to implement confirmation for email sending:
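One way to do it is a two-phase tool: the agent can stage an email, but only a separate confirmation call actually sends it. The function names here (prepare_email, confirm_email) are illustrative.

```python
# Pending actions keyed by a simple incrementing id.
pending_actions: dict[str, dict] = {}

def prepare_email(to: str, subject: str, body: str) -> str:
    """Stage an email and describe it to the user instead of sending it."""
    action_id = str(len(pending_actions) + 1)
    pending_actions[action_id] = {"to": to, "subject": subject, "body": body}
    return (f"Ready to send (action {action_id}):\n"
            f"To: {to}\nSubject: {subject}\n\n{body}\n\nSend it? (yes/no)")

def confirm_email(action_id: str, approved: bool) -> str:
    """Execute a staged email only after explicit user approval."""
    action = pending_actions.pop(action_id, None)
    if action is None:
        return "Error: no such pending action."
    if not approved:
        return "Okay, I won't send it."
    return send_email_restricted(action["to"], action["subject"], action["body"])
```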
Let's see this in action:
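A hypothetical exchange, using the sketch above:

```python
print(prepare_email("alice@example.com", "Lunch Friday?",
                    "Are you free for lunch at noon on Friday?"))
# Ready to send (action 1):
# To: alice@example.com
# Subject: Lunch Friday?
#
# Are you free for lunch at noon on Friday?
#
# Send it? (yes/no)

print(confirm_email("1", approved=True))
# Email sent to alice@example.com
```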
The agent prepares the action but doesn't execute it. It shows you exactly what it's about to do and waits for your approval. Only when you say "yes" does the email actually get sent.
This pattern works for any risky action:
- Deleting files: Show which files will be deleted
- Making purchases: Show the item and price
- Posting to social media: Show the exact content to be posted
- Modifying data: Show what will change
The key is to make the confirmation specific. Don't just ask "Is this okay?" Show exactly what will happen so the user can make an informed decision.
Sandboxing the Agent's Environment
Even with restrictions and confirmations, you might want an extra layer of protection: running the agent in a sandbox. A sandbox is a restricted environment where the agent can operate without affecting the rest of your system.
Think of it like a playground with a fence. The agent can do whatever it wants inside the sandbox, but it can't get out and affect anything beyond the fence.
Here's a simple example using Docker to sandbox file operations:
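This sketch shells out to the Docker CLI; it assumes Docker is installed, and the python:3.12-slim base image and resource limits are example choices.

```python
import subprocess

def run_in_sandbox(command: list[str], sandbox_dir: str) -> str:
    """Run a command inside a locked-down Docker container."""
    result = subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",              # no network access
            "--memory", "256m",               # cap memory usage
            "--cpus", "0.5",                  # cap CPU usage
            "-v", f"{sandbox_dir}:/sandbox",  # only this directory is visible
            "--workdir", "/sandbox",
            "python:3.12-slim",               # example base image
            *command,
        ],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout or result.stderr

# e.g. run_in_sandbox(["python", "script.py"], "/home/user/agent_sandbox")
```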
This sandbox provides several protections:
Isolated filesystem: The agent can only access files in the sandbox directory. It can't read or modify anything else on your system.
No network access: The agent can't make network requests, preventing it from sending data to external servers.
Resource limits: The agent gets limited CPU and memory, preventing it from consuming all your system resources.
Automatic cleanup: When you're done, you can delete the entire sandbox, removing any files the agent created.
For most personal assistants, full Docker sandboxing might be overkill. But the principle is valuable: isolate risky operations so they can't affect the rest of your system.
A lighter-weight approach is to use Python's built-in restrictions:
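Here's a minimal sketch using exec() with a stripped-down builtins namespace; the allowlist of functions is an example, and this is a convenience guard rather than a hardened security boundary.

```python
# Only these builtins are visible to the executed code (example allowlist).
SAFE_BUILTINS = {
    "len": len, "range": range, "print": print,
    "min": min, "max": max, "sum": sum, "abs": abs,
}

def run_restricted(code: str) -> None:
    # Overriding "__builtins__" means open, eval, and __import__ simply
    # don't exist in the executed code's namespace.
    exec(code, {"__builtins__": SAFE_BUILTINS})

run_restricted("print(sum(range(10)))")  # works: prints 45
run_restricted("open('/etc/passwd')")    # fails: NameError: name 'open' is not defined
```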
This approach removes dangerous built-ins like open, eval, and __import__ from the executed code's namespace, so it can't casually read files or import modules. It's a guardrail rather than a hardened boundary (determined code can still escape a restricted namespace), and it's nowhere near as secure as Docker, but it's much simpler and works for many low-stakes use cases.
Designing a Permission System
As your agent grows more capable, you'll want a more structured approach to permissions. Instead of hardcoding restrictions in each tool, you can create a permission system that manages what the agent can do.
Here's a simple permission system:
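The sketch below is one minimal design: an enum of permissions plus a central checker that logs every check. The permission names are examples.

```python
from datetime import datetime, timezone
from enum import Enum, auto

class Permission(Enum):
    READ_FILES = auto()
    SEND_EMAIL = auto()
    CALL_APIS = auto()
    DELETE_FILES = auto()

class PermissionSystem:
    """Central place to grant permissions, check them, and log every check."""

    def __init__(self, granted: set[Permission]):
        self.granted = granted
        self.log: list[dict] = []

    def check(self, permission: Permission) -> bool:
        allowed = permission in self.granted
        self.log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "permission": permission.name,
            "allowed": allowed,
        })
        return allowed
```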
Now you can create agents with different permission levels:
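Continuing the sketch, a read-only agent and a more capable assistant can share the same tool code:

```python
read_only_agent = PermissionSystem({Permission.READ_FILES})
assistant = PermissionSystem({
    Permission.READ_FILES, Permission.SEND_EMAIL, Permission.CALL_APIS,
})

assistant.check(Permission.SEND_EMAIL)        # True, and recorded in the log
read_only_agent.check(Permission.SEND_EMAIL)  # False, and recorded in the log
print(read_only_agent.log[-1])
# {'time': '...', 'permission': 'SEND_EMAIL', 'allowed': False}
```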
This system gives you several benefits:
Centralized control: All permission logic is in one place, making it easy to audit and modify.
Clear permissions: You can see at a glance what each agent is allowed to do.
Audit trail: The permission log shows every action the agent attempted and whether it was allowed.
Flexible configuration: You can easily create agents with different permission levels for different use cases.
Combining Restrictions, Confirmations, and Permissions
The most robust approach combines all three techniques:
- Permissions define what the agent is allowed to do in principle
- Restrictions limit the scope of allowed actions (which files, which recipients, etc.)
- Confirmations require user approval for high-stakes actions
Here's how they work together:
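This sketch chains the three layers, reusing the helpers defined earlier in the chapter:

```python
def agent_send_email(perms: PermissionSystem, to: str,
                     subject: str, body: str) -> str:
    # Layer 1 -- permission: is this capability enabled for this agent at all?
    if not perms.check(Permission.SEND_EMAIL):
        return "Error: this agent does not have email permission."
    # Layer 2 -- restriction: is this recipient within the allowed scope?
    domain = to.rsplit("@", 1)[-1].lower()
    if to.lower() not in ALLOWED_CONTACTS and domain not in ALLOWED_DOMAINS:
        return f"Error: {to} is not an approved recipient."
    # Layer 3 -- confirmation: stage the email for explicit user approval.
    return prepare_email(to, subject, body)
```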
This gives you defense in depth. If one layer fails, the others provide backup protection:
- If the agent somehow bypasses the permission check, the restriction will still block unauthorized recipients
- If the restriction is misconfigured, the confirmation gives the user a chance to catch the error
- If the user accidentally confirms, the permission log provides an audit trail
Practical Guidelines for Action Safety
As you implement action restrictions for your own agent, keep these guidelines in mind:
Start with the minimum: When adding a new tool, give it the most restrictive permissions possible. You can always loosen restrictions later, but it's harder to tighten them once users expect certain capabilities.
Make restrictions visible: When the agent can't do something, make sure it explains why. A message like "I don't have permission to delete files" is much better than a generic error.
Log everything: Keep a record of what actions the agent attempted, which were allowed, and which were blocked. This helps you understand how the agent is being used and whether your restrictions are too tight or too loose.
Test adversarially: Try to trick your agent into doing things it shouldn't. Ask it to email someone outside the allowed list. Try to make it access files outside its sandbox. See where the weaknesses are.
Layer your defenses: Don't rely on a single protection mechanism. Use permissions, restrictions, confirmations, and sandboxing together.
Consider the context: A personal assistant running on your laptop might need different restrictions than one deployed as a service for multiple users. Adjust your safety measures to match the risk level.
When to Use Each Technique
Different situations call for different approaches:
Use permissions when you want to completely disable certain capabilities. If your agent should never send emails, don't give it the permission.
Use restrictions when you want to limit the scope of allowed actions. The agent can send emails, but only to certain people.
Use confirmations when actions are risky but sometimes necessary. The agent can delete files, but only after you approve each deletion.
Use sandboxing when you need strong isolation. If your agent runs untrusted code or processes user-uploaded files, put it in a sandbox.
For our personal assistant, here's a reasonable configuration (sketched in code after this list):
- Permissions: Read files, send emails, make API calls (no delete, no system commands)
- Restrictions: Only read from designated folders, only email known contacts
- Confirmations: Required for sending emails, making purchases, posting publicly
- Sandboxing: Not needed for basic assistant tasks, but useful if adding code execution
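Expressed with the PermissionSystem sketch from earlier (the names are illustrative):

```python
assistant_perms = PermissionSystem({
    Permission.READ_FILES,
    Permission.SEND_EMAIL,
    Permission.CALL_APIS,
    # Deliberately absent: DELETE_FILES and any system-command permission.
})
# Scope restrictions (designated folders, known contacts) and confirmations
# live inside the tools themselves, as in read_file and prepare_email above.
```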
This gives the agent enough power to be useful while keeping risks manageable.
Glossary
Action Restriction: A limit on what an agent can do with a given capability, such as restricting file access to a specific directory or email sending to approved recipients.
Confirmation Step: A safety mechanism where the agent requests explicit user approval before executing a potentially risky action, showing exactly what will happen.
Least Privilege: The security principle of granting an agent only the minimum permissions necessary to perform its intended function, reducing potential harm from errors or misuse.
Permission: An authorization that determines whether an agent is allowed to perform a specific type of action, such as reading files or sending emails.
Permission System: A structured framework for managing and checking what actions an agent is allowed to perform, typically including permission definitions, checks, and audit logging.
Sandbox: An isolated environment where an agent can operate without affecting the broader system, typically with restricted access to files, network, and system resources.
Quiz
Ready to test your understanding? Take this quick quiz to reinforce what you've learned about action restrictions and permissions for AI agents.




