Policy: Say What the Agent Is Not Allowed to Do
Once an agent can call tools, it is no longer just chatting. It may send email, query databases, edit files, or place orders. At that point, the first job is not making it smarter. The first job is drawing boundaries.
Policy answers:
Is this tool call allowed, and are these arguments allowed?
It should not be only a prompt that says “be careful.” It should be a Python check before execution.
What It Fixes
Without policy, a travel assistant can slide from “recommend a route” into “book a ticket,” “cancel an order,” or “send passport data to a third party.” The model may be trying to help and still cross a boundary.
Policy turns boundaries into rules:
- which tools are allowed
- which tools are never allowed
- which arguments are required
- which values are in range
- which tools depend on environment or user permission
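The rule kinds listed above can be captured in a small data structure. What follows is a minimal sketch, not a definitive implementation: `ToolRule`, `RULES`, `DENIED_TOOLS`, `find_violation`, the tool names, and the 1 to 4 quantity range are all hypothetical and chosen only to mirror the travel-assistant example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Set


@dataclass
class ToolRule:
    """Argument rules for one allowed tool (hypothetical structure)."""
    required: Set[str] = field(default_factory=set)
    # Maps an argument name to a predicate that returns True for valid values.
    validators: Dict[str, Callable[[Any], bool]] = field(default_factory=dict)
    # Permission the calling user must hold, or None if anyone may call it.
    needs_permission: Optional[str] = None


# Tools that are never allowed, whatever else is configured.
DENIED_TOOLS = {"send_passport_data"}

# Allowlist: any tool missing from this mapping is rejected.
RULES = {
    "recommend_route": ToolRule(required={"city"}),
    "book_ticket": ToolRule(
        required={"city", "quantity"},
        validators={"quantity": lambda q: isinstance(q, int) and 1 <= q <= 4},
        needs_permission="booking",
    ),
}


def find_violation(name: str, args: dict, user_permissions: Set[str]) -> Optional[str]:
    """Return a violation reason, or None if the call is allowed."""
    if name in DENIED_TOOLS:
        return f"tool is never allowed: {name}"
    rule = RULES.get(name)
    if rule is None:
        return f"tool not on allowlist: {name}"
    missing = rule.required - set(args)
    if missing:
        return f"missing required arguments: {sorted(missing)}"
    for arg, is_valid in rule.validators.items():
        if arg in args and not is_valid(args[arg]):
            return f"value out of range for {arg}: {args[arg]!r}"
    if rule.needs_permission and rule.needs_permission not in user_permissions:
        return f"user lacks permission: {rule.needs_permission}"
    return None
```

Keeping the rules as data rather than scattered `if` statements makes it easier to review what is allowed without reading the checking logic.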
Flow
```mermaid
flowchart TD
    A["Agent proposes tool call"] --> P["Policy checks tool name and arguments"]
    P -->|allowed| T["Python executes tool"]
    T --> O["Tool result"]
    P -->|rejected| B["Return violation reason"]
    B --> R["Agent changes plan or asks user"]
```
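The same flow can be sketched in Python. This is an illustration under assumptions, not a fixed design: `run_tool_call`, the result dictionaries, the one-tool allowlist, and the stand-in tool functions are hypothetical, and the `check_tool_call` here is a deliberately tiny version that checks only the tool name.

```python
ALLOWED_TOOLS = {"recommend_route"}  # hypothetical allowlist for this sketch


def check_tool_call(name: str, args: dict) -> None:
    """Raise PermissionError if the proposed call violates policy."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {name}")


def run_tool_call(name: str, args: dict, tools: dict) -> dict:
    """Check policy first; execute the tool only if the check passes."""
    try:
        check_tool_call(name, args)
    except PermissionError as exc:
        # Rejected: return the reason so the agent can change plan or ask the user.
        return {"status": "rejected", "reason": str(exc)}
    # Allowed: Python, not the model, performs the execution.
    return {"status": "ok", "result": tools[name](**args)}


# Stand-in tool implementations (hypothetical).
tools = {
    "recommend_route": lambda city: f"route for {city}",
    "cancel_order": lambda order_id: f"cancelled {order_id}",
}

allowed = run_tool_call("recommend_route", {"city": "Hangzhou"}, tools)
rejected = run_tool_call("cancel_order", {"order_id": "A1"}, tools)
```

Returning the violation reason as data, rather than letting the exception escape, gives the agent something concrete to react to on the rejected branch.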
Minimal Code Shape
Allow only book_ticket for standard Hangzhou tickets:
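```python
# Allowlist: any tool not named here is rejected.
allowed_tools = {"book_ticket"}


def check_tool_call(name: str, args: dict) -> None:
    """Raise PermissionError if the tool or its arguments are not allowed."""
    if name not in allowed_tools:
        raise PermissionError(f"tool not allowed: {name}")
    # Argument rules for book_ticket, the only allowed tool.
    if args.get("city") != "Hangzhou":
        raise PermissionError("only Hangzhou bookings are allowed")
    if args.get("ticket_type") not in {"standard"}:
        raise PermissionError("only standard tickets are allowed")
```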
Check before execution:
```python
# Returns None when the call is allowed; raises PermissionError otherwise.
check_tool_call("book_ticket", {"city": "Hangzhou", "ticket_type": "standard"})
```
The code is plain, but it separates responsibility: the model can propose; Python decides whether execution is allowed.
Use It When
- Tools affect the real world: payments, bookings, email, file deletion.
- Different users have different permissions.
- Tools have cost or rate limits.
- You need an audit trail for why a call was allowed or rejected.
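Two of the cases above, per-user permissions and rate limits, can share one small policy object. The sketch below assumes a per-session object and invented names throughout: `ROLE_TOOLS`, `CALL_LIMITS`, `SessionPolicy`, the role names, and the limit of two bookings are hypothetical.

```python
from collections import Counter

# Hypothetical per-role allowlists.
ROLE_TOOLS = {
    "viewer": {"recommend_route"},
    "traveler": {"recommend_route", "book_ticket"},
}

# Hypothetical per-session call limits for tools that cost money.
CALL_LIMITS = {"book_ticket": 2}


class SessionPolicy:
    """Tracks what one user may call and how often within a session."""

    def __init__(self, role: str) -> None:
        # An unknown role gets no tools at all.
        self.allowed = ROLE_TOOLS.get(role, set())
        self.calls = Counter()

    def check(self, name: str) -> None:
        """Raise PermissionError if the call is not permitted right now."""
        if name not in self.allowed:
            raise PermissionError(f"tool not allowed for this user: {name}")
        limit = CALL_LIMITS.get(name)
        if limit is not None and self.calls[name] >= limit:
            raise PermissionError(f"call limit reached for {name}")
        # Count only calls that passed every check.
        self.calls[name] += 1
```

A production system would likely persist the counters and reset them on a time window; this version only shows where the check belongs.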
Avoid It When
If the system only drafts text and has no tools, policy can wait.
But once real tools exist, start with at least an allowlist. Do not let the model define its own boundary.
Common Failure Modes
| Mistake | Result | Fix |
|---|---|---|
| Only writing “do not overreach” in the prompt | The model can forget or be manipulated | Check before execution in Python |
| Granting `*` permissions | Fast debugging, dangerous production | Start from an allowlist |
| Checking tool name but not arguments | Right tool, unsafe values | Add per-tool argument rules |
| Not logging rejections | Hard to debug later | Log tool, args, and reason |
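The last fix in the table, logging the tool, args, and reason, might look like the sketch below. It uses Python's standard `logging` and `json` modules; the logger name `policy`, the `audited_check` function, and the record fields are assumptions made for illustration.

```python
import json
import logging

logger = logging.getLogger("policy")  # logger name is an assumption

ALLOWED_TOOLS = {"book_ticket"}  # hypothetical allowlist for this sketch


def audited_check(name: str, args: dict) -> bool:
    """Return True if the call is allowed; log the decision either way."""
    reason = None if name in ALLOWED_TOOLS else f"tool not allowed: {name}"
    record = {"tool": name, "args": args, "allowed": reason is None, "reason": reason}
    # One JSON object per line keeps the audit trail easy to search.
    # default=str keeps non-serializable argument values from raising.
    line = json.dumps(record, sort_keys=True, default=str)
    if reason is None:
        logger.info(line)
    else:
        logger.warning(line)
    return reason is None
```

If arguments can carry sensitive values such as passport data, redact them before they reach the log.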
Next
Policy controls what is allowed. Some failures are not permission failures: a tool result may contain prompt injection, or the final answer may leak sensitive data.
For checks on tool results and final answers, read Guardrails.