
Policy: Say What the Agent Is Not Allowed to Do

Once an agent can call tools, it is no longer just chatting. It may send email, query databases, edit files, or place orders. At that point, the first job is not making it smarter. The first job is drawing boundaries.

Policy answers:

Is this tool call allowed, and are these arguments allowed?

A prompt that says “be careful” is not enough. Policy should be a Python check that runs before the tool executes.

What It Fixes

Without policy, a travel assistant can slide from “recommend a route” into “book a ticket,” “cancel an order,” or “send passport data to a third party.” The model may be trying to help and still cross a boundary.

Policy turns boundaries into rules:

  • which tools are allowed
  • which tools are never allowed
  • which arguments are required
  • which values are in range
  • which tools depend on environment or user permission
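As a rough sketch, the rule types listed above could live in one declarative table that a single function evaluates. Everything named here (`POLICY`, `NEVER_ALLOWED`, `violations`, the `search_routes` tool, the `quantity` argument, and the `guest` and `member` roles) is a hypothetical illustration, not part of any library.

```python
from typing import Any

# Hypothetical rule table: tool names, argument names, and roles are illustrative.
POLICY: dict[str, dict[str, Any]] = {
    "search_routes": {
        "required": {"city"},
        "checks": {},
        "roles": {"guest", "member"},
    },
    "book_ticket": {
        "required": {"city", "ticket_type", "quantity"},
        "checks": {
            "ticket_type": lambda v: v in {"standard"},
            "quantity": lambda v: isinstance(v, int) and 1 <= v <= 4,
        },
        "roles": {"member"},
    },
}

# Tools that stay blocked even if someone later adds them to POLICY by mistake.
NEVER_ALLOWED = {"cancel_order", "send_passport_data"}


def violations(name: str, args: dict[str, Any], role: str) -> list[str]:
    """Return every policy violation for a proposed call; an empty list means allowed."""
    if name in NEVER_ALLOWED:
        return [f"tool is never allowed: {name}"]
    rule = POLICY.get(name)
    if rule is None:
        return [f"tool not on allowlist: {name}"]

    problems = []
    if role not in rule["roles"]:
        problems.append(f"role {role!r} may not call {name}")
    for key in sorted(rule["required"] - set(args)):
        problems.append(f"missing argument: {key}")
    for key, is_valid in rule["checks"].items():
        if key in args and not is_valid(args[key]):
            problems.append(f"value out of range: {key}={args[key]!r}")
    return problems
```

Returning a list rather than raising on the first problem lets the agent see every reason at once, which may save a round trip when it revises the call.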

Flow

flowchart TD
  A["Agent proposes tool call"] --> P["Policy checks tool name and arguments"]
  P -->|allowed| T["Python executes tool"]
  T --> O["Tool result"]
  P -->|rejected| B["Return violation reason"]
  B --> R["Agent changes plan or asks user"]

Minimal Code Shape

This policy allows only book_ticket, and only for standard tickets in Hangzhou:

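# Allowlist: any tool not named here is rejected.
allowed_tools = {"book_ticket"}

def check_tool_call(name: str, args: dict) -> None:
    """Raise PermissionError if the call breaks policy; return None if it is allowed."""
    if name not in allowed_tools:
        raise PermissionError(f"tool not allowed: {name}")
    # Argument rules: a missing key fails the same way as a wrong value.
    if args.get("city") != "Hangzhou":
        raise PermissionError("only Hangzhou bookings are allowed")
    if args.get("ticket_type") not in {"standard"}:
        raise PermissionError("only standard tickets are allowed")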

Check before execution:

check_tool_call("book_ticket", {"city": "Hangzhou", "ticket_type": "standard"})

The code is plain, but it separates responsibility: the model can propose; Python decides whether execution is allowed.
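To connect the check to the flow, one possible shape is a small dispatcher that runs the policy first and turns a rejection into a structured reason the agent can read. This is a minimal sketch: `handle_proposal`, the `TOOLS` registry, and the stand-in `book_ticket` function are assumed names, and the extra rule that rejects unknown argument names is an addition to the check shown earlier.

```python
from typing import Any, Callable

ALLOWED_TOOLS = {"book_ticket"}
BOOK_TICKET_ARGS = {"city", "ticket_type"}


def book_ticket(city: str, ticket_type: str) -> str:
    # Stand-in for a real booking call (hypothetical).
    return f"booked {ticket_type} ticket for {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"book_ticket": book_ticket}


def check_tool_call(name: str, args: dict) -> None:
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    # Added rule: reject argument names the tool does not define.
    unexpected = set(args) - BOOK_TICKET_ARGS
    if unexpected:
        raise PermissionError(f"unexpected arguments: {sorted(unexpected)}")
    if args.get("city") != "Hangzhou":
        raise PermissionError("only Hangzhou bookings are allowed")
    if args.get("ticket_type") not in {"standard"}:
        raise PermissionError("only standard tickets are allowed")


def handle_proposal(name: str, args: dict) -> dict[str, Any]:
    """Run the policy check, then execute the tool only if the check passes."""
    try:
        check_tool_call(name, args)
    except PermissionError as exc:
        # Rejected: hand the reason back so the agent can change plan or ask the user.
        return {"status": "rejected", "reason": str(exc)}
    return {"status": "ok", "result": TOOLS[name](**args)}
```

Because the tool is only reachable through `handle_proposal`, there is no code path in this sketch where a proposal executes without passing the check.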

Use It When

  • Tools affect the real world: payments, bookings, email, file deletion.
  • Different users have different permissions.
  • Tools have cost or rate limits.
  • You need an audit trail for why a call was allowed or rejected.
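For the cost and rate-limit case, a policy check can also keep state. The sketch below assumes an in-memory sliding window per user and tool; `RateLimitPolicy` and its parameters are hypothetical names, and a multi-process deployment would likely need shared storage instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class RateLimitPolicy:
    """Allow at most max_calls per (user, tool) within a sliding time window."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self._calls: dict[tuple[str, str], deque] = defaultdict(deque)

    def check(self, user_id: str, tool: str) -> None:
        now = self.clock()
        calls = self._calls[(user_id, tool)]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            raise PermissionError(
                f"rate limit reached for {tool}: "
                f"{self.max_calls} calls per {self.window_seconds:g}s"
            )
        # Record the call only when it is allowed.
        calls.append(now)
```

Raising `PermissionError` keeps the failure type consistent with a plain allowlist check, so a caller can handle both rejections the same way.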

Avoid It When

If the system only drafts text and has no tools, policy can wait.

But once real tools exist, start with at least an allowlist. Do not let the model define its own boundary.

Common Failure Modes

| Mistake | Result | Fix |
| --- | --- | --- |
| Only writing “do not overreach” in the prompt | The model can forget or be manipulated | Check before execution in Python |
| Granting * permissions | Fast debugging, dangerous production | Start from an allowlist |
| Checking tool name but not arguments | Right tool, unsafe values | Add per-tool argument rules |
| Not logging rejections | Hard to debug later | Log tool, args, and reason |
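The logging fix can be sketched as a thin wrapper around any check function, using Python's standard `logging` and `json` modules. The wrapper name `audited_check` and the logger name `agent.policy` are assumptions for illustration.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("agent.policy")  # hypothetical logger name


def audited_check(check: Callable[[str, dict], None], name: str, args: dict) -> bool:
    """Run a policy check and log the decision with tool, args, and reason."""
    try:
        check(name, args)
    except PermissionError as exc:
        record = {"tool": name, "args": args, "reason": str(exc)}
        # default=str keeps logging from failing on values json cannot encode.
        logger.warning("policy rejected %s", json.dumps(record, sort_keys=True, default=str))
        return False
    record = {"tool": name, "args": args}
    logger.info("policy allowed %s", json.dumps(record, sort_keys=True, default=str))
    return True
```

Arguments can contain sensitive values such as passport data, so a production version would probably redact or hash selected fields before writing them to the log.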

Next

Policy controls what is allowed. Some failures are not permission failures: a tool result may contain prompt injection, or the final answer may leak sensitive data.

For checks on tool results and final answers, read Guardrails.