Skip to content

Guardrails (Tripwires / Validators)

What Problem It Solves

Policies answer “is this tool call allowed?”. Guardrails answer “is the system behaving safely and correctly right now?”.

Guardrails are small, composable checks that can:

  • Validate tool arguments (schema/rules).
  • Detect prompt injection / unsafe instructions.
  • Enforce “must cite evidence” style constraints.
  • Block / rewrite / escalate when something looks wrong.

When to Use

  • You have retrieval sources you don’t fully trust.
  • You must enforce invariants (no secrets, no network, only whitelisted domains, etc.).
  • You want defense-in-depth beyond a static allowlist.

Core Flow

flowchart TD
  S["Loop step"] --> C1["Guardrail checks (pre-tool)"]
  C1 -->|ok| T["Tool call"]
  C1 -->|blocked| F["Fallback / Escalate / Abort"]
  T --> O["Observation"]
  O --> C2["Guardrail checks (post-tool)"]
  C2 -->|ok| N["Next step"]
  C2 -->|blocked| F

Evolution Path

  • Built on: Policy + Loop controller + Tracing
  • Often paired with:
  • HITL (approval when guardrail trips)
  • Maker-Checker / CoVe (verification as a reliability guardrail)

Repo Reference

  • Code: src/agent_patterns_lab/runtime/guardrails.py
  • Example: examples/66_governance_hitl_policy_guardrails.py
  • Tests: tests/test_guardrails.py