Guardrails (Tripwires / Validators)
What Problem It Solves
Policies answer “is this tool call allowed?”. Guardrails answer “is the system behaving safely and correctly right now?”.
Guardrails are small, composable checks that can:
- Validate tool arguments (schema/rules).
- Detect prompt injection / unsafe instructions.
- Enforce “must cite evidence” style constraints.
- Block / rewrite / escalate when something looks wrong.
When to Use
- You have retrieval sources you don’t fully trust.
- You must enforce invariants (no secrets, no network, only whitelisted domains, etc.).
- You want defense-in-depth beyond a static allowlist.
Core Flow
flowchart TD
S["Loop step"] --> C1["Guardrail checks (pre-tool)"]
C1 -->|ok| T["Tool call"]
C1 -->|blocked| F["Fallback / Escalate / Abort"]
T --> O["Observation"]
O --> C2["Guardrail checks (post-tool)"]
C2 -->|ok| N["Next step"]
C2 -->|blocked| F
Evolution Path
- Built on: Policy + Loop controller + Tracing
- Often paired with:
- HITL (approval when guardrail trips)
- Maker-Checker / CoVe (verification as a reliability guardrail)
Repo Reference
- Code:
src/agent_patterns_lab/runtime/guardrails.py - Example:
examples/66_governance_hitl_policy_guardrails.py - Tests:
tests/test_guardrails.py