
Maker-Checker: Draft, Check, Revise

The travel assistant can write an itinerary, but sounding like a guide does not make it good. The draft may be vague, omit packing advice, or ignore a constraint such as "easy walking".

The smallest useful fix is not a complicated multi-agent system. Split the job into two roles: one makes, one checks.

One Sentence

Maker-Checker turns one-shot generation into draft → check → revise, so quality criteria become an executable feedback loop.

What Breaks Without It

| Problem | What it looks like | Risk |
| --- | --- | --- |
| Direct final answer | Fast | Vague, incomplete, uncontrolled |
| Same prompt self-checks | Simple | It may justify its own draft |
| No pass/fail standard | Sounds reviewed | You cannot tell whether it passed |

What This Pattern Changes

| Who | Owns |
| --- | --- |
| Maker | Drafts an answer |
| Checker | Applies a rubric and returns pass/fail feedback |
| Python | Controls rounds, sends feedback back, stops at a limit |

Checker output should be structured:

{"passed": false, "feedback": "Too vague. Add concrete details."}
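A structured verdict only helps if the controlling code refuses to trust malformed output. The sketch below is one possible way to parse it, using only the standard library. The names `Verdict` and `parse_verdict` are hypothetical and not part of `agent_patterns_lab`. Treating unparseable output as a failed check is an assumption made here so that a free-text "Looks good" never counts as a pass.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical container for one checker decision."""
    passed: bool
    feedback: str


def parse_verdict(raw: str) -> Verdict:
    """Parse checker output; anything malformed counts as a failed check."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict(False, "Checker output was not valid JSON.")
    # Require a JSON object with a real boolean, not a string like "yes".
    if not isinstance(data, dict) or not isinstance(data.get("passed"), bool):
        return Verdict(False, "Checker output is missing a boolean 'passed' field.")
    return Verdict(data["passed"], str(data.get("feedback", "")))
```

Failing closed keeps the pass/fail standard strict: the loop advances only on an explicit `"passed": true`.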

Walk Through One Trace

| Round | Maker draft | Checker feedback | Next |
| --- | --- | --- | --- |
| 1 | draft v1 (bad) | Too vague; add details | Revise |
| 2 | draft v2 (better) | Pass | Return |

Flow

flowchart TD
  T["Task"] --> M["Maker drafts"]
  M --> C["Checker reviews"]
  C -->|fail + feedback| M
  C -->|pass| O["Final answer"]
  C -->|round limit| X["Stop with best available result"]
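The flow above can be sketched as a short control loop. This is a minimal illustration, not the library's `maker_checker` implementation: `run_maker_checker` and the `LLM` type alias are hypothetical, each role is assumed to be a plain function from prompt text to response text, the prompt wording is invented, and the checker is assumed to return well-formed JSON.

```python
import json
from typing import Callable

# Assumption for this sketch: a role is any function from prompt to response.
LLM = Callable[[str], str]


def run_maker_checker(maker: LLM, checker: LLM, task: str, max_rounds: int = 3) -> str:
    draft = ""
    feedback = ""
    for _ in range(max_rounds):
        # Round 1 sends only the task; later rounds add the draft and feedback.
        if feedback:
            prompt = f"{task}\n\nPrevious draft:\n{draft}\n\nReviewer feedback:\n{feedback}"
        else:
            prompt = task
        draft = maker(prompt)

        verdict = json.loads(checker(f"Task:\n{task}\n\nDraft:\n{draft}"))
        if verdict["passed"]:
            return draft  # pass: final answer
        feedback = verdict["feedback"]  # fail: feed the critique back to the maker

    # Round limit reached: stop with the most recent draft.
    return draft
```

Returning the latest draft at the round limit is one reading of "best available result"; a variant could instead keep whichever draft the checker scored highest.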

Code Walk

The maker produces two versions:

maker = MockLLM(["draft v1 (bad)", "draft v2 (better)"])

The checker rejects the first and accepts the second:

checker = MockLLM(
    [
        '{"passed": false, "feedback": "Too vague. Add concrete details."}',
        '{"passed": true, "feedback": ""}',
    ]
)

Full example:

from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.maker_checker import maker_checker
from agent_patterns_lab.runtime import MockLLM, Tracer


def main() -> None:
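    tracer = Tracer()

    # Scripted maker: returns a weak first draft, then an improved one.
    maker = MockLLM(["draft v1 (bad)", "draft v2 (better)"])
    # Scripted checker: rejects round 1 with feedback, accepts round 2.
    checker = MockLLM(
        [
            '{"passed": false, "feedback": "Too vague. Add concrete details."}',
            '{"passed": true, "feedback": ""}',
        ]
    )

    # max_rounds caps the revise loop so it cannot run forever.
    out = maker_checker(
        maker,
        checker,
        task="Write a short project summary.",
        max_rounds=2,
        tracer=tracer,
    )
    print(out)

    # Export the recorded trace so each round can be inspected afterwards.
    trace_path = tracer.export_jsonl(Path(".traces") / "30_maker_checker.jsonl")
    print(f"[trace] {trace_path}")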


if __name__ == "__main__":
    main()

Run:

UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/30_maker_checker.py

Nearby Patterns

| Pattern | Who decides next | Use when |
| --- | --- | --- |
| Maker-Checker | Checker passes or fails | You have quality criteria |
| Voting | Majority or judge selects | Answers are short and normalizable |
| CoVe | Claims are extracted and verified | Factual claims matter |
| Reflexion | Lessons are stored | Similar failures repeat |

When To Use It

  • You can write a rubric.
  • Draft quality is unstable but feedback improves it.
  • Risk is moderate: a weak answer is costly enough to justify extra review rounds.
  • You want traceable review feedback.

When Not To Use It

  • There is no executable quality standard.
  • Maker and checker share the same blind spot.
  • One answer is enough.
  • Fact verification matters more; use CoVe or retrieval.

Costs And Common Failures

| Failure | Symptom | Fix |
| --- | --- | --- |
| Vague checker | "Looks good" | Use rubric and structured JSON |
| Endless revision | Never passes | Add max_rounds |
| Unusable feedback | "Make it better" | Require concrete missing items |
| Shared hallucination | Both believe a false fact | Add tool verification or CoVe |
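One way to enforce "require concrete missing items" is to make the requirement part of the verdict schema. The sketch below assumes an extra `missing` list field, which is not in the checker format shown earlier, and a hypothetical `validate_feedback` helper. It rejects any failed verdict that does not name at least one gap.

```python
import json


def validate_feedback(raw: str) -> dict:
    """Reject failed verdicts that do not list concrete missing items."""
    verdict = json.loads(raw)
    # "missing" is a hypothetical extra field listing specific gaps.
    missing = verdict.get("missing", [])
    if not verdict["passed"] and not missing:
        raise ValueError("Failed verdict must list at least one concrete missing item.")
    return verdict
```

A caller could respond to the `ValueError` by re-prompting the checker, so that "Make it better" never reaches the maker as the only guidance.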

Maker-Checker makes unstable quality visible and repairable.

For factual claims, read CoVe. For sampling variance, read Voting.
