Maker-Checker: Draft, Check, Revise

The travel assistant can write an itinerary, but sounding like a guide does not make the itinerary good. The draft may be vague, omit packing advice, or ignore a stated constraint such as "easy walking".

The smallest useful fix is not a complicated multi-agent system. Split the job into two roles: one makes, one checks.

One Sentence

Maker-Checker turns one-shot generation into draft → check → revise, so quality criteria become an executable feedback loop.

What Breaks Without It

Problem	What it looks like	Risk
Direct final answer	Fast	Vague, incomplete, uncontrolled
Same prompt self-checks	Simple	It may justify its own draft
No pass/fail standard	Sounds reviewed	You cannot tell whether it passed

What This Pattern Changes

Who	Owns
Maker	Drafts an answer
Checker	Applies a rubric and returns pass/fail feedback
Python	Controls the rounds, passes checker feedback back to the maker, stops at a round limit

Checker output should be structured so the controlling code can branch on it:

{"passed": false, "feedback": "Too vague. Add concrete details."}
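One way to consume that verdict is sketched below. The `Verdict` dataclass and `parse_verdict` helper are hypothetical names, not part of `agent_patterns_lab`; the sketch assumes that a malformed reply should count as a failed check rather than crash the loop.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    feedback: str


def parse_verdict(raw: str) -> Verdict:
    """Parse the checker's JSON reply; treat anything malformed as a failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict(passed=False, feedback="Checker reply was not valid JSON.")
    # Require a JSON object with a real boolean, not a string like "yes".
    if not isinstance(data, dict) or not isinstance(data.get("passed"), bool):
        return Verdict(passed=False, feedback="Checker reply lacks a boolean 'passed' field.")
    return Verdict(passed=data["passed"], feedback=str(data.get("feedback", "")))


verdict = parse_verdict('{"passed": false, "feedback": "Too vague. Add concrete details."}')
print(verdict.passed, verdict.feedback)
```

Failing closed on bad JSON keeps an unparseable "Looks good" reply from being mistaken for a pass.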

Walk Through One Trace

Round	Maker draft	Checker feedback	Next
1	`draft v1 (bad)`	Too vague; add details	Revise
2	`draft v2 (better)`	Pass	Return

Flow

flowchart TD
  T["Task"] --> M["Maker drafts"]
  M --> C["Checker reviews"]
  C -->|fail + feedback| M
  C -->|pass| O["Final answer"]
  C -->|round limit| X["Stop with best available result"]
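The flow above can be sketched as a short control loop. This is an illustration under stated assumptions, not the library's `maker_checker` implementation: `run_maker_checker` is a hypothetical name, each role is modeled as a plain function from prompt to text, and "best available result" is taken to mean the latest draft.

```python
import json
from typing import Callable

# Assumption: each role is a plain callable that maps a prompt to a reply.
LLM = Callable[[str], str]


def run_maker_checker(maker: LLM, checker: LLM, task: str, max_rounds: int = 3) -> str:
    draft = ""
    feedback = ""
    for _ in range(max_rounds):
        # The first round sends the bare task; later rounds include the feedback.
        if feedback:
            prompt = f"{task}\n\nPrevious draft:\n{draft}\n\nReviewer feedback:\n{feedback}"
        else:
            prompt = task
        draft = maker(prompt)
        verdict = json.loads(checker(f"Task: {task}\n\nDraft:\n{draft}"))
        if verdict["passed"]:
            return draft  # pass: final answer
        feedback = verdict["feedback"]  # fail: send feedback back to the maker
    return draft  # round limit reached: return the latest draft


# Scripted replies stand in for real model calls.
maker_replies = iter(["draft v1 (bad)", "draft v2 (better)"])
checker_replies = iter(
    [
        '{"passed": false, "feedback": "Too vague. Add concrete details."}',
        '{"passed": true, "feedback": ""}',
    ]
)
result = run_maker_checker(
    lambda _prompt: next(maker_replies),
    lambda _prompt: next(checker_replies),
    task="Write a short project summary.",
    max_rounds=2,
)
print(result)  # draft v2 (better)
```

The stopping rule lives in ordinary code rather than in either prompt, so neither role can extend the loop on its own.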

Code Walk

The maker produces two versions:

maker = MockLLM(["draft v1 (bad)", "draft v2 (better)"])

The checker rejects the first and accepts the second:

checker = MockLLM(
    [
        '{"passed": false, "feedback": "Too vague. Add concrete details."}',
        '{"passed": true, "feedback": ""}',
    ]
)
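The internals of `MockLLM` are not shown here. To illustrate why scripted replies make the loop deterministic, a stand-in with the same idea might look like the sketch below; `ScriptedLLM` and its `complete` method are hypothetical names, not the library's API.

```python
from __future__ import annotations


class ScriptedLLM:
    """Hypothetical stand-in that replays canned replies in order."""

    def __init__(self, replies: list[str]) -> None:
        self._replies = list(replies)
        self.prompts: list[str] = []  # record every prompt for later inspection

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            raise RuntimeError("ScriptedLLM ran out of scripted replies")
        return self._replies.pop(0)


scripted_checker = ScriptedLLM(
    [
        '{"passed": false, "feedback": "Too vague. Add concrete details."}',
        '{"passed": true, "feedback": ""}',
    ]
)
first = scripted_checker.complete("Review draft v1")
second = scripted_checker.complete("Review draft v2")
print(first)
print(second)
```

Because the replies are fixed, the same trace is produced on every run, which makes the round logic testable without a model call.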

Full example:

from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.maker_checker import maker_checker
from agent_patterns_lab.runtime import MockLLM, Tracer


def main() -> None:
    tracer = Tracer()

    maker = MockLLM(["draft v1 (bad)", "draft v2 (better)"])
    checker = MockLLM(
        [
            '{"passed": false, "feedback": "Too vague. Add concrete details."}',
            '{"passed": true, "feedback": ""}',
        ]
    )

    out = maker_checker(
        maker,
        checker,
        task="Write a short project summary.",
        max_rounds=2,
        tracer=tracer,
    )
    print(out)

    trace_path = tracer.export_jsonl(Path(".traces") / "30_maker_checker.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()

Run:

UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/30_maker_checker.py

Nearby Patterns

Pattern	Who decides next	Use when
Maker-Checker	Checker passes or fails	You have quality criteria
Voting	Majority or judge selects	Answers are short and normalizable
CoVe	Claims are extracted and verified	Factual claims matter
Reflexion	Lessons are stored	Similar failures repeat

When To Use It

You can write a rubric.
Draft quality is unstable but feedback improves it.
Risk is moderate: a weak answer is worth an extra review round, but does not call for independent fact verification.
You want traceable review feedback.
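"You can write a rubric" can be taken literally. The sketch below encodes a rubric for the travel itinerary as named checks and returns the same JSON shape the checker uses. The rubric entries and the `rubric_check` name are hypothetical, and the keyword tests are deliberately crude placeholders for whatever a real checker prompt or validator would do.

```python
import json

# Hypothetical rubric: each requirement maps to a simple text check.
RUBRIC = {
    "mentions packing advice": lambda text: "pack" in text.lower(),
    "respects easy walking": lambda text: "easy walk" in text.lower(),
    "names at least one specific time": lambda text: any(ch.isdigit() for ch in text),
}


def rubric_check(draft: str) -> str:
    """Return a checker-style JSON verdict listing every unmet requirement."""
    missing = [name for name, check in RUBRIC.items() if not check(draft)]
    feedback = "Missing: " + "; ".join(missing) if missing else ""
    return json.dumps({"passed": not missing, "feedback": feedback})


print(rubric_check("Visit the old town and enjoy the views."))
print(rubric_check("09:00 easy walk through the old town. Pack a rain jacket."))
```

Naming each requirement means a failed check tells the maker exactly what to add, and the trace shows which criterion blocked the draft.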

When Not To Use It

There is no executable quality standard.
Maker and checker share the same blind spot.
A single unreviewed answer is good enough for the task.
Fact verification matters more; use CoVe or retrieval.

Costs And Common Failures

Failure	Symptom	Fix
Vague checker	"Looks good"	Use rubric and structured JSON
Endless revision	Never passes	Add `max_rounds`
Unusable feedback	"Make it better"	Require concrete missing items
Shared hallucination	Both believe a false fact	Add tool verification or CoVe
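The "Unusable feedback" row can be enforced in code. The sketch below assumes the checker schema is extended with a `missing` list, which is not part of the verdict format shown earlier; `usable_feedback` and `GENERIC_FEEDBACK` are hypothetical names.

```python
import json

# Assumption: phrases that carry no actionable information for the maker.
GENERIC_FEEDBACK = {"make it better", "looks good", "needs work"}


def usable_feedback(raw: str) -> bool:
    """Return True when a verdict either passes or says concretely what to fix."""
    verdict = json.loads(raw)
    if verdict["passed"]:
        return True  # nothing to revise, so no feedback is required
    missing = verdict.get("missing", [])  # hypothetical extra field
    feedback = verdict.get("feedback", "").strip().lower().rstrip(".!")
    return bool(missing) and feedback not in GENERIC_FEEDBACK


print(usable_feedback('{"passed": false, "feedback": "Make it better."}'))
print(
    usable_feedback(
        '{"passed": false, "feedback": "Add packing advice.", "missing": ["packing advice"]}'
    )
)
```

When this returns False, the controlling code could ask the checker again or stop early instead of sending an empty instruction to the maker.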

What To Read Next

Maker-Checker makes unstable quality visible and repairable.

For factual claims, read CoVe. For sampling variance, read Voting.