Voting: Let Multiple Answers Calibrate Each Other

The same model can answer the same question differently from one run to the next. A travel assistant may schedule the tea museum for the afternoon on one run and Longjing Village on the next.

If the answer is short and easy to normalize, a simple stabilizer is to sample several times and take the majority answer.

One Sentence

Voting turns one sample into multiple samples plus normalization plus winner selection, trading cost for lower variance.
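
Those three steps can be sketched in a few lines. This is a minimal illustration, not the library's `self_consistency` (its internals are not shown here): `majority_vote` is a hypothetical helper, and the `sample` callable stands in for one model call.

```python
from collections import Counter
from typing import Callable, Iterator


def majority_vote(sample: Callable[[], str], n: int, normalize: Callable[[str], str]) -> str:
    """Draw n candidates, map each to a comparable key, return the most common key."""
    keys = [normalize(sample()) for _ in range(n)]
    # most_common(1) returns [(key, count)]; equal counts keep first-seen order.
    winner, _count = Counter(keys).most_common(1)[0]
    return winner


# Stand-in for a model (assumption): replays canned outputs in order.
canned: Iterator[str] = iter(["A", " B", "A\n", "A", "B "])
result = majority_vote(lambda: next(canned), n=5, normalize=str.strip)
print(result)  # A
```

The cost side of the trade is visible in the list comprehension: `n` samples means `n` model calls.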

What Breaks Without It

Problem	What it looks like	Risk
One sample carries randomness	Sometimes great	Same task becomes unstable
First candidate wins	Fast	Majority answer may be missed
Long prose is voted on directly	Sounds democratic	There is no comparable answer key

What This Pattern Changes

Who	Owns
Model	Generates multiple candidates
Normalizer	Converts candidates into comparable keys
Voter / judge	Chooses the winner
Python	Controls sample count and traces candidates

Walk Through One Trace

The example task is small: Choose A or B.

Sample	Raw output	Normalized
1	`A`	`A`
2	`B`	`B`
3	`A`	`A`
4	`A`	`A`
5	`B`	`B`

A wins with three of the five votes; B gets the other two.
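
The tally behind this trace can be reproduced with the standard library's `Counter`. The list below simply copies the Normalized column of the table; the variable names are illustrative.

```python
from collections import Counter

# Normalized column of the trace, in sample order.
normalized = ["A", "B", "A", "A", "B"]

counts = Counter(normalized)
winner, votes = counts.most_common(1)[0]
share = votes / len(normalized)

print(dict(counts))   # {'A': 3, 'B': 2}
print(winner, share)  # A 0.6
```

Keeping the vote share alongside the winner is a cheap way to notice narrow wins when reading traces.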

Flow

flowchart TD
  P["Same prompt"] --> S1["Candidate 1"]
  P --> S2["Candidate 2"]
  P --> S3["Candidate 3"]
  S1 --> N["normalize"]
  S2 --> N
  S3 --> N
  N --> V["vote / judge"]
  V --> O["winner"]

Code Walk

model = MockLLM(["A", "B", "A", "A", "B"])
messages = [Message(role="user", content="Choose A or B.")]
# tracer is the Tracer() instance created in the full example below.
out = self_consistency(model, messages, n=5, normalize=lambda s: s.strip(), tracer=tracer)

Full example:

from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.voting import self_consistency
from agent_patterns_lab.runtime import Message, MockLLM, Tracer


def main() -> None:
    tracer = Tracer()
    model = MockLLM(["A", "B", "A", "A", "B"])

    messages = [Message(role="user", content="Choose A or B.")]
    out = self_consistency(model, messages, n=5, normalize=lambda s: s.strip(), tracer=tracer)
    print(out)

    trace_path = tracer.export_jsonl(Path(".traces") / "31_voting.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()

Run:

UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/31_voting.py

Nearby Patterns

Pattern	Who decides next	Use when
Voting	Candidates vote	Answers are short and normalizable
Maker-Checker	Checker asks for revision	Drafts need rubric feedback
CoVe	Claims are verified	Correctness depends on evidence
LATS	Search expands and scores candidates	Candidates need multi-step exploration

When To Use It

The answer can be normalized into a short key.
Extra samples are affordable.
You need lower randomness, not external facts.
Failures are occasional variance, not systematic ignorance.
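
To make the last point concrete, here is a back-of-the-envelope calculation under simplifying assumptions that are not part of the pattern itself: a two-way choice, an odd `n`, and samples that are independent and each correct with probability `p`. Real samples from one model are correlated, so treat the numbers as an optimistic illustration. `majority_correct_probability` is a hypothetical helper.

```python
from math import comb


def majority_correct_probability(p: float, n: int) -> float:
    """Probability that more than half of n independent samples are correct."""
    need = n // 2 + 1  # smallest count that forms a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for p in (0.7, 0.4):
    print(f"single={p:.3f}  majority of 5={majority_correct_probability(p, 5):.3f}")
# single=0.700  majority of 5=0.837
# single=0.400  majority of 5=0.317
```

Under these assumptions, voting helps when a single sample is right more often than not and hurts when it is wrong more often than not, which is why it suits occasional variance rather than systematic ignorance.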

When Not To Use It

Facts need retrieval or tools.
Long-form answers cannot be compared.
The model is systematically biased.
Latency and cost are tight.

Costs And Common Failures

Failure	Symptom	Fix
No majority	`A/B/C` are tied	Add a judge or fall back to Maker-Checker
Cannot normalize	Every long answer differs	Use structured output to extract a key
Systematic bias	Majority is still wrong	Add retrieval, tools, or CoVe
High cost	`n=5` means five calls	Route only hard samples to voting
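
The first two fixes can be sketched together. This assumes the prompt asks the model to end with a line such as `Answer: A`, which is a convention invented for this example; `extract_key` and `vote_or_none` are hypothetical helpers, not part of the library. Returning `None` is the signal for the caller to escalate to a judge or another fallback.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed output convention: the answer appears as "Answer: A" (case-insensitive).
ANSWER_RE = re.compile(r"\banswer\s*[:=]\s*([ABC])\b", re.IGNORECASE)


def extract_key(raw: str) -> str | None:
    """Pull the short key out of a longer answer; None if no key is found."""
    match = ANSWER_RE.search(raw)
    return match.group(1).upper() if match else None


def vote_or_none(raws: list[str]) -> str | None:
    """Return the strict plurality winner, or None on a tie or when nothing parses."""
    keys = [key for key in map(extract_key, raws) if key is not None]
    if not keys:
        return None
    ranked = Counter(keys).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # top two keys are tied: hand off to a judge or fallback
    return ranked[0][0]


clear = ["The museum fits the afternoon better. Answer: A", "answer: b", "Answer: A"]
tied = ["Answer: A", "Answer: B", "Answer: C"]
print(vote_or_none(clear))  # A
print(vote_or_none(tied))   # None
```

Treating a tie as "no answer" rather than picking the first-seen key keeps the fallback decision explicit in the calling code.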

What To Read Next

Voting handles randomness, not missing facts.

For factual checks, read CoVe. For revision from feedback, read Maker-Checker.