Voting: Let Multiple Answers Calibrate Each Other
The same model can give different answers to the same question. A travel assistant might schedule the tea museum for the afternoon on one run and Longjing Village on the next.
When the answer is short and easy to normalize, a simple way to stabilize it is to sample several times and take a vote.
One Sentence
Voting replaces a single sample with several samples, a normalization step and a winner-selection step. It trades extra model calls for lower variance.
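The three steps can be sketched in plain Python. This assumes the candidates have already been collected as strings. `majority_vote` is a hypothetical helper name and is not part of the lab's API.

```python
from collections import Counter
from typing import Callable


def majority_vote(
    candidates: list[str],
    normalize: Callable[[str], str] = str.strip,
) -> tuple[str, Counter[str]]:
    """Normalize each candidate into a key; return the most common key and the tally."""
    if not candidates:
        raise ValueError("need at least one candidate")
    keys = [normalize(c) for c in candidates]
    counts = Counter(keys)
    # most_common(1) returns [(key, count)]; equal counts keep first-seen order
    winner, _ = counts.most_common(1)[0]
    return winner, counts


winner, counts = majority_vote([" A", "B\n", "A", "A ", "B"])
print(winner, dict(counts))  # A {'A': 3, 'B': 2}
```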
What Breaks Without It
| Problem | What it looks like | Risk |
|---|---|---|
| A single sample carries all the randomness | Sometimes great, sometimes off | The same task gives unstable results |
| First candidate wins | Fast | Majority answer may be missed |
| Long prose is voted directly | Sounds democratic | There is no comparable answer key |
What This Pattern Changes
| Who | Owns |
|---|---|
| Model | Generates multiple candidates |
| Normalizer | Converts candidates into comparable keys |
| Voter / judge | Chooses the winner |
| Python | Controls sample count and traces candidates |
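The split of responsibilities in the table can be sketched with plain callables. This is a hypothetical stand-in, not the lab's `self_consistency`. In it, `generate` plays the model, `normalize` plays the normalizer, and `choose` plays the voter or judge. The Python loop owns the sample count and the trace. A scripted iterator replaces a real model call.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def run_voting(
    generate: Callable[[], str],          # Model: one candidate per call
    normalize: Callable[[str], str],      # Normalizer: raw text -> comparable key
    choose: Callable[[list[str]], str],   # Voter / judge: keys -> winning key
    n: int,                               # Python: controls the sample count
    trace: list[dict],                    # Python: records every candidate
) -> str:
    keys = []
    for i in range(n):
        raw = generate()
        key = normalize(raw)
        trace.append({"sample": i + 1, "raw": raw, "key": key})
        keys.append(key)
    winner = choose(keys)
    trace.append({"winner": winner})
    return winner


def plurality(keys: list[str]) -> str:
    # Most frequent key wins; ties go to the key seen first
    return Counter(keys).most_common(1)[0][0]


# Hypothetical scripted outputs standing in for a real model
scripted = cycle(["A", "B", "A", "A", "B"])
trace: list[dict] = []
print(run_voting(lambda: next(scripted), str.strip, plurality, n=5, trace=trace))  # A
```

Passing the roles in as callables keeps each one swappable. For example, `plurality` can be replaced by a judge model without touching the sampling loop.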
Walk Through One Trace
The example task is deliberately small: "Choose A or B."
| Sample | Raw output | Normalized |
|---|---|---|
| 1 | A | A |
| 2 | B | B |
| 3 | A | A |
| 4 | A | A |
| 5 | B | B |
A wins because it appears three times.
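The tally can be checked directly with `collections.Counter` on the normalized column of the trace table. This only re-counts the trace; it does not call a model.

```python
from collections import Counter

# Normalized keys from samples 1-5 in the trace table
normalized = ["A", "B", "A", "A", "B"]
tally = Counter(normalized)
winner, votes = tally.most_common(1)[0]
print(tally)          # Counter({'A': 3, 'B': 2})
print(winner, votes)  # A 3
print(f"{votes}/{len(normalized)} = {votes / len(normalized):.0%} agreement")  # 3/5 = 60% agreement
```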
Flow
```mermaid
flowchart TD
    P["Same prompt"] --> S1["Candidate 1"]
    P --> S2["Candidate 2"]
    P --> S3["Candidate 3"]
    S1 --> N["normalize"]
    S2 --> N
    S3 --> N
    N --> V["vote / judge"]
    V --> O["winner"]
```
Code Walk
```python
model = MockLLM(["A", "B", "A", "A", "B"])
messages = [Message(role="user", content="Choose A or B.")]
out = self_consistency(model, messages, n=5, normalize=lambda s: s.strip(), tracer=tracer)
```
Full example:
```python
from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.voting import self_consistency
from agent_patterns_lab.runtime import Message, MockLLM, Tracer


def main() -> None:
    tracer = Tracer()
    # Scripted replies for the five samples in the trace above
    model = MockLLM(["A", "B", "A", "A", "B"])
    messages = [Message(role="user", content="Choose A or B.")]
    # strip() trims surrounding whitespace so "A\n" and "A" count as the same key
    out = self_consistency(model, messages, n=5, normalize=lambda s: s.strip(), tracer=tracer)
    print(out)
    trace_path = tracer.export_jsonl(Path(".traces") / "31_voting.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()
```
Run:
```bash
UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/31_voting.py
```
Nearby Patterns
| Pattern | Who decides next | Use when |
|---|---|---|
| Voting | Candidates vote | Answers are short and normalizable |
| Maker-Checker | Checker asks for revision | Drafts need rubric feedback |
| CoVe | Claims are verified | Correctness depends on evidence |
| LATS | Search expands and scores candidates | Candidates need multi-step exploration |
When To Use It
- The answer can be normalized into a short key.
- Extra samples are affordable.
- You need lower randomness, not external facts.
- Failures are occasional variance, not systematic ignorance.
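The first condition is the one that makes voting work, because without a short key every answer is its own bucket. The sketch below assumes the model was asked to end its reply with JSON like `{"choice": "A"}`. The field name `choice` and the helper `to_key` are hypothetical. If no JSON is found, the sketch falls back to trimmed, upper-cased text.

```python
import json
import re
from collections import Counter


def to_key(raw: str) -> str:
    """Map a raw model reply to a short comparable key (hypothetical reply format)."""
    # Prefer the structured field if the reply contains a JSON object
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return str(json.loads(match.group(0))["choice"]).strip().upper()
        except (json.JSONDecodeError, KeyError):
            pass
    # Fall back to whitespace-trimmed, case-insensitive text
    return raw.strip().upper()


raw_replies = [
    'I pick A. {"choice": "A"}',
    '{"choice": "b"}',
    "a",
    " A \n",
    'Option B looks better. {"choice": "B"}',
]
print(len(set(raw_replies)))                          # 5 (every raw reply is distinct)
print(Counter(map(to_key, raw_replies)).most_common())  # [('A', 3), ('B', 2)]
```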
When Not To Use It
- Facts need retrieval or tools.
- Long-form answers cannot be compared.
- The model is systematically biased.
- Latency and cost are tight.
Costs And Common Failures
| Failure | Symptom | Fix |
|---|---|---|
| No majority | A/B/C are tied | Add a judge or fall back to Maker-Checker |
| Cannot normalize | Every long answer differs | Use structured output to extract a key |
| Systematic bias | Majority is still wrong | Add retrieval, tools, or CoVe |
| High cost | n=5 means five model calls | Route only hard samples to voting |
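The "No majority" row can be handled by detecting a tie and passing only the tied keys to a tie-breaker. In this sketch, `judge` is a hypothetical callable standing in for a judge model or a Maker-Checker pass. It is filled here with a trivial alphabetical placeholder so the code runs.

```python
from collections import Counter
from typing import Callable


def vote_or_escalate(keys: list[str], judge: Callable[[list[str]], str]) -> tuple[str, str]:
    """Return (winner, how_decided); escalate to `judge` when the top count is tied."""
    if not keys:
        raise ValueError("no candidates to vote on")
    counts = Counter(keys).most_common()
    top_count = counts[0][1]
    tied = [key for key, count in counts if count == top_count]
    if len(tied) == 1:
        return tied[0], "vote"
    return judge(tied), "judge"


# Placeholder judge; a real system would call another model or a checker here
def pick_alphabetically(tied: list[str]) -> str:
    return sorted(tied)[0]


print(vote_or_escalate(["A", "B", "A", "A", "B"], pick_alphabetically))  # ('A', 'vote')
print(vote_or_escalate(["C", "A", "B"], pick_alphabetically))            # ('A', 'judge')
```

Only the tied keys reach the judge, so the expensive step runs only when the cheap vote cannot decide.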
What To Read Next
Voting handles randomness, not missing facts.
For factual checks, read CoVe. For revision from feedback, read Maker-Checker.