Reflexion: Store Lessons From Failures
If the travel assistant makes a mistake today, such as treating "easy walking" as "add more attractions", it may repeat that mistake tomorrow.
Maker-Checker can repair one draft. Reflexion goes further: after a failure, write a lesson and use that lesson on the next similar task.
One Sentence
Reflexion turns failure-and-retry into failure → lesson → memory → next attempt, so repeated tasks do not start from zero.
What Breaks Without It
| Problem | What it looks like | Risk |
|---|---|---|
| Each task is isolated | Simple | Same mistake repeats |
| Feedback stays in one run | Can fix once | Forgotten next time |
| Lessons are vague | Sounds reflective | Cannot be retrieved or applied |
What This Pattern Changes
| Who | Owns |
|---|---|
| Model | Answers, writes lessons after failure, retries |
| Verifier | Decides pass/fail |
| Memory store | Stores lessons |
| Python | Controls rounds, reads/writes memory, traces |
A lesson should be short, specific, and executable, for example: "State the answer as a single number."
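One way to enforce "short and specific" is a small gate that runs before a lesson is written to memory. The sketch below is a rough heuristic, not part of `agent_patterns_lab`: the name `is_usable_lesson`, the phrase list, and the 20-word limit are all assumptions made for illustration.

```python
# Hypothetical heuristic: names and thresholds are illustrative assumptions.
VAGUE_PHRASES = ("be careful", "try harder", "pay attention", "do better")


def is_usable_lesson(lesson: str, max_words: int = 20) -> bool:
    """Return True if the lesson is non-empty, short, and not a generic phrase."""
    text = lesson.strip().lower()
    if not text:
        return False
    if len(text.split()) > max_words:
        return False
    # Reject lessons that contain a known generic phrase.
    return not any(phrase in text for phrase in VAGUE_PHRASES)


print(is_usable_lesson("State the answer as a single number."))
print(is_usable_lesson("Be careful next time."))
```

A phrase list cannot judge whether a lesson is correct. It only filters the most obvious filler before it reaches the store.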
Walk Through One Trace
| Round | Answer | Verification | Lesson / next |
|---|---|---|---|
| 1 | bad answer | Fails; expected 42 | Lesson: answer only as a number |
| 2 | 42 | Passes | Return |
For travel, a lesson might be: "For easy-walking trips, cap stops at three unless the user asks otherwise."
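A lesson like the travel one is only useful if a verifier can check it. The sketch below shows what such a check could look like, assuming the itinerary is available as a plain list of stop names. `Check` and `verify_easy_walking` are hypothetical names, not part of the library used later on this page.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """Hypothetical pass/fail result with feedback for the next attempt."""
    ok: bool
    feedback: str = ""


def verify_easy_walking(stops: list[str], max_stops: int = 3) -> Check:
    """Fail an easy-walking itinerary that has more stops than the cap."""
    if len(stops) <= max_stops:
        return Check(ok=True)
    return Check(
        ok=False,
        feedback=f"Easy-walking plan has {len(stops)} stops; cap is {max_stops}.",
    )


result = verify_easy_walking(["Museum", "Park", "Cafe", "Market"])
print(result.ok, result.feedback)
```

The `max_stops` parameter is where "unless the user asks otherwise" would plug in: a caller that knows the user wants a fuller day can raise the cap.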
Flow
flowchart TD
T["Task"] --> M["Read relevant lessons"]
M --> A["Generate answer"]
A --> V["Verify"]
V -->|pass| O["Final answer"]
V -->|fail| L["Write lesson"]
L --> S["Store in memory"]
S --> M
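The flow above can be sketched as a plain Python loop. This is a minimal illustration under stated assumptions, not the implementation in `agent_patterns_lab`: `Result`, `reflexion_loop`, and the stub functions are hypothetical, and memory is a dict keyed by the exact task string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    """Hypothetical verifier result."""
    ok: bool
    feedback: str = ""


def reflexion_loop(
    task: str,
    generate: Callable[[str, list[str]], str],
    reflect: Callable[[str, str, str], str],
    verify: Callable[[str], Result],
    memory: dict[str, list[str]],
    rounds: int = 2,
) -> str:
    answer = ""
    for _ in range(rounds):
        lessons = memory.get(task, [])       # read relevant lessons
        answer = generate(task, lessons)     # generate answer
        result = verify(answer)              # verify
        if result.ok:
            return answer                    # pass: final answer
        lesson = reflect(task, answer, result.feedback)  # fail: write lesson
        memory.setdefault(task, []).append(lesson)       # store in memory
    return answer  # rounds exhausted; return the last attempt


# Stubs standing in for a model: answer correctly only once a lesson exists.
def fake_generate(task: str, lessons: list[str]) -> str:
    return "42" if lessons else "bad answer"


def fake_reflect(task: str, answer: str, feedback: str) -> str:
    return "State the answer as a single number."


def check(answer: str) -> Result:
    ok = answer.strip() == "42"
    return Result(ok=ok, feedback="" if ok else "Expected exactly: 42")


memory: dict[str, list[str]] = {}
final = reflexion_loop("What is 6 * 7?", fake_generate, fake_reflect, check, memory)
print(final, memory)
```

Because `memory` lives outside the loop, a later call with the same task starts with the stored lesson and can pass on its first round.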
Code Walk
The example uses an in-memory KV store:
kv = InMemoryKV()
The verifier decides pass/fail:
def verify(answer: str) -> VerificationResult:
    ok = answer.strip() == "42"
    return VerificationResult(ok=ok, feedback="Expected exactly: 42" if not ok else "")
Full example:
from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.reflexion import VerificationResult, reflexion
from agent_patterns_lab.runtime import InMemoryKV, MockLLM, Tracer


def main() -> None:
    tracer = Tracer()
    kv = InMemoryKV()
    # Scripted replies: a failed answer, a lesson, then the corrected answer.
    model = MockLLM(
        [
            "bad answer",
            '{"lesson":"State the answer as a single number."}',
            "42",
        ]
    )

    def verify(answer: str) -> VerificationResult:
        ok = answer.strip() == "42"
        return VerificationResult(ok=ok, feedback="Expected exactly: 42" if not ok else "")

    out = reflexion(
        model,
        task="What is 6 * 7?",
        verify=verify,
        memory_get=kv.get,
        memory_set=kv.set,
        tracer=tracer,
        rounds=2,
    )
    print(out)

    # Export the trace so each round can be inspected afterwards.
    trace_path = tracer.export_jsonl(Path(".traces") / "42_reflexion.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()
Run:
UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/42_reflexion.py
Nearby Patterns
| Pattern | What it does | Use when |
|---|---|---|
| Maker-Checker | Current draft is revised | One run needs feedback |
| Reflexion | Lesson from a failure is stored | Similar tasks repeat |
| Memory | Preferences or facts persist | Long-term context matters |
| CoVe | Claims are verified | Factual correctness matters |
When To Use It
- Similar tasks repeat.
- There is a verifier or reliable feedback.
- Lessons can be written as short checks.
- You can maintain memory quality.
When Not To Use It
- The task is one-off.
- There is no reliable verification signal.
- Lessons may mislead future tasks.
- High-risk memory has no review process.
Costs And Common Failures
| Failure | Symptom | Fix |
|---|---|---|
| Vague lesson | "Be careful" | Write concrete checks |
| Memory pollution | Bad lesson repeats | Add review, expiry, namespaces |
| Retrieval miss | Lesson exists but is not used | Add tags/task keys |
| Overfitting to memory | Old lesson distorts new task | Keep verifier in the loop |
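The "namespaces" and "expiry" fixes in the table can be combined in a small store. The sketch below is an assumption-heavy illustration: `LessonStore`, its methods, and the one-hour default TTL are hypothetical and unrelated to `InMemoryKV`. The `now` argument exists only so the behaviour can be shown without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LessonStore:
    """Hypothetical lesson memory with per-namespace lists and expiry."""
    ttl_seconds: float = 3600.0
    _items: dict = field(default_factory=dict)

    def add(self, namespace: str, lesson: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # Store each lesson together with the time at which it expires.
        self._items.setdefault(namespace, []).append((lesson, now + self.ttl_seconds))

    def get(self, namespace: str, now: Optional[float] = None) -> list:
        now = time.time() if now is None else now
        # Keep only lessons that have not expired, and drop the rest for good.
        live = [(text, exp) for text, exp in self._items.get(namespace, []) if exp > now]
        self._items[namespace] = live
        return [text for text, _ in live]


store = LessonStore(ttl_seconds=10)
store.add("travel", "For easy-walking trips, cap stops at three.", now=0)
print(store.get("travel", now=5))   # lesson is still live
print(store.get("math", now=5))     # other namespaces are unaffected
print(store.get("travel", now=11))  # lesson has expired
```

Namespaces keep a travel lesson from leaking into unrelated tasks, and expiry bounds how long a bad lesson can keep repeating. Neither replaces human review for high-risk memory.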
What To Read Next
Reflexion fits systems that learn from verifiable failures.
For one-run revision, read Maker-Checker. For preferences inside the current session, read Conversation History.