Reflexion: Store Lessons From Failures

If the travel assistant makes a mistake today, such as treating "easy walking" as "add more attractions", it may repeat that mistake tomorrow.

Maker-Checker can repair one draft. Reflexion goes further: after a failure, write a lesson and use that lesson on the next similar task.

One Sentence

Reflexion turns failure-and-retry into failure → lesson → memory → next attempt, so repeated tasks do not start from zero.

What Breaks Without It

Problem	What it looks like	Risk
Each task is isolated	Looks simple	Same mistake repeats
Feedback stays in one run	Can fix once	Forgotten next time
Lessons are vague	Sounds reflective	Cannot be retrieved or applied

What This Pattern Changes

Who	Owns
Model	Answers, writes lessons after failure, retries
Verifier	Decides pass/fail
Memory store	Stores lessons
Python	Controls rounds, reads and writes memory, records traces

A lesson should be short, specific, and executable, for example: "State the answer as a single number."
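One way to enforce "short, specific, and executable" is to filter lessons before they reach memory. The sketch below assumes the model returns the lesson as JSON with a `lesson` key, as in the mock responses of the full example. The `parse_lesson` helper, the `VAGUE_PHRASES` list, and the `MAX_LESSON_CHARS` limit are hypothetical and not part of `agent_patterns_lab`.

```python
from __future__ import annotations

import json

# Hypothetical filters; tune both for your own domain.
VAGUE_PHRASES = ("be careful", "try harder", "do better")
MAX_LESSON_CHARS = 120


def parse_lesson(raw: str) -> str | None:
    """Return the lesson text, or None if it is malformed, too long, or vague."""
    try:
        lesson = json.loads(raw).get("lesson", "")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return None
    if not isinstance(lesson, str):
        return None
    lesson = lesson.strip()
    if not lesson or len(lesson) > MAX_LESSON_CHARS:
        return None
    if any(phrase in lesson.lower() for phrase in VAGUE_PHRASES):
        return None
    return lesson
```

Rejected lessons can be dropped or sent back to the model with a request for a concrete check.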

Walk Through One Trace

Round	Answer	Verification	Lesson / next
1	`bad answer`	Fails; expected `42`	Lesson: answer only as a number
2	`42`	Passes	Return

For travel, a lesson might be: "For easy-walking trips, cap stops at three unless the user asks otherwise."
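A lesson like this is easiest to apply when it maps to a check that code can run. The following is a minimal sketch, assuming a hypothetical itinerary represented as a list of stop names. `CheckResult`, `check_easy_walking`, and the `max_stops` parameter are illustrative names, not part of the example library.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    feedback: str = ""


def check_easy_walking(stops: list[str], max_stops: int = 3) -> CheckResult:
    """Hypothetical check for the lesson 'cap stops at three on easy-walking trips'."""
    if len(stops) <= max_stops:
        return CheckResult(ok=True)
    return CheckResult(
        ok=False,
        feedback=f"Easy-walking trip has {len(stops)} stops; the cap is {max_stops}.",
    )
```

Passing a larger `max_stops` covers the "unless the user asks otherwise" clause without rewriting the lesson.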

Flow

flowchart TD
  T["Task"] --> M["Read relevant lessons"]
  M --> A["Generate answer"]
  A --> V["Verify"]
  V -->|pass| O["Final answer"]
  V -->|fail| L["Write lesson"]
  L --> S["Store in memory"]
  S --> M
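The flow above can be sketched in plain Python. This is an illustration under stated assumptions, not the `reflexion` function from `agent_patterns_lab`: the `reflexion_sketch` name, the callable signatures, and the dict-based memory keyed by task are all hypothetical.

```python
from typing import Callable


def reflexion_sketch(
    generate: Callable[[str, list[str]], str],
    write_lesson: Callable[[str, str, str], str],
    verify: Callable[[str], tuple[bool, str]],
    task: str,
    memory: dict[str, list[str]],
    rounds: int = 2,
) -> str:
    """Illustrative loop only; the library's real implementation may differ."""
    answer = ""
    for _ in range(rounds):
        lessons = memory.get(task, [])                 # read relevant lessons
        answer = generate(task, lessons)               # generate answer
        ok, feedback = verify(answer)                  # verify
        if ok:
            return answer                              # pass: final answer
        lesson = write_lesson(task, answer, feedback)  # fail: write lesson
        memory.setdefault(task, []).append(lesson)     # store in memory
    return answer  # rounds exhausted: last answer, still unverified


# Stand-ins for the model, mirroring the trace in "Walk Through One Trace".
def generate(task: str, lessons: list[str]) -> str:
    return "42" if lessons else "bad answer"


def write_lesson(task: str, answer: str, feedback: str) -> str:
    return "State the answer as a single number."


def verify(answer: str) -> tuple[bool, str]:
    ok = answer.strip() == "42"
    return ok, ("" if ok else "Expected exactly: 42")


memory: dict[str, list[str]] = {}
result = reflexion_sketch(generate, write_lesson, verify, "What is 6 * 7?", memory)
```

Because `memory` outlives the call, a later run of the same task starts with the stored lesson instead of from zero.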

Code Walk

The example uses an in-memory KV store:

kv = InMemoryKV()

The verifier decides pass/fail:

def verify(answer: str) -> VerificationResult:
    ok = answer.strip() == "42"
    return VerificationResult(ok=ok, feedback="Expected exactly: 42" if not ok else "")

Full example:

from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.reflexion import VerificationResult, reflexion
from agent_patterns_lab.runtime import InMemoryKV, MockLLM, Tracer


def main() -> None:
    tracer = Tracer()
    kv = InMemoryKV()

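    # Scripted responses, consumed in order across the two rounds.
    model = MockLLM(
        [
            "bad answer",  # round 1 answer: fails verification
            '{"lesson":"State the answer as a single number."}',  # lesson written after the failure
            "42",  # round 2 answer: passes
        ]
    )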

    def verify(answer: str) -> VerificationResult:
        ok = answer.strip() == "42"
        return VerificationResult(ok=ok, feedback="Expected exactly: 42" if not ok else "")

    out = reflexion(
        model,
        task="What is 6 * 7?",
        verify=verify,
        memory_get=kv.get,
        memory_set=kv.set,
        tracer=tracer,
        rounds=2,
    )

    print(out)
    trace_path = tracer.export_jsonl(Path(".traces") / "42_reflexion.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()

Run:

UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/42_reflexion.py

Nearby Patterns

Pattern	What it does	Use when
Maker-Checker	Current draft is revised	One run needs feedback
Reflexion	Lesson from a failure is stored	Similar tasks repeat
Memory	Preferences or facts persist	Long-term context matters
CoVe	Claims are verified	Factual correctness matters

When To Use It

Similar tasks repeat.
There is a verifier or reliable feedback.
Lessons can be written as short checks.
You can maintain memory quality.

When Not To Use It

The task is one-off.
There is no reliable verification signal.
Lessons may mislead future tasks.
High-risk memory has no review process.

Costs And Common Failures

Failure	Symptom	Fix
Vague lesson	"Be careful"	Write concrete checks
Memory pollution	Bad lesson repeats	Add review, expiry, namespaces
Retrieval miss	Lesson exists but is not used	Add tags/task keys
Overfitting to memory	Old lesson distorts new task	Keep verifier in the loop
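Several of these fixes can live in the memory store itself. The sketch below shows one possible shape for review, expiry, and namespaces. `LessonStore`, its methods, and the 30-day default are hypothetical and not part of the example's `InMemoryKV`.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class Lesson:
    text: str
    created_at: float
    reviewed: bool = False


@dataclass
class LessonStore:
    """Hypothetical store: lessons are namespaced, must be reviewed, and expire."""

    ttl_seconds: float = 30 * 24 * 3600
    items: dict[str, list[Lesson]] = field(default_factory=dict, repr=False)

    def add(self, namespace: str, text: str, now: float | None = None) -> None:
        now = time.time() if now is None else now
        self.items.setdefault(namespace, []).append(Lesson(text, now))

    def approve(self, namespace: str, text: str) -> None:
        # Review step: only approved lessons are ever returned by get().
        for lesson in self.items.get(namespace, []):
            if lesson.text == text:
                lesson.reviewed = True

    def get(self, namespace: str, now: float | None = None) -> list[str]:
        now = time.time() if now is None else now
        return [
            lesson.text
            for lesson in self.items.get(namespace, [])
            if lesson.reviewed and now - lesson.created_at <= self.ttl_seconds
        ]
```

Namespaces double as task keys for retrieval: a lesson stored under `travel` is never read while answering a `math` task.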

What To Read Next

Reflexion fits systems that learn from verifiable failures.

For one-run revision, read Maker-Checker. For preferences inside the current session, read Conversation History.