Retrieval Loop: Search Until Evidence Is Enough

If the travel assistant searches only once, a bad query can sink the answer. The user asks for a "Hangzhou tea culture route", but the generated query is just "Hangzhou attractions", so retrieval returns generic places.

Retrieval Loop is not a full agent. It adds a small loop to RAG: propose a query, retrieve, decide whether evidence is enough, and either answer or search again.

One Sentence

Retrieval Loop turns one-shot retrieve-then-answer into query → retrieve → decide → answer/retry, so simple RAG can repair weak retrieval.

What Breaks Without It

Problem	What it looks like	Risk
One retrieval only	Looks fast	A bad query means a bad answer
No evidence sufficiency check	Context is present	Model may answer from weak evidence
Hits are not recorded	Citations are present	Hard to replay why those docs were used

What This Pattern Changes

Who	Owns
Model	Proposes query and decides done/not done
Retriever	Returns snippets and doc ids
Python	Controls rounds, stores hits, writes trace

It is narrower than Agentic RAG: the only loop action is retrieval.
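
To make the "stores hits, writes trace" row concrete, here is a minimal sketch of a per-round record serialized as JSON Lines. `RoundRecord`, `to_jsonl`, and `from_jsonl` are hypothetical names used only for illustration; they are not the `Tracer` API from the full example, whose record format is not shown here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RoundRecord:
    """One retrieval round: what was asked and which docs came back (hypothetical shape)."""

    round_no: int
    query: str
    hit_ids: list[str] = field(default_factory=list)


def to_jsonl(records: list[RoundRecord]) -> str:
    """Serialize one JSON object per line so a run can be replayed later."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


def from_jsonl(text: str) -> list[RoundRecord]:
    """Load records back, skipping blank lines."""
    return [RoundRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


# Values taken from the Paris example used in this section.
trace_text = to_jsonl([RoundRecord(round_no=1, query="Paris capital", hit_ids=["paris"])])
print(trace_text)
```

Recording the query next to the doc ids is the point: a citation alone says which document was used, while the pair says why it was retrieved.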

Walk Through One Trace

Round	Model action	Retrieval result	Next
1	query: `Paris capital`	Hit document `paris`	Evidence is enough
2	done + answer	`Paris is the capital of France. [paris]`	Stop

The travel version is similar: search "Hangzhou tea culture easy walking" first; if the results are still generic, retry with a more specific query, such as one for a rainy-day tea route.

Flow

flowchart TD
  Q["Question"] --> M["Model proposes query"]
  M --> R["Retrieve"]
  R --> H["Store hits"]
  H --> D{"Enough evidence?"}
  D -->|No| M
  D -->|Yes| A["Answer + cite doc ids"]
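
The flow above can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation of `retrieval_loop` in `agent_patterns_lab`: `run_loop`, `Hit`, `LoopResult`, `scripted_model`, and `toy_retrieve` are hypothetical names, and the two-document corpus is made up for the demo.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Hit:
    doc_id: str
    text: str


@dataclass
class LoopResult:
    answer: Optional[str]
    hits: list[Hit] = field(default_factory=list)
    rounds_used: int = 0


def run_loop(
    propose: Callable[[str, list[Hit]], str],
    retrieve: Callable[[str], list[Hit]],
    question: str,
    max_rounds: int = 3,
) -> LoopResult:
    """Query -> retrieve -> decide, bounded by max_rounds."""
    result = LoopResult(answer=None)
    for round_no in range(1, max_rounds + 1):
        result.rounds_used = round_no
        # The model sees the question and all hits so far, then replies in JSON.
        action = json.loads(propose(question, result.hits))
        if action.get("done"):  # "Enough evidence?" -> Yes
            result.answer = action.get("answer")
            break
        # "Enough evidence?" -> No: retrieve and store hits, then loop.
        result.hits.extend(retrieve(action["query"]))
    return result  # answer stays None if the round budget runs out


# Assumed toy corpus and retriever: a doc matches only if it contains every query word.
CORPUS = {
    "paris": "Paris is the capital of France.",
    "berlin": "Berlin is the capital of Germany.",
}


def toy_retrieve(query: str) -> list[Hit]:
    words = set(query.lower().split())
    return [
        Hit(doc_id, text)
        for doc_id, text in CORPUS.items()
        if words <= set(text.lower().rstrip(".").split())
    ]


def scripted_model(replies: list[str]) -> Callable[[str, list[Hit]], str]:
    """Stand-in for an LLM that returns canned replies in order."""
    it = iter(replies)
    return lambda question, hits: next(it)


result = run_loop(
    scripted_model(
        [
            '{"query": "Paris capital"}',
            '{"done": true, "answer": "Paris is the capital of France. [paris]"}',
        ]
    ),
    toy_retrieve,
    "What is the capital of France?",
    max_rounds=2,
)
print(result.answer)
```

Returning `None` when the budget runs out, rather than forcing an answer, is one possible design choice; it lets the caller decide whether to fall back or report insufficient evidence.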

Code Walk

The example uses a tiny corpus:

docs = load_docs(Path("data/mini_corpus.jsonl"))
index = SimpleSearchIndex(docs)
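
The internals of `SimpleSearchIndex` are not shown here. As a rough mental model only, a keyword-overlap index could look like the sketch below; `TinyIndex`, `Doc`, `tokenize`, and the two sample documents are assumptions made for illustration and may differ from the real class.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical stand-in: scores a doc by how often it contains the query terms."""

    def __init__(self, docs: list[Doc]) -> None:
        self._docs = docs
        self._tokens = {d.doc_id: Counter(tokenize(d.text)) for d in docs}

    def search(self, query: str, k: int = 3) -> list[tuple[str, int]]:
        terms = tokenize(query)
        scored = []
        for doc in self._docs:
            counts = self._tokens[doc.doc_id]
            score = sum(counts[t] for t in terms)
            if score > 0:
                scored.append((doc.doc_id, score))
        # Highest score first; ties broken by doc id for stable output.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


# Assumed sample documents, not the contents of data/mini_corpus.jsonl.
tiny = TinyIndex(
    [
        Doc("paris", "Paris is the capital of France."),
        Doc("tokyo", "Tokyo is the capital of Japan."),
    ]
)
print(tiny.search("Paris capital"))
```

Even in this toy, a partial match (`tokyo` shares only "capital") is returned with a lower score, which is why the loop checks whether evidence is enough instead of trusting every hit.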

The model proposes a query, then marks the run done:

model = MockLLM(
    [
        '{"query":"Paris capital"}',
        '{"done": true, "answer": "Paris is the capital of France. [paris]"}',
    ]
)
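
The two scripted replies imply a small JSON protocol: either a `query` to search, or `done` plus an `answer`. A real model will sometimes return something else, so a validation step may be worth having. The sketch below is one way to do it; `Action` and `parse_action` are hypothetical helpers, not part of `agent_patterns_lab`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    kind: str  # "search" or "finish"
    query: Optional[str] = None
    answer: Optional[str] = None


def parse_action(raw: str) -> Action:
    """Turn one model reply into a validated action, or raise ValueError."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {raw!r}") from exc
    if not isinstance(obj, dict):
        raise ValueError("model reply must be a JSON object")

    if obj.get("done") is True:
        answer = obj.get("answer")
        if not isinstance(answer, str) or not answer.strip():
            raise ValueError("finish action needs a non-empty 'answer'")
        return Action(kind="finish", answer=answer)

    query = obj.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("search action needs a non-empty 'query'")
    return Action(kind="search", query=query.strip())


print(parse_action('{"query":"Paris capital"}'))
```

Raising on malformed replies keeps the decision about retrying or aborting in the Python control code, in line with the ownership split described earlier.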

Full example:

from __future__ import annotations

import json
from pathlib import Path

from agent_patterns_lab.patterns.retrieval_loop import retrieval_loop
from agent_patterns_lab.runtime import Document, MockLLM, SimpleSearchIndex, Tracer


def load_docs(path: Path) -> list[Document]:
    docs: list[Document] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        docs.append(Document(doc_id=obj["doc_id"], text=obj["text"]))
    return docs


def main() -> None:
    tracer = Tracer()
    docs = load_docs(Path("data/mini_corpus.jsonl"))
    index = SimpleSearchIndex(docs)

    # Scripted replies: round 1 proposes a query, round 2 marks the run done and answers.
    model = MockLLM(
        [
            '{"query":"Paris capital"}',
            '{"done": true, "answer": "Paris is the capital of France. [paris]"}',
        ]
    )

    result = retrieval_loop(
        model,
        question="What is the capital of France?",
        index=index,
        rounds=2,
        tracer=tracer,
    )
    print(result.answer)

    trace_path = tracer.export_jsonl(Path(".traces") / "40_retrieval_loop.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()

Run:

UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/40_retrieval_loop.py

Nearby Patterns

Pattern	Who decides next	Use when
One-shot RAG	Code retrieves once	Query is obvious
Retrieval Loop	Model decides whether to search again	Query may need repair
Agentic RAG	Model can choose retrieval as an action	Retrieval is one tool in an agent loop
STORM	Sections retrieve separately	You are writing an article/report

When To Use It

One retrieval often misses.
The query needs rewriting.
You want to trace each retrieval round.
A full agent loop would be too much.

When Not To Use It

One retrieval is enough.
Retrieval quality is poor; more search just adds noise.
You need multiple tool types; use Agentic RAG or ReAct.
You cannot cap the number of rounds; without a limit the loop may never stop.

Costs And Common Failures

Failure	Symptom	Fix
Query churn	Many rounds, no new evidence	Add a round limit and a stagnation check
Fake citation	Cites an unrelated doc	Require the answer to cite only retrieved doc ids
Dirty context	All results are appended	Keep only top-k hits and summaries
Query too broad	Generic docs only	Ask the model for specific query fields
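
Three of these fixes are small enough to sketch. The helpers below are hypothetical and assume citations are written as `[doc_id]` in the answer text, as in the Paris example; adapt the pattern if your citation format differs.

```python
import re


def cited_ids(answer: str) -> set[str]:
    """Extract doc ids written as [doc_id] in the answer text."""
    return set(re.findall(r"\[([A-Za-z0-9_-]+)\]", answer))


def unsupported_citations(answer: str, hit_ids: set[str]) -> set[str]:
    """Fake-citation check: cited ids that never appeared in stored hits."""
    return cited_ids(answer) - hit_ids


def is_stagnant(seen_ids: set[str], new_ids: set[str]) -> bool:
    """Query-churn check: True when a round brought back no doc not seen before."""
    return not (new_ids - seen_ids)


def keep_top_k(scored_hits: list[tuple[str, float]], k: int) -> list[tuple[str, float]]:
    """Dirty-context fix: dedupe by doc id (keeping the best score), then keep k."""
    best: dict[str, float] = {}
    for doc_id, score in scored_hits:
        if score > best.get(doc_id, float("-inf")):
            best[doc_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


print(unsupported_citations("Paris is the capital of France. [paris]", {"paris"}))
```

A loop could stop early when `is_stagnant` returns `True`, and reject or re-ask for an answer when `unsupported_citations` is non-empty.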

What To Read Next

Retrieval Loop is the small loop inside RAG.

If retrieval is one action among many, read Agentic RAG. For long reports, read STORM.