Magentic Orchestration: Delegate, Watch Progress, Change Strategy

The Failure It Fixes

A travel assistant gets a broad request: “Plan a ten-day family trip to Japan under $3,000, keep the pace relaxed, and also check visas, transport, hotels, and rainy-day options.” This kind of task is hard to decompose correctly on the first try. You may start with cities, then discover flights break the budget. You may plan Kyoto first, then learn the child mainly wants Universal Studios.

Fixed plans become brittle. Manager-Worker helps with delegation, but what if the delegation itself is wrong? Magentic Orchestration adds a higher-level loop: watch progress, delegate to specialists, record results, and change strategy when the system is stuck.

One-Sentence Version

Replace “plan once, execute forever” with “read the ledger, delegate one narrow task, record the result, check for stalls, then decide again.”

The Naive Version

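# One-shot planning: the plan is produced once and never revisited.
plan = planner.complete(task)
# Every step runs even if an earlier result shows the plan was wrong.
result = execute_all(plan)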

This code trusts the first plan too much. For open-ended tasks, the first plan is often a guess. The real question is whether the runtime can notice “we made no progress” and force a different move.
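
For contrast, here is a minimal sketch of a loop that decides again after every result. It is not the repo's implementation: `run_decide_again`, the `decide` callable, and the string-based ledger are hypothetical names chosen for illustration, and the toy policy simply delegates once and then finalizes.

```python
from typing import Callable

# Hypothetical shapes for illustration; the repo's real types may differ.
Action = dict[str, str]
Ledger = list[str]


def run_decide_again(
    task: str,
    decide: Callable[[Ledger], Action],
    specialists: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Re-decide after every result instead of executing one upfront plan."""
    ledger: Ledger = [f"task: {task}"]
    for _ in range(max_steps):
        action = decide(ledger)  # the orchestrator sees everything recorded so far
        if action["type"] == "final":
            return action["answer"]
        result = specialists[action["agent"]](action["task"])
        ledger.append(f"{action['agent']} <- {action['task']}")
        ledger.append(f"{action['agent']} -> {result}")
    raise RuntimeError(f"no final answer within {max_steps} steps")


def decide(ledger: Ledger) -> Action:
    # Toy policy (an assumption): delegate once, then finalize from the recorded result.
    if any(line.startswith("calc ->") for line in ledger):
        return {"type": "final", "answer": "3+4=" + ledger[-1].split("-> ")[1]}
    return {"type": "delegate", "agent": "calc", "task": "Compute 3+4"}


answer = run_decide_again("Compute 3+4.", decide, {"calc": lambda task: "7"})
print(answer)  # prints 3+4=7
```

The difference from the naive version is where the decision lives: inside the loop, after each recorded result, rather than once before execution starts. This sketch has no stall detection yet; it only bounds the loop with `max_steps`.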

What Magentic Adds

This pattern is heavier than ordinary multi-agent orchestration. It needs:

orchestrator: decides whether to delegate or finalize.
Specialist: handles narrow tasks such as calculation, search, writing, or checking.
messages: the running message list, which doubles as a simplified task ledger in this repo.
stall_limit: the number of repeated identical delegations at which the runtime injects STALL DETECTED.
RunLimits: a hard cap on total loop steps.

Flow

flowchart TD
  U["User task"] --> L["Task ledger / messages"]
  L --> O["orchestrator chooses next move"]
  O -->|delegate| S["specialist runs narrow task"]
  S --> R["result written back"]
  R --> D{"same delegation repeated?"}
  D -->|no| O
  D -->|yes| X["inject STALL DETECTED"]
  X --> O
  O -->|final| A["Final answer"]

Trace Walkthrough

The example still uses 3+4, but the point is stall detection:

orchestrator returns {"type":"delegate","agent":"calc","task":"Compute 3+4"}.
calc returns 7; Python writes the delegation and result back into messages.
orchestrator returns the exact same delegation again.
Python detects the repeat and injects STALL DETECTED, telling the orchestrator to change strategy or finish.
orchestrator returns {"type":"final","answer":"3+4=7."}.

In a travel assistant, “stuck” might mean repeatedly searching the same city, failing to find a budget-feasible route, or looping on the same preference trade-off. The value is not “more agents.” The value is admitting that the last move did not advance the task.
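
The five-step trace above can be replayed without any model calls. The sketch below is an approximation under stated assumptions: `replay`, the `(role, content)` ledger shape, the role names, and the exact wording of the stall note are all hypothetical, and the specialist reply is hard-coded to "7". Following the flowchart, the specialist result is written back first and the repeat check runs afterwards.

```python
import json
from collections import Counter

STALL_NOTE = "STALL DETECTED: change strategy or finish."  # wording is an assumption

SCRIPT = [
    '{"type":"delegate","agent":"calc","task":"Compute 3+4"}',
    '{"type":"delegate","agent":"calc","task":"Compute 3+4"}',
    '{"type":"final","answer":"3+4=7."}',
]


def replay(scripted: list[str], stall_limit: int = 1) -> list[tuple[str, str]]:
    """Replay scripted orchestrator outputs; return the ledger as (role, content) pairs."""
    ledger: list[tuple[str, str]] = []
    seen: Counter = Counter()  # how often each (agent, task) pair was delegated
    for raw in scripted:
        action = json.loads(raw)
        ledger.append(("orchestrator", raw))
        if action["type"] == "final":
            break
        key = (action["agent"], action["task"])
        ledger.append(("tool", "7"))  # stand-in for the calc specialist's reply
        seen[key] += 1
        repeats = seen[key] - 1  # the first occurrence is not a repeat
        if repeats >= stall_limit:
            ledger.append(("system", STALL_NOTE))
    return ledger


for role, content in replay(SCRIPT):
    print(f"{role:>12}: {content}")
```

With `stall_limit=1`, the second identical delegation is the first repeat, so the stall note lands in the ledger just before the orchestrator's final action. Raising `stall_limit` to 2 lets this particular script finish without a stall note.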

Code

from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.magentic_orchestration import Specialist, run_magentic_orchestration
from agent_patterns_lab.runtime import MockLLM, RunLimits, Tracer


def main() -> None:
    tracer = Tracer()

    orchestrator = MockLLM(
        [
            '{"type":"delegate","agent":"calc","task":"Compute 3+4"}',
            '{"type":"delegate","agent":"calc","task":"Compute 3+4"}',
            '{"type":"final","answer":"3+4=7."}',
        ]
    )

    specialists = [
        Specialist(
            name="calc",
            description="Arithmetic specialist.",
            model=MockLLM(["7", "7"]),
        )
    ]

    out = run_magentic_orchestration(
        orchestrator,
        specialists,
        task="Compute 3+4.",
        limits=RunLimits(max_steps=5),
        stall_limit=1,
        tracer=tracer,
    )
    print(out)

    trace_path = tracer.export_jsonl(Path(".traces") / "65_magentic_orchestration.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()

Run it:

UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/65_magentic_orchestration.py

What to Notice in the Code

orchestrator returns structured actions: delegate or final.
delegate_key = (action.agent, action.task) checks whether the same work was assigned again.
specialist output is written back as a tool message.
when repeated delegation reaches stall_limit, Python appends STALL DETECTED.
run_loop enforces the step limit from RunLimits, so the loop ends even if the model never returns a final action.
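
Structured actions are only useful if malformed ones are rejected before anything runs. The sketch below shows one way to turn raw orchestrator output into a typed action. `Delegate`, `Final`, and `parse_action` are hypothetical names, not the repo's API; only the JSON field names (`type`, `agent`, `task`, `answer`) come from the example above.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Delegate:
    agent: str
    task: str


@dataclass(frozen=True)
class Final:
    answer: str


Action = Union[Delegate, Final]


def parse_action(raw: str, known_agents: set[str]) -> Action:
    """Turn raw model output into a typed action, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"orchestrator output is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("orchestrator output must be a JSON object")
    kind = data.get("type")
    if kind == "final":
        return Final(answer=str(data.get("answer", "")))
    if kind == "delegate":
        agent, task = data.get("agent"), data.get("task")
        if not isinstance(agent, str) or agent not in known_agents:
            raise ValueError(f"unknown specialist: {agent!r}")
        if not isinstance(task, str) or not task.strip():
            raise ValueError("delegation needs a non-empty task")
        return Delegate(agent=agent, task=task)
    raise ValueError(f"unknown action type: {kind!r}")


action = parse_action('{"type":"delegate","agent":"calc","task":"Compute 3+4"}', {"calc"})
print(action)
```

Because the dataclasses are frozen, a `Delegate` is hashable and compares by value, so it can serve the same role as the `(action.agent, action.task)` key when counting repeats.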

Boundaries to Decide

Ledger contents: task state, decisions, tool results, and open questions.
Ledger writers: only the orchestrator, or specialists too.
Stall definition: repeated delegation, no new artifact, oscillating plans, failing tools.
Stall response: re-split the task, switch tools, switch agents, narrow scope, or ask a human.
Budget: cap steps, tokens, tool calls, and wall-clock time.
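
The budget boundary can be made concrete with a small tracker that covers more than steps. This is a sketch only: `Budget`, its fields, and its default values are hypothetical and are not the repo's `RunLimits`, which the text above describes only as a cap on loop steps.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Budget:
    """Hypothetical multi-dimensional budget; names and defaults are assumptions."""

    max_steps: int = 10
    max_tool_calls: int = 20
    max_tokens: int = 50_000
    max_seconds: float = 60.0
    steps: int = 0
    tool_calls: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, *, steps: int = 0, tool_calls: int = 0, tokens: int = 0) -> None:
        """Record usage after a loop iteration."""
        self.steps += steps
        self.tool_calls += tool_calls
        self.tokens += tokens

    def exhausted(self) -> Optional[str]:
        """Return the name of the first cap that has been reached, or None."""
        if self.steps >= self.max_steps:
            return "steps"
        if self.tool_calls >= self.max_tool_calls:
            return "tool_calls"
        if self.tokens >= self.max_tokens:
            return "tokens"
        if time.monotonic() - self.started >= self.max_seconds:
            return "seconds"
        return None


budget = Budget(max_steps=3)
while budget.exhausted() is None:
    # One orchestration step would run here; the token count is a made-up figure.
    budget.charge(steps=1, tool_calls=1, tokens=400)
print(f"stopped on: {budget.exhausted()}")
```

Returning the name of the cap that was hit, rather than a bare boolean, gives the trace a reason for stopping, which supports the audit-trail goal.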

Use It When

The task is open-ended and the first plan is likely to be wrong.
Intermediate results should change later steps.
You need an audit trail for why the system continued, reassigned, or stopped.

Avoid It When

The path is fixed. Workflow or Manager-Worker is cheaper and easier to test.
The task is small. A ledger and stall detector can be overkill.
You lack tracing, budgets, and stop rules. Without them, this becomes expensive improvisation.

Common Failure Modes

Ledger trash pile: the ledger fills with noise until the orchestrator can no longer use it; log only information that affects the next decision.
Fake progress: every cycle should produce a checkable artifact, decision, file, or verification.
Crude stall detection: repeated actions are not always wrong; make retry limits explicit.
Permission bypass: specialists must not gain extra access just because they were delegated to.
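
One way to guard against the permission-bypass failure is to check every tool call against a per-specialist allowlist, enforced outside the model. The sketch below is illustrative: `ScopedSpecialist`, `ToolRegistry`, and the `search` and `book` tools are hypothetical and do not appear in the repo.

```python
from dataclasses import dataclass
from typing import Callable

Tool = Callable[[str], str]


@dataclass(frozen=True)
class ScopedSpecialist:
    """Hypothetical specialist record that carries its own tool allowlist."""

    name: str
    allowed_tools: frozenset[str]


class ToolRegistry:
    """Routes tool calls and refuses any tool the caller was not granted."""

    def __init__(self, tools: dict[str, Tool]) -> None:
        self._tools = dict(tools)

    def call(self, specialist: ScopedSpecialist, tool_name: str, arg: str) -> str:
        # The check depends on who is calling, not on who delegated the task.
        if tool_name not in specialist.allowed_tools:
            raise PermissionError(f"{specialist.name} may not call {tool_name}")
        return self._tools[tool_name](arg)


registry = ToolRegistry(
    {
        "search": lambda query: f"results for {query}",
        "book": lambda item: f"booked {item}",
    }
)
researcher = ScopedSpecialist("researcher", frozenset({"search"}))

print(registry.call(researcher, "search", "Kyoto hotels"))
try:
    registry.call(researcher, "book", "Kyoto hotel")
except PermissionError as exc:
    print(f"denied: {exc}")
```

In the travel example, a research specialist could look up hotels but could not book one, no matter how the orchestrator phrased the delegated task.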

Nearby Patterns

Manager-Worker: fixed delegation; Magentic changes delegation based on progress.
Group Chat: discussion among agents; Magentic is closer to a scheduler.
ReAct: a single-agent thought/action loop; Magentic is a multi-agent loop with a task ledger.
Planning / Replanning: Magentic puts replanning inside the runtime loop.

References

Azure Architecture Center — Magentic orchestration: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns
Microsoft Agent Framework — Magentic orchestration: https://learn.microsoft.com/en-us/agent-framework/user-guide/workflows/orchestrations/magentic
Fourney et al. (2024), Magentic-One: https://arxiv.org/abs/2411.04468