
Group Chat: Make Agents Challenge Each Other

The Failure It Fixes

A travel assistant can sound confident while missing the obvious. Ask for a three-day Tokyo trip with an older parent, and a single agent may pack each day with five attractions. The plan looks complete, but nobody asked: can they walk that much, where do they rest, what happens if it rains?

Group Chat is not “more agents for vibes.” It exists because one speaker often cannot see its own blind spots. The pattern puts several roles into the same conversation: one proposes, one challenges, one turns the result into the user-facing answer.

One-Sentence Version

Turn a single-agent monologue into a governed discussion. Python chooses speakers, records history, caps turns, and stops the meeting; the model only decides what to say on its turn or whether it can finalize.

The Naive Version

# `llm` stands for any chat client that exposes complete(messages).
answer = llm.complete([
    {"role": "user", "content": "Plan a three-day Tokyo trip for an older parent."}
])

There is no dissenter here. If the model suggests Asakusa, Ginza, Shibuya, and a late-night food crawl, the code just accepts it unless you add another review pass.

What Group Chat Adds

The change is small: add a conversation policy.

  • ChatAgent: each speaker has a name, role, and model.
  • history: everyone sees the same shared transcript.
  • schedule: either fixed order or selector-chosen turns.
  • max_rounds / max_turns: the meeting cannot run forever.
  • Final: a speaker can explicitly end the discussion.
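The pieces in this list can be written down as plain Python types. The following is a minimal sketch, not the agent_patterns_lab implementation: Message, Speak, Final, and parse_turn are hypothetical names, and the JSON shape is assumed from the {"type":"speak"} and {"type":"final"} replies used in this page's examples.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One entry in the shared transcript (hypothetical type)."""
    speaker: str
    content: str


@dataclass(frozen=True)
class Speak:
    """The agent adds to the discussion; the meeting continues."""
    content: str


@dataclass(frozen=True)
class Final:
    """The agent explicitly ends the discussion with an answer."""
    answer: str


def parse_turn(raw: str) -> Speak | Final:
    """Parse a model reply such as {"type":"speak","content":"..."}."""
    data = json.loads(raw)
    kind = data.get("type")
    if kind == "final":
        return Final(answer=str(data["answer"]))
    if kind == "speak":
        return Speak(content=str(data["content"]))
    # Reject anything else so a malformed reply cannot silently continue.
    raise ValueError(f"unknown turn type: {kind!r}")


# Everyone reads and writes the same list: this is the shared history.
history: list[Message] = [Message("user", "Compute 2+2.")]
turn = parse_turn('{"type":"speak","content":"We can compute 2+2 = 4."}')
if isinstance(turn, Speak):
    history.append(Message("planner", turn.content))
print(history[-1])
```

Keeping Speak and Final as separate types makes the stopping decision a type check in Python rather than something inferred from the model's wording.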

Flow

Round-robin feels like a small meeting:

flowchart TD
  U["User task"] --> H["Shared history"]
  H --> P["planner proposes"]
  P --> H
  H --> C["critic checks risks"]
  C -->|continue| H
  C -->|final| O["Final answer"]

Selector-based chat adds one decision before each turn:

flowchart TD
  U["User task"] --> H["Shared history"]
  H --> S["selector chooses speaker"]
  S --> R["researcher adds facts"]
  S --> W["writer answers"]
  R --> H
  W -->|final| O["Final answer"]

Trace Walkthrough

The examples use 2+2 on purpose. Tiny tasks make the control flow visible.

  1. planner returns {"type":"speak","content":"We can compute 2+2 = 4."}.
  2. Python appends that message to history.
  3. critic sees the same history and returns {"type":"final","answer":"2+2=4."}.
  4. Python sees final, stops the meeting, and returns the answer.
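The four steps above fit in a short loop. This is a sketch under stated assumptions rather than the body of run_group_chat_round_robin: round_robin and scripted are hypothetical helpers, agents are reduced to (name, reply function) pairs, and raising when nobody finalizes is one possible policy, not necessarily the library's.

```python
import json
from typing import Callable, Dict, List, Tuple

History = List[Dict[str, str]]
Agent = Tuple[str, Callable[[History], str]]  # (name, reply function)


def round_robin(agents: List[Agent], task: str, max_rounds: int) -> str:
    """Let agents speak in a fixed order until one returns a final turn."""
    history: History = [{"speaker": "user", "content": task}]
    for _ in range(max_rounds):               # the cap: the meeting cannot run forever
        for name, reply in agents:            # fixed speaking order
            turn = json.loads(reply(history))
            if turn["type"] == "final":       # step 4: stop and return the answer
                return turn["answer"]
            # step 2: record the message so the next speaker sees it
            history.append({"speaker": name, "content": turn["content"]})
    # Assumption: treat an unfinished meeting as an error.
    raise RuntimeError("no agent finalized within max_rounds")


def scripted(*replies: str) -> Callable[[History], str]:
    """Stand-in for a model: returns canned replies in order."""
    queue = list(replies)
    return lambda history: queue.pop(0)


planner = ("planner", scripted('{"type":"speak","content":"We can compute 2+2 = 4."}'))
critic = ("critic", scripted('{"type":"final","answer":"2+2=4."}'))

result = round_robin([planner, critic], "Compute 2+2.", max_rounds=2)
print(result)  # 2+2=4.
```

The model never controls the loop here. It only returns a turn, and Python decides whether the meeting continues.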

The selector example runs differently:

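  1. selector returns {"speaker":"researcher"}, so Python gives researcher the turn.
  2. researcher returns {"type":"speak","content":"Fact: 2+2 equals 4."}, and Python appends it to history.
  3. selector returns {"speaker":"writer"}.
  4. writer returns {"type":"final","answer":"2+2=4."}, and Python stops the meeting.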
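The selector variant changes only how the next speaker is picked. The sketch below is an illustration, not the code of run_group_chat_selector: selector_chat and scripted are hypothetical names, and validating the selector's choice against the known agents is an assumption about what a careful loop would do.

```python
import json
from typing import Callable, Dict, List

History = List[Dict[str, str]]
Reply = Callable[[History], str]


def selector_chat(selector: Reply, agents: Dict[str, Reply], task: str, max_turns: int) -> str:
    """Ask the selector who speaks next, then run that agent's turn."""
    history: History = [{"speaker": "user", "content": task}]
    for _ in range(max_turns):
        choice = json.loads(selector(history))["speaker"]
        if choice not in agents:
            # Never let the selector invent a speaker that does not exist.
            raise ValueError(f"selector chose unknown speaker: {choice!r}")
        turn = json.loads(agents[choice](history))
        if turn["type"] == "final":
            return turn["answer"]
        history.append({"speaker": choice, "content": turn["content"]})
    # Assumption: treat an unfinished meeting as an error.
    raise RuntimeError("no agent finalized within max_turns")


def scripted(*replies: str) -> Reply:
    """Stand-in for a model: returns canned replies in order."""
    queue = list(replies)
    return lambda history: queue.pop(0)


selector = scripted('{"speaker":"researcher"}', '{"speaker":"writer"}')
agents = {
    "researcher": scripted('{"type":"speak","content":"Fact: 2+2 equals 4."}'),
    "writer": scripted('{"type":"final","answer":"2+2=4."}'),
}

result = selector_chat(selector, agents, "Compute 2+2.", max_turns=4)
print(result)  # 2+2=4.
```

The selector is just another model call whose output Python checks before acting on it, so a bad choice fails loudly instead of derailing the meeting.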

Code: Round-Robin

This is the full runnable example. Notice that the agents are thin: each is only a name, a description, and a model. The loop lives in run_group_chat_round_robin.

from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.group_chat import ChatAgent, run_group_chat_round_robin
from agent_patterns_lab.runtime import MockLLM, Tracer


def main() -> None:
    tracer = Tracer()

    planner = ChatAgent(
        name="planner",
        description="Breaks down the task and proposes a solution.",
        model=MockLLM(['{"type":"speak","content":"We can compute 2+2 = 4."}']),
    )
    critic = ChatAgent(
        name="critic",
        description="Checks the solution and finalizes.",
        model=MockLLM(['{"type":"final","answer":"2+2=4."}']),
    )

    out = run_group_chat_round_robin(
        [planner, critic],
        task="Compute 2+2.",
        max_rounds=2,
        tracer=tracer,
    )
    print(out)

    trace_path = tracer.export_jsonl(Path(".traces") / "62_group_chat_round_robin.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()

Run it:

UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/62_group_chat_round_robin.py

The script prints the final answer and the path of the exported trace file. The trace shows who spoke on each turn and whether that turn was speak or final.

Code: Selector

Selector scheduling is useful when the next role depends on missing information. In a travel assistant, you may need weather first, budget first, or accessibility first depending on the user request.

from __future__ import annotations

from pathlib import Path

from agent_patterns_lab.patterns.group_chat import ChatAgent, run_group_chat_selector
from agent_patterns_lab.runtime import MockLLM, Tracer


def main() -> None:
    tracer = Tracer()

    selector = MockLLM(['{"speaker":"researcher"}', '{"speaker":"writer"}'])

    researcher = ChatAgent(
        name="researcher",
        description="Provides key facts and evidence.",
        model=MockLLM(['{"type":"speak","content":"Fact: 2+2 equals 4."}']),
    )
    writer = ChatAgent(
        name="writer",
        description="Produces the final user-facing answer.",
        model=MockLLM(['{"type":"final","answer":"2+2=4."}']),
    )

    out = run_group_chat_selector(
        selector,
        [researcher, writer],
        task="Compute 2+2.",
        max_turns=4,
        tracer=tracer,
    )
    print(out)

    trace_path = tracer.export_jsonl(Path(".traces") / "63_group_chat_selector.jsonl")
    print(f"[trace] {trace_path}")


if __name__ == "__main__":
    main()

Run it:

UV_CACHE_DIR=.uv_cache PYTHONPATH=src uv run --no-sync python examples/63_group_chat_selector.py

Boundaries to Decide

  • Speaker choice: fixed list, selector model, or moderator.
  • Final authority: any agent can finalize, or only writer/moderator can finalize.
  • Context sharing: full shared history, or compressed meeting notes.
  • Budget: cap turns, tokens, and wall-clock time.
  • Answer ownership: last speaker, moderator, or dedicated writer.
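Two of these boundaries, final authority and budget, are easy to make explicit in code. The sketch below is one possible shape and is not part of agent_patterns_lab: MeetingPolicy, MeetingBudget, and accept_final are hypothetical names, and token counts are assumed to be supplied by the caller after each model call.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MeetingPolicy:
    finalizers: frozenset[str]  # speakers allowed to end the meeting
    max_turns: int
    max_tokens: int
    max_seconds: float


@dataclass
class MeetingBudget:
    policy: MeetingPolicy
    turns: int = 0
    tokens: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens_used: int) -> None:
        """Record one completed turn and the tokens it consumed."""
        self.turns += 1
        self.tokens += tokens_used

    def exhausted(self) -> str | None:
        """Return the name of the first exceeded limit, or None."""
        if self.turns >= self.policy.max_turns:
            return "turns"
        if self.tokens >= self.policy.max_tokens:
            return "tokens"
        if time.monotonic() - self.started_at >= self.policy.max_seconds:
            return "wall-clock"
        return None


def accept_final(policy: MeetingPolicy, speaker: str) -> bool:
    """Only designated speakers may finalize; others are treated as still talking."""
    return speaker in policy.finalizers


policy = MeetingPolicy(
    finalizers=frozenset({"writer"}), max_turns=4, max_tokens=2000, max_seconds=30.0
)
budget = MeetingBudget(policy)

budget.charge(tokens_used=1200)
print(accept_final(policy, "critic"))  # False
print(budget.exhausted())              # None

budget.charge(tokens_used=900)
print(budget.exhausted())              # tokens
```

Checking exhausted() before each turn gives the loop a single place to stop, whichever limit is hit first.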

Use It When

  • The task has hidden assumptions: travel planning, code review, product decisions, risk review.
  • You want disagreement to be part of the runtime, not something a human remembers to ask for.
  • The roles bring genuinely different information or judgment criteria.

Avoid It When

  • You only need several independent answers. Use Voting or fan-out/fan-in instead.
  • The process is fixed. Workflow Chaining is easier to test.
  • You do not have a stopping rule. Group chats are excellent at spending tokens.

Common Failure Modes

  • Polite agreement: make the critic’s job explicit; it must identify risks.
  • Wandering discussion: have a moderator compress the debate into short notes.
  • Cost blow-up: reserve Group Chat for high-risk tasks, and cap turns and tokens on every run.
  • No owner: appoint a writer or moderator to produce the final answer.
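The fix for wandering discussion can be sketched as a history-compression step. This is an illustration under assumptions: compress_history is a hypothetical helper, and summarize stands in for a moderator model call that turns older messages into short notes.

```python
from typing import Callable, Dict, List

History = List[Dict[str, str]]


def compress_history(
    history: History,
    summarize: Callable[[History], str],
    keep_last: int = 2,
) -> History:
    """Keep the task and the most recent messages; replace the middle with notes."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(history) <= keep_last + 1:
        return list(history)  # nothing old enough to compress
    task, older, recent = history[0], history[1:-keep_last], history[-keep_last:]
    notes = {"speaker": "moderator", "content": summarize(older)}
    return [task, notes, *recent]


def stub_summarize(messages: History) -> str:
    """Stand-in for a moderator model: reports how much it folded away."""
    speakers = ", ".join(m["speaker"] for m in messages)
    return f"Notes covering {len(messages)} earlier messages from: {speakers}"


history = [
    {"speaker": "user", "content": "Plan a three-day Tokyo trip for an older parent."},
    {"speaker": "planner", "content": "Day 1: Asakusa, Ginza, Shibuya."},
    {"speaker": "critic", "content": "Too much walking; where do they rest?"},
    {"speaker": "planner", "content": "Day 1: Asakusa only, with a long lunch break."},
    {"speaker": "critic", "content": "Add an indoor option in case of rain."},
]

compact = compress_history(history, stub_summarize, keep_last=2)
print(len(compact))  # 4
print(compact[1]["content"])
```

The original task always stays as the first message, so later speakers keep seeing the actual request even after older turns are folded into notes.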

Nearby Patterns

  • Voting: agents answer independently, then a judge picks; Group Chat lets agents influence each other.
  • Manager-Worker: a manager assigns work; Group Chat is closer to a discussion.
  • Handoff: ownership moves to another agent; Group Chat usually stays in one shared meeting.
  • Magentic Orchestration: heavier orchestration with a task ledger and stall detection.

References

  • AutoGen — Group Chat pattern overview: https://microsoft.github.io/autogen/0.4.5/user-guide/core-user-guide/design-patterns/group-chat.html
  • Microsoft Agent Framework — Group Chat orchestration: https://learn.microsoft.com/en-us/agent-framework/user-guide/workflows/orchestrations/group-chat
  • Azure Architecture Center — AI agent orchestration patterns: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns