
Part 4: Multi-Agent Orchestration with a Supervisor Pattern


If you have been following along, you might be wondering, "Why are we making this so complicated? Can't we just give one giant AI all the tools and let it figure it out?"

You could. And for simple tasks, that works fine. But the moment your project gets even slightly complex, that single-agent approach falls apart spectacularly. It's like hiring one person to be your company's CEO, accountant, marketer, and janitor all at once. They might be a genius, but they are going to get confused, mix up their tasks, and probably end up trying to file a tax return using a broom.

We need a team of specialists. Here is why splitting our AI into multiple agents is a game-changer:

  1. Prompt Specialization: You can give each agent a highly-tuned, specific prompt. The Researcher agent's prompt can be filled with instructions about how to verify sources and synthesize information. The Hardware Controller's prompt can be a terse, no-nonsense directive: "You are a machine interface. You only execute hardware commands. Do not chat." This makes each agent an expert in its domain.

  2. Tool Scoping & Safety: The single biggest reason for this architecture. You only give an agent the tools it absolutely needs to do its job. Our Hardware Controller will only have access to the Arduino tools. It literally won't even know the file system or the web search tool exists. This drastically reduces the context window (saving money and time) and, more importantly, prevents the AI from making catastrophic mistakes, like hallucinating a reason to delete a file when it was just supposed to blink an LED.

  3. Model Specialization & Cost Savings: Not all tasks require a top-of-the-line, expensive LLM like GPT-4 or Claude Opus. Does the agent just need to decide whether a task involves "research" or "hardware"? You can use a tiny, fast, and cheap model like Claude Haiku for that. But for the agent that needs to write complex code? You can swap in the big guns for just that one task. This "asymmetric" approach saves a ton of money and improves performance.

The Supervisor Pattern

So how do we manage this team of AI specialists? We hire a manager. In agentic design, we call this the Supervisor Pattern.

This is a hierarchical architecture where we have a single Supervisor agent who acts as the project manager or router. Its one and only job is to look at the user's request and the current state of the project and decide who speaks next. It doesn't do the work itself; it delegates.

The supervisor creates a more reliable and modular system. Here is how it works:

  • The User: "I want to know the weather and then turn on a blue light if it's raining."

  • The Supervisor: "Okay, first I need weather data. I will route this task to the Researcher Agent."

  • The Researcher: (uses its weather tool) "The weather is 'Rainy'." (It hands the result back to the supervisor).

  • The Supervisor: "Okay, the condition 'rainy' is met. The next step is to turn on a blue light. I will route this task to the Hardware Controller Agent."

  • The Hardware Controller: (uses its LED tool with the argument 'blue') "The blue light is on." (It hands the result back).

  • The Supervisor: "All steps are complete. I will now inform the user."

In the example above, the supervisor agent doesn't perform any of the actions itself; it instructs the other agents to perform them on its behalf.
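The exchange above can be sketched as a small, library-free Python loop. The agent functions here are stand-ins for real LLM agents with tools, and the routing logic is hardcoded purely to illustrate the flow (a real supervisor decides the next step with an LLM):

```python
def researcher(task: str) -> str:
    # Stand-in for an agent equipped with a weather/search tool.
    return "Rainy"

def hardware_controller(color: str) -> str:
    # Stand-in for an agent equipped with Arduino tools.
    return f"The {color} light is on."

AGENTS = {"researcher": researcher, "hardware-controller": hardware_controller}

def supervisor(user_request: str) -> str:
    # Step 1: route the research sub-task to the Researcher.
    weather = AGENTS["researcher"]("What is the weather?")
    # Step 2: based on the result, route the hardware sub-task.
    if weather == "Rainy":
        AGENTS["hardware-controller"]("blue")
    # Step 3: all steps complete; report back to the user.
    return f"The weather is {weather}, so the blue light was turned on."

print(supervisor("Turn on a blue light if it's raining."))
```

Notice that the supervisor never touches a tool itself; it only sequences calls to the specialists and folds their results into the final answer.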

Creating the Specialist Sub-Agents

For our project, we will build a small but effective team of two specialist agents:

  1. The Researcher: Its sole purpose is to browse the internet and find information. It is only equipped with a single tool: a Web Search MCP. Its system prompt will be something like, "You are an expert web researcher. Your goal is to find accurate, up-to-date information to answer the user's query."

  2. The Hardware Controller: It is the only one that can touch the physical world. It is equipped with our custom Arduino tools (start_led_blinker, play_sound, etc.). This allows you to perform actions in the real world using AI.

Each of these agents is its own self-contained unit, making our system incredibly modular. If we want to add the ability to send emails, we don't have to retrain our entire system; we simply add an "Email Agent" and instruct the Supervisor when to use it.
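Here is a sketch of how our two specialists might be declared. The dict fields (`name`, `description`, `prompt`, `tools`) follow the DeepAgents-style subagent config convention, but treat the exact schema, and the tool names, as assumptions to verify against your version of the library:

```python
# Each subagent gets its own prompt and its own narrow tool list.
researcher = {
    "name": "researcher",
    "description": "Use for any task that requires finding information on the web.",
    "prompt": (
        "You are an expert web researcher. Your goal is to find accurate, "
        "up-to-date information to answer the user's query."
    ),
    "tools": ["web_search"],  # only the Web Search MCP tool
}

hardware_controller = {
    "name": "hardware-controller",
    "description": "Use for any task that must control the physical Arduino.",
    "prompt": (
        "You are a machine interface. You only execute hardware commands. "
        "Do not chat."
    ),
    "tools": ["start_led_blinker", "play_sound"],  # only the Arduino tools
}

subagents = [researcher, hardware_controller]
```

Tool scoping falls out of the config for free: the researcher's `tools` list simply never mentions the Arduino tools, so it cannot call them.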

How the Supervisor Decides Which Agent to Call

The routing in DeepAgents works through a tool-based delegation system: the main agent (the supervisor) uses a task tool to spawn specialized subagents based on their descriptions and capabilities.

The main agent doesn't use a traditional routing algorithm. Instead, it makes routing decisions based on:

  1. Subagent Descriptions: Each subagent provides a description that explains when to use it. These descriptions are formatted into the task tool's description and presented to the main agent.

  2. Tool Selection: The main agent uses the task tool with a subagent_type parameter to explicitly choose which subagent to invoke.

  3. Contextual Decision Making: The LLM analyzes the task requirements and matches them against the available subagent descriptions to make the routing decision.

The Routing Flow

When SubAgentMiddleware is initialized, it creates a registry of subagents and formats their descriptions. The task tool is then created with these descriptions embedded in its own tool description.
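Conceptually, the embedding step looks something like the following sketch. The template and function name are hypothetical stand-ins for the middleware's internal formatting, not the real DeepAgents source:

```python
subagents = [
    {"name": "researcher", "description": "Finds information on the web."},
    {"name": "hardware-controller", "description": "Controls the Arduino."},
]

# Hypothetical template: the real wording lives inside the middleware.
TASK_TOOL_TEMPLATE = (
    "Delegate a task to a specialized subagent.\n"
    "Available subagent types:\n{options}"
)

def build_task_tool_description(subagents: list[dict]) -> str:
    # One "- name: description" line per registered subagent.
    options = "\n".join(f"- {s['name']}: {s['description']}" for s in subagents)
    return TASK_TOOL_TEMPLATE.format(options=options)

print(build_task_tool_description(subagents))
```

Because the descriptions end up inside the tool description, the supervisor's LLM sees its full roster of specialists on every turn and can choose among them like any other tool argument.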

When the main agent needs to delegate work:

  1. It receives the task tool description with all available subagents

  2. The LLM analyzes the task and selects the appropriate subagent_type

  3. It calls task(description="...", subagent_type="selected-agent")

  4. The middleware validates the subagent exists and invokes it with filtered state
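Steps 3 and 4 can be condensed into a small sketch. This is a simplified, library-free mock of the validation-and-invocation path, not the actual middleware code:

```python
def task(description: str, subagent_type: str, state: dict, registry: dict):
    # Step 4a: validate that the requested subagent actually exists.
    if subagent_type not in registry:
        raise ValueError(f"Unknown subagent: {subagent_type}")
    # Step 4b: invoke it with filtered state; the parent's messages
    # and todos are excluded so the subagent starts with a clean context.
    filtered = {k: v for k, v in state.items() if k not in ("messages", "todos")}
    return registry[subagent_type](description, filtered)

# Toy registry and state for illustration.
registry = {"researcher": lambda desc, state: f"Result for: {desc}"}
state = {"messages": ["long history..."], "todos": ["..."], "files": {}}

print(task("Find the GTA 6 release date", "researcher", state, registry))
```

The key property is that the subagent never sees the supervisor's conversation history; it receives only the task description and the shared state it actually needs.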

Routing System Design Patterns

The routing system uses several important patterns:

  • Declarative Specification: Subagents are defined with clear descriptions that guide routing decisions

  • State Isolation: Each subagent runs with filtered state, excluding messages and todos from the parent

  • Parallel Execution: The system supports spawning multiple subagents in parallel for independent tasks

The supervisor's routing intelligence comes from the LLM's ability to understand task requirements and match them against the provided subagent descriptions, rather than from a hardcoded routing table or algorithm.
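The parallel-execution pattern is worth a quick illustration. The sketch below uses plain threads to run two independent delegations concurrently; a real framework may instead emit multiple task tool calls in a single LLM turn, but the idea is the same:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for two independent subagent invocations.
def researcher(task: str) -> str:
    return f"research result for {task!r}"

def hardware_controller(task: str) -> str:
    return f"hardware result for {task!r}"

# Two tasks with no data dependency between them.
tasks = [(researcher, "GTA 6 release date"), (hardware_controller, "blink LED")]

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fn, arg) for fn, arg in tasks]
    results = [f.result() for f in futures]
```

Parallel delegation only makes sense when the tasks are truly independent; in our weather example, the hardware step depends on the research result, so those two must run sequentially.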

Handling Handoffs

The handoff mechanism in DeepAgents is implemented through a state-based delegation system where the supervisor agent maintains control while subagents execute isolated tasks and return structured results. When the supervisor delegates work using the task tool, it passes a filtered state to the subagent that excludes conversation history and todos, ensuring clean context isolation. The subagent executes its task autonomously and returns its findings through a Command object that updates the shared state with only the essential results - specifically filtering out intermediate messages and todos to maintain context cleanliness.

This creates a continuous loop where the supervisor maintains awareness of the overall goal while delegating specialized work. After each subagent completes its task, the supervisor receives the updated state containing the new findings (such as research results or analysis outputs) and evaluates what work remains to be done.

The supervisor then decides whether to delegate additional tasks to other subagents, perform work directly, or conclude that the user's request has been fully satisfied. This handoff pattern enables complex multi-step workflows while keeping the main agent's context focused on orchestration rather than execution details.

The technical implementation ensures that each handoff preserves only the necessary state information. When a subagent returns results, the _return_command_with_state_update function creates a Command that merges the subagent's custom state keys with a ToolMessage containing just the final output text.

This design allows the supervisor to maintain a clean conversation history while still accessing the detailed outputs generated by specialized subagents through the shared state object, enabling it to make informed decisions about subsequent steps in the workflow.
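The return path described above can be mocked in a few lines. The `Command` dataclass and function below are simplified stand-ins that mirror the behavior of `_return_command_with_state_update` as described, not the real DeepAgents internals:

```python
from dataclasses import dataclass, field

@dataclass
class Command:
    # Mock of a state-update command; the real one comes from LangGraph.
    update: dict = field(default_factory=dict)

def return_command_with_state_update(sub_state: dict, final_text: str) -> Command:
    # Drop the subagent's intermediate messages and todos...
    merged = {k: v for k, v in sub_state.items() if k not in ("messages", "todos")}
    # ...and surface only the final output as a single tool-message entry.
    merged["messages"] = [{"role": "tool", "content": final_text}]
    return Command(update=merged)

cmd = return_command_with_state_update(
    {"messages": ["step 1", "step 2"], "todos": [], "files": {"notes.txt": "GTA 6..."}},
    "Research complete: see notes.txt",
)
```

The supervisor thus receives the subagent's files and custom state keys plus one concise summary message, never the dozens of intermediate turns the subagent burned through internally.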

Demo: Executing a Complex Multi-Agent Query

Let's put it all together. Imagine we give our system the following complex, multi-step command:

"Research the release date of GTA 6, write it to a file, and then display the year on the Arduino."

A single, monolithic agent would likely get confused here. But our supervisor will hand the first part of the task to the specialized Researcher agent, which has all the tools and context needed to research the topic. Once the necessary information is retrieved and written to a file, the supervisor will hand control to the Hardware Controller agent, which communicates with the Arduino hardware and displays the year.

This clean, step-by-step execution is only possible because we have the right agent for the right job, all orchestrated by a smart supervisor.

In the next part, we will address the issue of amnesia. How do we give our agents memory so they can remember past conversations and tasks?
