Using an LLM to call tools in a loop is the simplest form of an agent. This architecture, however, can yield agents that are "shallow" and fail to plan and act over longer, more complex tasks.
Applications like "Deep Research", "Manus" and "Claude Code" have gotten around this limitation by implementing a combination of four things: a planning tool, sub agents, access to a file system, and a detailed prompt.
deepagents is a Python package that implements these in a general-purpose way so that you can easily create a Deep Agent for your application. It is heavily inspired by Claude Code.
DeepAgents provides a powerful framework for building AI agents that can plan, delegate to sub-agents, and maintain context across complex tasks. One enhancement that significantly improves the user experience is real-time streaming: seeing responses form token by token rather than waiting for complete outputs.
This post demonstrates how to implement streaming on top of the deepagents package in a multi-agent setup, with practical code examples and architectural patterns you can apply to your own projects.
Full Implementation: The complete source code for this implementation is available at github.com/dtunai/streaming-deepagents. See `streaming_deep_agents.py` in particular for the core streaming architecture.

What Are Deep Agents?

Deep Agents move beyond simple LLM wrappers to create sophisticated multi-agent systems. The architecture enables:
- Task Planning: Breaking down complex requests into manageable subtasks
- Hierarchical Delegation: Assigning work to specialized sub-agents
- Context Management: Maintaining state across operations via middleware
- Parallel Execution: Running multiple agents simultaneously
This implementation guide focuses on adding real-time streaming to this architecture.
Implementing the Streaming Layer
The key difference between traditional and streaming agents lies in how responses are handled:
```python
# Traditional approach - blocking
agent = create_deep_agent(...)
result = agent.run(task)  # Wait for complete response
print(result)

# Streaming approach - async generator
agent = StreamingDeepAgent(role=AgentRole.COORDINATOR)
async for token in agent.stream_response(task):
    print(token, end="", flush=True)  # Display each token immediately
```
This pattern provides immediate feedback and allows users to see the agent's reasoning unfold in real-time.
Architecture Overview
By design, our implementation combines agent specialization, task delegation, and real-time streaming. When processing a complex task, the flow looks like this:
```plaintext
User: "Build a REST API with authentication and rate limiting"
                        ↓
              [Coordinator Agent] → Breaks down into subtasks
                        ↓
        ┌───────────────┼──────────────┬──────────────┐
        ↓               ↓              ↓              ↓
   Researcher         Coder        Documenter      Reviewer
     (Best          (FastAPI          (API         (Security
   practices)         impl.)          docs)          check)
```
Each agent has a specific role and optimized parameters:
```python
class StreamingDeepAgent:
    def __init__(self, role: AgentRole, temperature: float):
        # Coder uses low temperature (0.3) for consistency
        # Researcher uses higher (0.7) for exploration
        # Each agent optimized for its specific task
        self.role = role
        self.temperature = temperature
```
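To show how those role-dependent parameters could be wired up, here is a minimal sketch. The `AgentRole` values, the `ROLE_DEFAULT_TEMPERATURE` mapping, and the `build_llm` helper are illustrative assumptions rather than the repository's exact definitions; only the `ChatOpenAI` class comes from the real `langchain_openai` package.
```python
from enum import Enum

from langchain_openai import ChatOpenAI  # real package; the wiring below is a sketch


class AgentRole(str, Enum):
    COORDINATOR = "coordinator"
    RESEARCHER = "researcher"
    CODER = "coder"
    REVIEWER = "reviewer"
    DOCUMENTER = "documenter"


# Hypothetical per-role defaults: low temperature for deterministic code and reviews,
# higher temperature for exploratory research.
ROLE_DEFAULT_TEMPERATURE = {
    AgentRole.CODER: 0.3,
    AgentRole.REVIEWER: 0.3,
    AgentRole.COORDINATOR: 0.5,
    AgentRole.DOCUMENTER: 0.5,
    AgentRole.RESEARCHER: 0.7,
}


def build_llm(role: AgentRole, model_name: str = "gpt-4o") -> ChatOpenAI:
    """Create a streaming chat model tuned for the given role (illustrative helper)."""
    return ChatOpenAI(
        model=model_name,
        temperature=ROLE_DEFAULT_TEMPERATURE[role],
        streaming=True,  # required so token callbacks fire during generation
    )
```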
A Real Example: Building a REST API

The core insight is that complex tasks naturally decompose into specialized subtasks. Instead of forcing a single model to handle everything, we can create a system where:
- A Coordinator agent breaks down complex requests into subtasks
- Specialized agents (Researcher, Coder, Reviewer, Documenter) each handle their own domain
- Responses stream in real-time as tokens are generated
- Tasks execute in parallel when dependencies allow
This isn't just about dividing work or squeezing out an optimization; it's about matching the agent architecture to the structure of the problem.
Streaming
Real-time streaming is one of the most important factors shaping the user experience of agent systems and LLMs.
Instead of staring at a loading spinner for 30 seconds, users want to see thoughts forming, code being written, and ideas taking shape.
Token-Level Streaming

At the lowest level, we intercept tokens as they're generated by the language model:
```python
async def stream_response(self, prompt: str) -> AsyncIterator[str]:
    # Create callback to capture tokens
    self.stream_callback = StreamingCallback()
    self.llm.callbacks = [self.stream_callback]

    # Start generation asynchronously
    generation_task = asyncio.create_task(
        self.llm.ainvoke([HumanMessage(content=prompt)])
    )

    # Stream tokens as they arrive
    async for token in self.stream_callback.get_stream():
        yield token

    await generation_task
```
The key here is the async generator pattern - we yield tokens immediately as they become available, rather than waiting for completion.
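As a small, self-contained sketch of how a caller might consume that generator (the agent construction is assumed from the earlier snippets, not taken verbatim from the repository):
```python
import asyncio


async def main() -> None:
    # Assumes StreamingDeepAgent and AgentRole as sketched earlier in this post.
    agent = StreamingDeepAgent(role=AgentRole.CODER, temperature=0.3)

    async for token in agent.stream_response("Write a binary search function"):
        # Tokens print as soon as the model emits them, not after completion.
        print(token, end="", flush=True)
    print()  # final newline once the stream is exhausted


if __name__ == "__main__":
    asyncio.run(main())
```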
Async Queue for Decoupling Token Generation
The streaming callback uses an async queue to decouple token generation from consumption:
```python
class StreamingCallback(AsyncCallbackHandler):
    def __init__(self):
        self.streaming_queue = asyncio.Queue()

    async def on_llm_new_token(self, token: str, **kwargs):
        await self.streaming_queue.put(token)

    async def on_llm_end(self, response, **kwargs):
        # Signal end of stream so get_stream() can terminate
        await self.streaming_queue.put(None)

    async def get_stream(self) -> AsyncIterator[str]:
        while True:
            token = await self.streaming_queue.get()
            if token is None:  # End signal
                break
            yield token
```
This pattern decouples generation from consumption; with a bounded queue it also provides backpressure, and it leaves room for multiple consumers if needed.
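The backpressure part comes from `asyncio.Queue`'s `maxsize` argument: a full queue makes `put` wait until the consumer catches up. The variant below is a sketch under that assumption, not the exact class from the repository.
```python
import asyncio
from typing import AsyncIterator

from langchain_core.callbacks import AsyncCallbackHandler


class BoundedStreamingCallback(AsyncCallbackHandler):
    """Like StreamingCallback, but a full queue slows the producer down."""

    def __init__(self, maxsize: int = 256):
        # With maxsize set, `put` suspends the LLM callback whenever the
        # consumer falls behind, which is the backpressure mechanism.
        self.streaming_queue = asyncio.Queue(maxsize=maxsize)

    async def on_llm_new_token(self, token: str, **kwargs):
        await self.streaming_queue.put(token)

    async def on_llm_end(self, response, **kwargs):
        await self.streaming_queue.put(None)  # end-of-stream signal

    async def get_stream(self) -> AsyncIterator[str]:
        while True:
            token = await self.streaming_queue.get()
            if token is None:
                break
            yield token
```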
Task Delegation and Parallel Execution

The orchestrator's role is crucial. When a complex task arrives, it doesn't just split it arbitrarily; it understands the semantic structure of the request:
```python
async def process_complex_task(self, main_task: str):
    # Coordinator analyzes and breaks down the task (consume the async generator)
    breakdown = ""
    async for token in coordinator.stream_response(
        f"Break down this task into subtasks: {main_task}"
    ):
        breakdown += token

    # Create subtasks based on analysis
    subtasks = self._create_subtasks(main_task)

    # Execute in parallel where possible
    results = await asyncio.gather(
        self.delegate_task(subtasks[0], "Researcher"),
        self.delegate_task(subtasks[1], "Coder"),
        self.delegate_task(subtasks[2], "Documenter"),
    )
```
When instructed properly, deep agents understand dependencies: research might need to complete before implementation begins, but documentation can start in parallel with implementation.
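As a hedged sketch of what dependency-aware scheduling could look like (reusing the `delegate_task` helper from the snippet above; the prompts and ordering are illustrative, not the repository's exact logic):
```python
async def process_with_dependencies(self, main_task: str):
    # Research is a prerequisite, so it runs first and to completion.
    research_notes = await self.delegate_task(
        f"Research best practices for: {main_task}", "Researcher"
    )

    # Implementation and documentation do not depend on each other,
    # so they run concurrently once the research is available.
    implementation, documentation = await asyncio.gather(
        self.delegate_task(
            f"Implement using these notes:\n{research_notes}", "Coder"
        ),
        self.delegate_task(f"Draft documentation for: {main_task}", "Documenter"),
    )

    # Review runs last, over the finished implementation.
    review = await self.delegate_task(
        f"Review this implementation:\n{implementation}", "Reviewer"
    )
    return implementation, documentation, review
```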
Agent Specialization Through Prompting

Each agent has a carefully crafted system prompt that shapes its behavior:
- Coordinator: Focuses on decomposition and delegation
- Researcher: Prioritizes accuracy and comprehensiveness
- Coder: Emphasizes clean, production-ready implementations
- Reviewer: Looks for bugs, security issues and optimizations
- Documenter: Creates clear, user-friendly documentation
Specialization can also extend to model parameters and to context engineering. On the parameter side, code generation uses a lower temperature for consistency, while research deliberately uses a higher temperature for creative exploration.
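One compact way to keep each role's prompt and parameters together is a small configuration table. The prompt texts and the `RoleConfig` name below are illustrative assumptions, not the repository's actual prompts:
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleConfig:
    system_prompt: str
    temperature: float


# Illustrative configurations; real system prompts are longer and more detailed.
ROLE_CONFIGS = {
    "Coordinator": RoleConfig(
        "Decompose the user's request into clear, independent subtasks.", 0.5
    ),
    "Researcher": RoleConfig(
        "Gather accurate, comprehensive background and explain trade-offs.", 0.7
    ),
    "Coder": RoleConfig(
        "Write clean, production-ready code with tests where practical.", 0.3
    ),
    "Reviewer": RoleConfig(
        "Look for bugs, security issues, and optimization opportunities.", 0.3
    ),
    "Documenter": RoleConfig(
        "Produce clear, user-friendly documentation and examples.", 0.5
    ),
}
```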
Practical Implications
Deep agents built this way have several non-obvious benefits:
1. Cognitive Load Distribution
Just as humans work better in specialized teams, AI agents perform better when focused on specific domains. A coding agent doesn't need to worry about documentation style and a reviewer doesn't need to generate implementations.
2. Parallel Processing
When tasks are independent, they execute simultaneously. This isn't just about speed — it's about utilizing computational resources efficiently. While one agent researches best practices, another can already start drafting implementation templates.
3. Feedback Loops
The reviewer agent provides a natural feedback mechanism. Its analysis can trigger refinements in other agents' outputs, creating an iterative improvement cycle.
4. Transparency
Streaming responses with clear agent attribution makes the system's reasoning transparent. Users can see which agent is contributing what, understanding the thought process behind complex outputs.
Implementation Considerations
Building systems with deepagents requires careful attention to several aspects:
State Management: Tasks need persistent state to track progress, dependencies, and results. We use a simple but effective Task dataclass with status tracking and timestamps.
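A minimal sketch of what such a dataclass could look like (the field names are assumptions, not the repository's exact schema):
```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Task:
    description: str
    assigned_to: str                        # agent name, e.g. "Coder"
    status: TaskStatus = TaskStatus.PENDING
    depends_on: list[str] = field(default_factory=list)
    result: str | None = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    completed_at: datetime | None = None
```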
Error Handling: Distributed systems fail in distributed ways. Each agent needs graceful degradation — if the Reviewer fails, the system should still provide the core implementation.
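One simple way to get that graceful degradation is to mark some agents as optional and turn their failures into warnings; a sketch, with `delegate_task` assumed from the earlier delegation snippet:
```python
async def delegate_with_fallback(
    self, subtask: str, agent_name: str, required: bool = True
):
    """Run a subtask; optional agents degrade to a warning instead of failing the run."""
    try:
        return await self.delegate_task(subtask, agent_name)
    except Exception as exc:  # deliberately broad: any agent failure degrades gracefully
        if required:
            raise
        print(f"[warn] optional agent {agent_name} failed: {exc}")
        return None
```
With this in place, the Reviewer can be invoked with `required=False`, so a failed review never blocks delivery of the core implementation.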
Rate Limiting: With multiple agents potentially making parallel API calls, rate limiting becomes critical. The system implements configurable parallel execution limits.
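A configurable limit can be as small as an `asyncio.Semaphore` wrapped around delegation; the class and parameter names below are assumed for illustration:
```python
import asyncio


class StreamingOrchestrator:
    def __init__(self, max_parallel_agents: int = 3):
        # At most `max_parallel_agents` delegations run at once; the rest wait.
        self._limit = asyncio.Semaphore(max_parallel_agents)

    async def delegate_task_limited(self, subtask: str, agent_name: str):
        async with self._limit:
            # delegate_task is assumed from the earlier delegation snippet.
            return await self.delegate_task(subtask, agent_name)
```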
Token Economics: Multiple agents mean multiple API calls. The system needs to balance thoroughness with token consumption, using techniques like response summarization and selective delegation.
Interactive CLI Implementation
To make these concepts accessible, I've built a command-line interface that demonstrates streaming deep agents in action. The CLI provides both interactive and direct command modes for experimenting with the architecture.
Basic Usage
```bash
# Start interactive mode
python cli.py

# Chat directly with a specific agent
python cli.py chat -a "Coder" -p "Write a binary search function"

# Process a complex task with automatic delegation
python cli.py task -p "Create a REST API with authentication"
```
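The repository's CLI may be structured differently, but the same command surface can be sketched with `argparse`; `run_chat` and `run_task` here are placeholder helpers, not functions from the repo:
```python
import argparse
import asyncio


async def run_chat(agent_name: str, prompt: str) -> None:
    # Placeholder: stream a single agent's response, as shown earlier in the post.
    print(f"[{agent_name}] would stream a response to: {prompt!r}")


async def run_task(prompt: str) -> None:
    # Placeholder: hand the prompt to the orchestrator for decomposition and delegation.
    print(f"[orchestrator] would decompose and delegate: {prompt!r}")


def main() -> None:
    parser = argparse.ArgumentParser(prog="cli.py")
    sub = parser.add_subparsers(dest="command")

    chat = sub.add_parser("chat", help="Chat with a single agent")
    chat.add_argument("-a", "--agent", required=True)
    chat.add_argument("-p", "--prompt", required=True)

    task = sub.add_parser("task", help="Delegate a complex task across agents")
    task.add_argument("-p", "--prompt", required=True)

    args = parser.parse_args()
    if args.command == "chat":
        asyncio.run(run_chat(args.agent, args.prompt))
    elif args.command == "task":
        asyncio.run(run_task(args.prompt))
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
```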
Task Delegation in Action
When you give the CLI a complex task, you can watch the entire orchestration process:
```bash
$ python cli.py task -p "Build a user authentication system"

✓ Added agent: Coordinator (coordinator)
✓ Added agent: Researcher (researcher)
✓ Added agent: Coder (coder)
✓ Added agent: Reviewer (reviewer)
✓ Added agent: Documenter (documenter)

Processing Complex Task: Build a user authentication system

Coordinator analyzing task...
[Streams task breakdown in real-time]

→ Delegating to Researcher: Research best practices...
→ Delegating to Coder: Implement authentication logic...
→ Delegating to Documenter: Create API documentation...
→ Delegating to Reviewer: Review security implementation...
```
The key difference from traditional CLIs is that you see everything happening in real-time — the Coordinator's analysis streams token by token, then multiple agents work in parallel, each streaming their outputs as they generate them.
Creating Custom Agents
The CLI also allows you to create agents with specific parameters:
```text
# In the CLI's interactive mode
Command: add
Agent name: DataAnalyst
Agent role: researcher
Model name: gpt-4
Temperature (0.0-1.0): 0.5
Enable streaming? [y/n]: y
```
This flexibility lets you experiment with different agent configurations and see how temperature, role, and model selection affect the streaming outputs.
Conclusion
Building streaming deep agent systems is about recognizing that complex problems have inherent structure. By matching our computational architecture to this structure — through specialization, parallelization, and real-time feedback — we can create systems that are not just more capable, but more understandable and controllable.
Key takeaways from this implementation:
- Async generators enable token-by-token streaming without blocking
- Queue-based architecture decouples generation from consumption
- Role specialization improves output quality by focusing agents on specific domains
- Parallel execution reduces overall response time for complex tasks
The shift from monolithic to orchestrated agent systems parallels the evolution we've seen in software architecture. Just as microservices revolutionized how we build scalable applications, agent orchestration is reshaping how we build AI systems.
Want to experiment with streaming deep agents? Check out our open-source implementation: github.com/dtunai/streaming-deepagents
The repository includes a complete implementation with LangChain integration, real-time streaming, parallel task execution and a CLI for interactive experimentation. The architecture described in this post is fully implemented and ready to extend for your own use cases.