Deep Researcher
A seven-agent autonomous research pipeline where an LLM-driven orchestrator controls all execution flow — deciding when to search, when to stop, and when to write — without any hardcoded logic.
7
Specialized Agents
3
Max Research Rounds
1000+
Word Structured Report
The Problem
Fixed pipelines don't match how research actually works
Most research tools follow a hardcoded pipeline — search N times, compile results, write a report. This works when the query is well-defined and the first round of searches covers the topic adequately. But real research doesn't work that way. Some queries need clarification before a single search is made. Some topics need three rounds of targeted investigation before coverage is sufficient. Others are answered after one pass. A hardcoded pipeline treats all three cases identically — wasting compute on simple queries and producing shallow results on complex ones.
The Solution
A reasoning loop that decides when it's done
Deep Researcher replaces the hardcoded pipeline with a genuine multi-agent reasoning loop. An LLM-powered OrchestratorAgent owns every control-flow decision — calling specialized sub-agents as tools, evaluating research quality after each round, and deciding autonomously when to stop and write. The system begins by asking the user targeted clarifying questions, then enters an adaptive research loop that runs for one to three rounds based on coverage and depth scores, and finally delivers a structured markdown report to the user's email.
Key Outcome
The system produces a 1,000+ word structured research report — including executive summary, main body, conclusions, and follow-up questions — delivered by email, with all execution decisions made by the LLM rather than Python code.
Technical Deep Dive
Architecture & Design
Pipeline Flow
Phase 1 — Clarification
Pre-loop · Human-in-the-loop
ClarificationAgent
Surfaces up to 4 scoping questions · Answers injected into all downstream prompts
Phase 2 — Research Loop · up to 3 rounds
Orchestrator · LLM-driven control flow
OrchestratorAgent
Calls sub-agents as tools · Reads coverage + depth scores · Decides when to stop
function_tool · Rounds 1–3
PlannerAgent
function_tool · parallel ×5
SearchAgent
function_tool · scores 1–10
EvaluatorAgent
Phase 3 — Writing · native SDK handoff
Handoff target · Terminal agent · Owns final_output
WriterAgent
Produces ReportData · short_summary + markdown_report + follow_up_questions
Phase 4 — Email · separate Runner.run() call
Direct call · Outside handoff chain
EmailAgent
Called after report captured · Converts to HTML · Delivers via SendGrid
Phase 1
Clarification
Before any research begins, the ClarificationAgent analyses the query and surfaces up to 4 targeted questions in the Gradio UI. The user's answers are injected into every downstream agent prompt via an enriched query — ensuring the research stays focused on the right scope, depth, and audience from the start.
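A minimal sketch of the enrichment step, using illustrative names rather than the project's actual helpers: the clarification answers are folded into a single prompt string that every downstream agent receives.

```python
# Hypothetical helper showing how clarification answers could be
# appended to the original query before it reaches downstream agents.
def build_enriched_query(query: str, answers: dict[str, str]) -> str:
    """Append clarification Q&A pairs to the original research query."""
    if not answers:
        return query
    qa_lines = "\n".join(f"- {q}: {a}" for q, a in answers.items())
    return f"{query}\n\nClarifications from the user:\n{qa_lines}"

enriched = build_enriched_query(
    "Impact of quantization on LLM inference",
    {"Depth": "practitioner-level", "Timeframe": "last 2 years"},
)
```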
Phase 2
Research Loop
The OrchestratorAgent runs an adaptive tool-driven loop. Each iteration calls PlannerAgent to generate 5 targeted queries, SearchAgent to fan all queries out in parallel via asyncio.gather(), and EvaluatorAgent to score the accumulated results on coverage and depth (1–10). The Orchestrator reads these scores and decides whether to run another round or stop — no Python if-statement controls this decision.
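The fan-out step can be sketched as follows, with `run_search` standing in for an invocation of SearchAgent (the real call goes through the Agents SDK):

```python
import asyncio

# Minimal sketch of one round's parallel search fan-out. run_search is a
# stand-in for dispatching a single query to SearchAgent.
async def run_search(query: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real web search call
    return f"summary for: {query}"

async def search_round(queries: list[str]) -> list[str]:
    # All five searches run concurrently; gather preserves input order.
    return await asyncio.gather(*(run_search(q) for q in queries))

results = asyncio.run(search_round([f"query {i}" for i in range(1, 6)]))
```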
Phase 3
Writing
When research is judged sufficient — coverage ≥ 7 and depth ≥ 6, or after MAX_ROUNDS = 3 — the OrchestratorAgent performs a native SDK handoff to the WriterAgent. The WriterAgent becomes the terminal agent, owns final_output, and produces a structured ReportData Pydantic model containing a short summary, full markdown report, and follow-up questions.
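The shape of that structured output, sketched below with a stdlib dataclass stand-in (the project defines it as a Pydantic model passed as the WriterAgent's output type; field names follow the pipeline description above, comments are illustrative):

```python
from dataclasses import dataclass, field

# Stand-in for the project's ReportData Pydantic model.
@dataclass
class ReportData:
    short_summary: str          # brief executive gist of the findings
    markdown_report: str        # full 1000+ word structured report
    follow_up_questions: list[str] = field(default_factory=list)

report = ReportData(
    short_summary="Key findings in two sentences.",
    markdown_report="# Report\n\n...",
    follow_up_questions=["What changed after 2023?"],
)
```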
Phase 4
Email Delivery
After the report is captured, EmailAgent is invoked via a separate Runner.run() call — deliberately outside the handoff chain so that its confirmation string cannot overwrite the WriterAgent's final_output. The WriterAgent thus remains the terminal agent and the report stays intact.
Key Design Decisions
Handoff topology determines final_output ownership
Chaining WriterAgent → EmailAgent via native handoff causes EmailAgent's confirmation string to become final_output, overwriting the report. By keeping EmailAgent outside the handoff chain and calling it explicitly after capturing the report, WriterAgent remains terminal and report integrity is always preserved.
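A conceptual sketch of this sequencing, with plain functions standing in for agents and `run` standing in for `Runner.run()` (each call returns its run's terminal output):

```python
# Simplified model: each run() returns the terminal agent's final output.
def run(agent, payload):
    return agent(payload)

def writer_agent(research):   # terminal agent of the main handoff chain
    return {"markdown_report": f"# Report\n\n{research}"}

def email_agent(report):      # returns only a delivery confirmation
    return "email sent"

# Run 1: orchestrator → writer handoff; final output is the report.
report = run(writer_agent, "accumulated findings")

# Run 2: a separate call, so its confirmation never replaces `report`.
confirmation = run(email_agent, report)
```

Had `email_agent` been a handoff target inside run 1, its confirmation string would have become the run's final output instead of the report.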
max_turns budget requires careful calculation
Each sub-agent call inside an @function_tool counts against the parent run's turn budget. Five parallel searches consume approximately 10 turns. With 3 rounds of planning, searching, and evaluation plus the handoff and write phase, the realistic budget is ~42 turns. max_turns=50 provides a safe buffer without over-allocating.
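The back-of-envelope arithmetic behind those figures (the per-step costs marked "assumed" are illustrative; only the 5-searches-≈-10-turns figure and the ~42/50 totals come from the text above):

```python
ROUNDS = 3
per_round_turns = {
    "planner": 1,      # assumed: one tool call per round
    "searches": 10,    # 5 parallel searches ≈ 10 turns (from the text)
    "evaluator": 1,    # assumed: one tool call per round
}
loop_turns = ROUNDS * sum(per_round_turns.values())   # 36
handoff_and_write = 6                                 # assumed overhead
realistic_budget = loop_turns + handoff_and_write     # ≈ 42
MAX_TURNS = 50                                        # safe buffer
```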
Explicit sufficiency rules prevent over-research
The EvaluatorAgent uses explicit scoring thresholds: sufficient if coverage ≥ 7 AND depth ≥ 6, or if coverage ≥ 8 regardless of depth, never sufficient if coverage < 5. On round 3, the agent is instructed to acknowledge diminishing returns — preventing unnecessary additional rounds on topics where further searching adds marginal value.
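The thresholds above can be restated as a pure function. Note that in the actual system this judgment is made by the EvaluatorAgent via its prompt, not by Python code; the round-3 branch is one reading of the diminishing-returns instruction.

```python
def is_sufficient(coverage: int, depth: int, round_num: int,
                  max_rounds: int = 3) -> bool:
    """Restates the EvaluatorAgent's sufficiency thresholds."""
    if coverage < 5:
        return False              # never sufficient below coverage 5
    if coverage >= 8:
        return True               # high coverage wins regardless of depth
    if coverage >= 7 and depth >= 6:
        return True
    # Diminishing returns: accept on the final round (assumed reading).
    return round_num >= max_rounds
```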
Tech Stack
| Technology | Purpose |
|---|---|
| OpenAI Agents SDK | Agent orchestration, native handoffs, streaming |
| GPT-4o-mini | LLM backbone for all 7 agents |
| WebSearchTool | Real-time web search within SearchAgent |
| Gradio | Three-phase interactive UI with live progress streaming |
| SendGrid | HTML email delivery of completed reports |
| Pydantic | Structured output validation across all agents |
| asyncio | Parallel search execution via asyncio.gather() |
| Python | Core language |
Results & Metrics
What the system delivers
7
Specialized Agents
Each independently testable and replaceable — from ClarificationAgent to EmailAgent
15
Parallel Searches / Run
Up to 5 queries per round fanned out concurrently via asyncio.gather() across 3 rounds
1000+
Word Structured Report
Executive summary, main body, conclusions, and follow-up questions delivered by email
Adaptive stopping
Research terminates when quality is sufficient, not after a fixed number of steps. Simple queries finish in one round; complex ones use all three.
Gap-aware planning
From round 2 onwards, PlannerAgent receives identified gaps from EvaluatorAgent and generates queries targeting unexplored angles rather than re-searching covered ground.
Full execution traceability
Every agent decision is visible as a graph in the OpenAI traces dashboard — tool calls, handoffs, coverage scores, and reasoning are all logged in real time.
Human-in-the-loop scoping
ClarificationAgent surfaces up to 4 targeted questions before research begins — covering scope, depth, format, audience, and timeframe — ensuring the pipeline researches the right problem from the start.
Agent Roster
| Agent | Responsibility |
|---|---|
| ClarificationAgent | Determines if query needs scoping; surfaces up to 4 questions |
| OrchestratorAgent | Drives the research loop; calls tools; decides when to stop |
| PlannerAgent | Generates 5 targeted search queries; gap-aware on rounds 2+ |
| SearchAgent | Performs a single web search; returns 300-word summary |
| EvaluatorAgent | Scores coverage & depth 1–10; identifies gaps; decides sufficiency |
| WriterAgent | Writes full markdown report from accumulated search results |
| EmailAgent | Converts report to HTML; sends via SendGrid |