Agentic AI Systems · Research & Knowledge Work

Deep Researcher

A seven-agent autonomous research pipeline where an LLM-driven orchestrator controls all execution flow — deciding when to search, when to stop, and when to write — without any hardcoded logic.

Architecture OpenAI SDK · Fully Agentic
Tech Stack
OpenAI SDK Multi-Agent Gradio Agentic AI SendGrid Pydantic Python

7

Specialized Agents

3

Max Research Rounds

1000+

Word Structured Report

The Problem

Fixed pipelines don't match how research actually works

Most research tools follow a hardcoded pipeline — search N times, compile results, write a report. This works when the query is well-defined and the first round of searches covers the topic adequately. But real research doesn't work that way. Some queries need clarification before a single search is made. Some topics need three rounds of targeted investigation before coverage is sufficient. Others are answered after one pass. A hardcoded pipeline treats all three cases identically — wasting compute on simple queries and producing shallow results on complex ones.

The Solution

A reasoning loop that decides when it's done

Deep Researcher replaces the hardcoded pipeline with a genuine multi-agent reasoning loop. An LLM-powered OrchestratorAgent owns all control-flow decisions — calling specialized sub-agents as tools, evaluating research quality after each round, and deciding autonomously when to stop and write. The system begins by asking the user targeted clarifying questions, then enters an adaptive research loop that runs for one to three rounds based on coverage and depth scores, and finally delivers a structured markdown report to the user's email.

Key Outcome

The system produces a 1,000+ word structured research report — including executive summary, main body, conclusions, and follow-up questions — delivered by email, with all execution decisions made by the LLM rather than Python code.

Technical Deep Dive

Architecture & Design

Pipeline Flow

Phase 1 — Clarification

Pre-loop · Human-in-the-loop

ClarificationAgent

Surfaces up to 4 scoping questions · Answers injected into all downstream prompts

Phase 2 — Research Loop · up to 3 rounds

Orchestrator · LLM-driven control flow

OrchestratorAgent

Calls sub-agents as tools · Reads coverage + depth scores · Decides when to stop

function_tool · Round 1–3

PlannerAgent

function_tool · parallel ×5

SearchAgent

function_tool · scores 1–10

EvaluatorAgent

Coverage ≥ 7 + Depth ≥ 6 → hand off · Otherwise → next round · MAX_ROUNDS = 3

Phase 3 — Writing · native SDK handoff

Handoff target · Terminal agent · Owns final_output

WriterAgent

Produces ReportData · short_summary + markdown_report + follow_up_questions

Phase 4 — Email · separate Runner.run() call

Direct call · Outside handoff chain

EmailAgent

Called after report captured · Converts to HTML · Delivers via SendGrid

Phase 1

Clarification

Before any research begins, the ClarificationAgent analyzes the query and surfaces up to 4 targeted questions in the Gradio UI. The user's answers are injected into every downstream agent prompt via an enriched query — ensuring the research stays focused on the right scope, depth, and audience from the start.
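A minimal sketch of how clarification answers could be folded into the enriched query. The exact prompt format Deep Researcher uses is not shown in the write-up, so the function name and wording here are illustrative assumptions:

```python
# Illustrative sketch: fold clarification Q&A into one enriched query
# string that every downstream agent prompt receives. The prompt
# format and function name are assumptions, not the project's code.

def build_enriched_query(original_query: str, qa_pairs: list[tuple[str, str]]) -> str:
    """Combine the raw query with clarification answers so all
    downstream agents see the same scoped request."""
    lines = [f"Research query: {original_query}", "", "Clarifications from the user:"]
    for question, answer in qa_pairs:
        lines.append(f"- Q: {question}")
        lines.append(f"  A: {answer}")
    return "\n".join(lines)

enriched = build_enriched_query(
    "Impact of LLMs on education",
    [("What audience?", "University lecturers"),
     ("How deep?", "Survey-level, with sources")],
)
```

Because the same string feeds every sub-agent, scope decisions made once in Phase 1 stay consistent through planning, search, and writing.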

Phase 2

Research Loop

The OrchestratorAgent runs an adaptive tool-driven loop. Each iteration calls PlannerAgent to generate 5 targeted queries, SearchAgent to fan all queries out in parallel via asyncio.gather(), and EvaluatorAgent to score the accumulated results on coverage and depth (1–10). The Orchestrator reads these scores and decides whether to run another round or stop — no Python if-statement controls this decision.
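The fan-out step above can be sketched with `asyncio.gather()`. In the real system each task would be a `Runner.run()` call into SearchAgent; here the search is stubbed so the concurrency pattern is runnable offline:

```python
import asyncio

# Sketch of the parallel search fan-out. SearchAgent is replaced by a
# stub; in the real pipeline each coroutine would invoke the SDK.

async def run_search(query: str) -> str:
    await asyncio.sleep(0)  # stand-in for the web-search round trip
    return f"~300-word summary for: {query}"

async def fan_out(queries: list[str]) -> list[str]:
    # All searches start concurrently; gather preserves input order.
    return await asyncio.gather(*(run_search(q) for q in queries))

queries = [f"query {i}" for i in range(1, 6)]  # PlannerAgent emits 5 per round
summaries = asyncio.run(fan_out(queries))
```

Order preservation matters here: the evaluator can attribute each summary back to the planner query that produced it.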

Phase 3

Writing

When research is judged sufficient — coverage ≥ 7 and depth ≥ 6, or after MAX_ROUNDS = 3 — the OrchestratorAgent performs a native SDK handoff to the WriterAgent. The WriterAgent becomes the terminal agent, owns final_output, and produces a structured ReportData Pydantic model containing a short summary, full markdown report, and follow-up questions.
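The `ReportData` fields are named in the write-up; the field types below are assumptions about how the Pydantic model might be declared. Passing a model like this as the WriterAgent's structured output type is what lets the SDK validate the report shape:

```python
from pydantic import BaseModel

# Sketch of the ReportData schema described above. Field names come
# from the write-up; types and docstrings are assumptions.

class ReportData(BaseModel):
    short_summary: str              # brief executive gist of the findings
    markdown_report: str            # full 1,000+ word structured report
    follow_up_questions: list[str]  # suggested next research directions

report = ReportData(
    short_summary="LLMs are reshaping coursework design.",
    markdown_report="# Impact of LLMs on Education\n...",
    follow_up_questions=["How do assessment policies differ by region?"],
)
```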

Phase 4

Email Delivery

After the report is captured, EmailAgent is called separately via Runner.run() — deliberately outside the handoff chain to prevent it from overwriting the WriterAgent's final_output. This keeps the WriterAgent as the terminal agent and preserves report integrity at all times.

Key Design Decisions

Handoff topology determines final_output ownership

Chaining WriterAgent → EmailAgent via native handoff causes EmailAgent's confirmation string to become final_output, overwriting the report. By keeping EmailAgent outside the handoff chain and calling it explicitly after capturing the report, WriterAgent remains terminal and report integrity is always preserved.
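A toy simulation of the pitfall, under the stated assumption that in a handoff chain the last agent's output becomes the run's `final_output`. The agents are stubbed as plain callables; this mimics the topology, not the SDK itself:

```python
# Toy model of handoff semantics: the LAST agent in the chain owns
# final_output. Agents are stubbed as callables for illustration.

def run_with_handoffs(chain):
    final_output = None
    for agent in chain:
        final_output = agent()  # each handoff target overwrites it
    return final_output

writer = lambda: {"markdown_report": "# Report\n..."}
emailer = lambda: "Email sent."

# Chained handoff: the confirmation string clobbers the report.
chained = run_with_handoffs([writer, emailer])

# Deep Researcher's fix: end the chain at the writer, capture the
# report, then invoke the email step as a separate call.
report = run_with_handoffs([writer])
email_status = emailer()
```

The fix costs one extra explicit call but guarantees the report is always the terminal artifact of the research run.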

max_turns budget requires careful calculation

Each sub-agent call inside an @function_tool counts against the parent run's turn budget. Five parallel searches consume approximately 10 turns. With 3 rounds of planning, searching, and evaluation plus the handoff and write phase, the realistic budget is ~42 turns. max_turns=50 provides a safe buffer without over-allocating.
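One plausible breakdown of the ~42-turn figure. Only the 10-turns-per-search-round and ~42 totals come from the text above; the per-step costs are assumptions chosen to make the arithmetic explicit:

```python
# Turn-budget arithmetic for max_turns. Per-step costs marked
# "assumed" are illustrative; the 10-turn search cost and ~42 total
# come from the design notes.

PLAN_TURNS = 2      # planner tool call + result (assumed)
SEARCH_TURNS = 10   # 5 parallel searches ~= 10 turns (from the text)
EVAL_TURNS = 1      # single evaluator call (assumed)
ROUNDS = 3          # MAX_ROUNDS
WRAP_UP_TURNS = 3   # handoff + write phase (assumed)

budget = ROUNDS * (PLAN_TURNS + SEARCH_TURNS + EVAL_TURNS) + WRAP_UP_TURNS

MAX_TURNS = 50      # configured ceiling: safe buffer over the budget
headroom = MAX_TURNS - budget
```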

Explicit sufficiency rules prevent over-research

The EvaluatorAgent uses explicit scoring thresholds: sufficient if coverage ≥ 7 AND depth ≥ 6, or if coverage ≥ 8 regardless of depth; never sufficient if coverage < 5. On round 3, the agent is instructed to acknowledge diminishing returns — preventing unnecessary additional rounds on topics where further searching adds marginal value.
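The thresholds above can be written out as a plain decision function. The numeric cutoffs come from the text; the function name and the round-3 shortcut are illustrative (in the real system the evaluator expresses this judgment in its prompt, not in Python):

```python
# The evaluator's sufficiency rules as a plain function. Thresholds
# come from the design notes; the round-3 handling encodes the
# "diminishing returns" instruction.

def is_sufficient(coverage: int, depth: int, round_num: int, max_rounds: int = 3) -> bool:
    if coverage < 5:
        return False                    # never sufficient below coverage 5
    if coverage >= 8:
        return True                     # high coverage wins regardless of depth
    if coverage >= 7 and depth >= 6:
        return True
    # Final round: acknowledge diminishing returns and stop anyway.
    return round_num >= max_rounds
```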

Tech Stack

Technology Purpose
OpenAI Agents SDK Agent orchestration, native handoffs, streaming
GPT-4o-mini LLM backbone for all 7 agents
WebSearchTool Real-time web search within SearchAgent
Gradio Three-phase interactive UI with live progress streaming
SendGrid HTML email delivery of completed reports
Pydantic Structured output validation across all agents
asyncio Parallel search execution via asyncio.gather()
Python Core language

Results & Metrics

What the system delivers

7

Specialized Agents

Each independently testable and replaceable — from ClarificationAgent to EmailAgent

15

Max Parallel Searches / Run

Up to 5 queries per round fanned out concurrently via asyncio.gather() across 3 rounds

1000+

Word Structured Report

Executive summary, main body, conclusions, and follow-up questions delivered by email

🔁

Adaptive stopping

Research terminates when quality is sufficient, not after a fixed number of steps. Simple queries finish in one round; complex ones use all three.

🔍

Gap-aware planning

From round 2 onwards, PlannerAgent receives identified gaps from EvaluatorAgent and generates queries targeting unexplored angles rather than re-searching covered ground.
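The gap-feedback mechanism can be sketched as prompt construction: from round 2 the planner's prompt carries the evaluator's identified gaps. The wording below is an assumption; only the mechanism matches the description above:

```python
# Sketch of gap-aware planning: from round 2 onwards the planner
# prompt includes the evaluator's gaps. Prompt wording is assumed.

def planner_prompt(enriched_query: str, round_num: int, gaps: list[str]) -> str:
    prompt = f"Generate 5 targeted search queries for:\n{enriched_query}"
    if round_num >= 2 and gaps:
        gap_list = "\n".join(f"- {g}" for g in gaps)
        prompt += (
            "\n\nPrior rounds left these areas under-covered; target "
            f"unexplored angles rather than re-searching covered ground:\n{gap_list}"
        )
    return prompt

p_round1 = planner_prompt("LLMs in education", 1, [])
p_round2 = planner_prompt("LLMs in education", 2, ["policy responses", "cost data"])
```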

📊

Full execution traceability

Every agent decision is visible as a graph in the OpenAI traces dashboard — tool calls, handoffs, coverage scores, and reasoning are all logged in real time.

💬

Human-in-the-loop scoping

ClarificationAgent surfaces up to 4 targeted questions before research begins — covering scope, depth, format, audience, and timeframe — ensuring the pipeline researches the right problem from the start.

Agent Roster

Agent Responsibility
ClarificationAgent Determines if query needs scoping; surfaces up to 4 questions
OrchestratorAgent Drives the research loop; calls tools; decides when to stop
PlannerAgent Generates 5 targeted search queries; gap-aware on rounds 2+
SearchAgent Performs a single web search; returns 300-word summary
EvaluatorAgent Scores coverage & depth 1–10; identifies gaps; decides sufficiency
WriterAgent Writes full markdown report from accumulated search results
EmailAgent Converts report to HTML; sends via SendGrid