Agentic AI Systems · Research & Knowledge Work

Deep Researcher

A seven-agent autonomous research pipeline where an LLM-driven orchestrator controls all execution flow — deciding when to search, when to stop, and when to write — without any hardcoded logic.

Architecture OpenAI SDK · Fully Agentic
Tech Stack
OpenAI SDK Multi-Agent Gradio Agentic AI SendGrid Pydantic Python

7

Specialized Agents

3

Max Research Rounds

1000+

Word Structured Report

The Problem

Fixed pipelines don't match how research actually works

Most research tools follow a hardcoded pipeline — search N times, compile results, write a report. This works when the query is well-defined and the first round of searches covers the topic adequately. But real research doesn't work that way. Some queries need clarification before a single search is made. Some topics need three rounds of targeted investigation before coverage is sufficient. Others are answered after one pass. A hardcoded pipeline treats all three cases identically — wasting compute on simple queries and producing shallow results on complex ones.

The Solution

A reasoning loop that decides when it's done

Deep Researcher replaces the hardcoded pipeline with a genuine multi-agent reasoning loop. An LLM-powered OrchestratorAgent owns all control-flow decisions — calling specialized sub-agents as tools, evaluating research quality after each round, and deciding autonomously when to stop and write. The system begins by asking the user targeted clarifying questions, then enters an adaptive research loop that runs for one to three rounds based on coverage and depth scores, and finally delivers a structured markdown report to the user's email.

Key Outcome

The system produces a 1,000+ word structured research report — including executive summary, main body, conclusions, and follow-up questions — delivered by email, with all execution decisions made by the LLM rather than Python code.

Technical Deep Dive

Architecture & Design

Pipeline Flow

Phase 1 — Clarification

Pre-loop · Human-in-the-loop

ClarificationAgent

Surfaces up to 4 scoping questions · Answers injected into all downstream prompts

Phase 2 — Research Loop · up to 3 rounds

Orchestrator · LLM-driven control flow

OrchestratorAgent

Calls sub-agents as tools · Reads coverage + depth scores · Decides when to stop

function_tool · Round 1–3

PlannerAgent

function_tool · parallel ×5

SearchAgent

function_tool · scores 1–10

EvaluatorAgent

Coverage ≥ 7 + Depth ≥ 6 → hand off · Otherwise → next round · MAX_ROUNDS = 3

Phase 3 — Writing · native SDK handoff

Handoff target · Terminal agent · Owns final_output

WriterAgent

Produces ReportData · short_summary + markdown_report + follow_up_questions

Phase 4 — Email · separate Runner.run() call

Direct call · Outside handoff chain

EmailAgent

Called after report captured · Converts to HTML · Delivers via SendGrid

Phase 1

Clarification

Before any research begins, the ClarificationAgent analyzes the query and surfaces up to 4 targeted questions in the Gradio UI. The user's answers are injected into every downstream agent prompt via an enriched query — ensuring the research stays focused on the right scope, depth, and audience from the start.
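A minimal sketch of how clarification answers could be folded into the enriched query. The exact prompt format Deep Researcher uses is not shown in the write-up, so the function name and wording here are illustrative assumptions:

```python
# Illustrative sketch: fold clarification Q&A into one enriched query
# string that every downstream agent prompt receives. The prompt
# format and function name are assumptions, not the project's code.

def build_enriched_query(original_query: str, qa_pairs: list[tuple[str, str]]) -> str:
    """Combine the raw query with clarification answers so all
    downstream agents see the same scoped request."""
    lines = [f"Research query: {original_query}", "", "Clarifications from the user:"]
    for question, answer in qa_pairs:
        lines.append(f"- Q: {question}")
        lines.append(f"  A: {answer}")
    return "\n".join(lines)

enriched = build_enriched_query(
    "Impact of LLMs on education",
    [("What audience?", "University lecturers"),
     ("How deep?", "Survey-level, with sources")],
)
```

Because the same string feeds every sub-agent, scope decisions made once in Phase 1 stay consistent through planning, search, and writing.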

Phase 2

Research Loop

The OrchestratorAgent runs an adaptive tool-driven loop. Each iteration calls PlannerAgent to generate 5 targeted queries, SearchAgent to fan all queries out in parallel via asyncio.gather(), and EvaluatorAgent to score the accumulated results on coverage and depth (1–10). The Orchestrator reads these scores and decides whether to run another round or stop — no Python if-statement controls this decision.
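The fan-out step above can be sketched with `asyncio.gather()`. In the real system each task would be a `Runner.run()` call into SearchAgent; here the search is stubbed so the concurrency pattern is runnable offline:

```python
import asyncio

# Sketch of the parallel search fan-out. SearchAgent is replaced by a
# stub; in the real pipeline each coroutine would invoke the SDK.

async def run_search(query: str) -> str:
    await asyncio.sleep(0)  # stand-in for the web-search round trip
    return f"~300-word summary for: {query}"

async def fan_out(queries: list[str]) -> list[str]:
    # All searches start concurrently; gather preserves input order.
    return await asyncio.gather(*(run_search(q) for q in queries))

queries = [f"query {i}" for i in range(1, 6)]  # PlannerAgent emits 5 per round
summaries = asyncio.run(fan_out(queries))
```

Order preservation matters here: the evaluator can attribute each summary back to the planner query that produced it.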

Phase 3

Writing

When research is judged sufficient — coverage ≥ 7 and depth ≥ 6, or after MAX_ROUNDS = 3 — the OrchestratorAgent performs a native SDK handoff to the WriterAgent. The WriterAgent becomes the terminal agent, owns final_output, and produces a structured ReportData Pydantic model containing a short summary, full markdown report, and follow-up questions.
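The `ReportData` fields are named in the write-up; the field types below are assumptions about how the Pydantic model might be declared. Passing a model like this as the WriterAgent's structured output type is what lets the SDK validate the report shape:

```python
from pydantic import BaseModel

# Sketch of the ReportData schema described above. Field names come
# from the write-up; types and docstrings are assumptions.

class ReportData(BaseModel):
    short_summary: str              # brief executive gist of the findings
    markdown_report: str            # full 1,000+ word structured report
    follow_up_questions: list[str]  # suggested next research directions

report = ReportData(
    short_summary="LLMs are reshaping coursework design.",
    markdown_report="# Impact of LLMs on Education\n...",
    follow_up_questions=["How do assessment policies differ by region?"],
)
```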

Phase 4

Email Delivery

After the report is captured, EmailAgent is called separately via Runner.run() — deliberately outside the handoff chain to prevent it from overwriting the WriterAgent's final_output. This keeps the WriterAgent as the terminal agent and preserves report integrity at all times.

Key Design Decisions

Handoff topology determines final_output ownership

Chaining WriterAgent → EmailAgent via native handoff causes EmailAgent's confirmation string to become final_output, overwriting the report. By keeping EmailAgent outside the handoff chain and calling it explicitly after capturing the report, WriterAgent remains terminal and report integrity is always preserved.
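A toy simulation of the pitfall, under the stated assumption that in a handoff chain the last agent's output becomes the run's `final_output`. The agents are stubbed as plain callables; this mimics the topology, not the SDK itself:

```python
# Toy model of handoff semantics: the LAST agent in the chain owns
# final_output. Agents are stubbed as callables for illustration.

def run_with_handoffs(chain):
    final_output = None
    for agent in chain:
        final_output = agent()  # each handoff target overwrites it
    return final_output

writer = lambda: {"markdown_report": "# Report\n..."}
emailer = lambda: "Email sent."

# Chained handoff: the confirmation string clobbers the report.
chained = run_with_handoffs([writer, emailer])

# Deep Researcher's fix: end the chain at the writer, capture the
# report, then invoke the email step as a separate call.
report = run_with_handoffs([writer])
email_status = emailer()
```

The fix costs one extra explicit call but guarantees the report is always the terminal artifact of the research run.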

max_turns budget requires careful calculation

Each sub-agent call inside an @function_tool counts against the parent run's turn budget. Five parallel searches consume approximately 10 turns. With 3 rounds of planning, searching, and evaluation plus the handoff and write phase, the realistic budget is ~42 turns. max_turns=50 provides a safe buffer without over-allocating.
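One plausible breakdown of the ~42-turn figure. Only the 10-turns-per-search-round and ~42 totals come from the text above; the per-step costs are assumptions chosen to make the arithmetic explicit:

```python
# Turn-budget arithmetic for max_turns. Per-step costs marked
# "assumed" are illustrative; the 10-turn search cost and ~42 total
# come from the design notes.

PLAN_TURNS = 2      # planner tool call + result (assumed)
SEARCH_TURNS = 10   # 5 parallel searches ~= 10 turns (from the text)
EVAL_TURNS = 1      # single evaluator call (assumed)
ROUNDS = 3          # MAX_ROUNDS
WRAP_UP_TURNS = 3   # handoff + write phase (assumed)

budget = ROUNDS * (PLAN_TURNS + SEARCH_TURNS + EVAL_TURNS) + WRAP_UP_TURNS

MAX_TURNS = 50      # configured ceiling: safe buffer over the budget
headroom = MAX_TURNS - budget
```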

Explicit sufficiency rules prevent over-research

The EvaluatorAgent uses explicit scoring thresholds: sufficient if coverage ≥ 7 AND depth ≥ 6, or if coverage ≥ 8 regardless of depth; never sufficient if coverage < 5. On round 3, the agent is instructed to acknowledge diminishing returns — preventing unnecessary additional rounds on topics where further searching adds marginal value.
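The thresholds above can be written out as a plain decision function. The numeric cutoffs come from the text; the function name and the round-3 shortcut are illustrative (in the real system the evaluator expresses this judgment in its prompt, not in Python):

```python
# The evaluator's sufficiency rules as a plain function. Thresholds
# come from the design notes; the round-3 handling encodes the
# "diminishing returns" instruction.

def is_sufficient(coverage: int, depth: int, round_num: int, max_rounds: int = 3) -> bool:
    if coverage < 5:
        return False                    # never sufficient below coverage 5
    if coverage >= 8:
        return True                     # high coverage wins regardless of depth
    if coverage >= 7 and depth >= 6:
        return True
    # Final round: acknowledge diminishing returns and stop anyway.
    return round_num >= max_rounds
```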

Tech Stack

Technology Purpose
OpenAI Agents SDK Agent orchestration, native handoffs, streaming
GPT-4o-mini LLM backbone for all 7 agents
WebSearchTool Real-time web search within SearchAgent
Gradio Three-phase interactive UI with live progress streaming
SendGrid HTML email delivery of completed reports
Pydantic Structured output validation across all agents
asyncio Parallel search execution via asyncio.gather()
Python Core language

Results & Metrics

What the system delivers

7

Specialized Agents

Each independently testable and replaceable — from ClarificationAgent to EmailAgent

15

Max Parallel Searches / Run

Up to 5 queries per round fanned out concurrently via asyncio.gather() across 3 rounds

1000+

Word Structured Report

Executive summary, main body, conclusions, and follow-up questions delivered by email

🔁

Adaptive stopping

Research terminates when quality is sufficient, not after a fixed number of steps. Simple queries finish in one round; complex ones use all three.

🔍

Gap-aware planning

From round 2 onwards, PlannerAgent receives identified gaps from EvaluatorAgent and generates queries targeting unexplored angles rather than re-searching covered ground.
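The gap-feedback mechanism can be sketched as prompt construction: from round 2 the planner's prompt carries the evaluator's identified gaps. The wording below is an assumption; only the mechanism matches the description above:

```python
# Sketch of gap-aware planning: from round 2 onwards the planner
# prompt includes the evaluator's gaps. Prompt wording is assumed.

def planner_prompt(enriched_query: str, round_num: int, gaps: list[str]) -> str:
    prompt = f"Generate 5 targeted search queries for:\n{enriched_query}"
    if round_num >= 2 and gaps:
        gap_list = "\n".join(f"- {g}" for g in gaps)
        prompt += (
            "\n\nPrior rounds left these areas under-covered; target "
            f"unexplored angles rather than re-searching covered ground:\n{gap_list}"
        )
    return prompt

p_round1 = planner_prompt("LLMs in education", 1, [])
p_round2 = planner_prompt("LLMs in education", 2, ["policy responses", "cost data"])
```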

📊

Full execution traceability

Every agent decision is visible as a graph in the OpenAI traces dashboard — tool calls, handoffs, coverage scores, and reasoning are all logged in real time.

💬

Human-in-the-loop scoping

ClarificationAgent surfaces up to 4 targeted questions before research begins — covering scope, depth, format, audience, and timeframe — ensuring the pipeline researches the right problem from the start.

Agent Roster

Agent Responsibility
ClarificationAgent Determines if query needs scoping; surfaces up to 4 questions
OrchestratorAgent Drives the research loop; calls tools; decides when to stop
PlannerAgent Generates 5 targeted search queries; gap-aware on rounds 2+
SearchAgent Performs a single web search; returns 300-word summary
EvaluatorAgent Scores coverage & depth 1–10; identifies gaps; decides sufficiency
WriterAgent Writes full markdown report from accumulated search results
EmailAgent Converts report to HTML; sends via SendGrid