Agentic AI Systems · Research & Knowledge Work

Deep Researcher (User-Guided)

A six-agent research pipeline in which Python code controls all execution flow, delivering predictable, traceable research automation from query to emailed report. An evaluation loop decides whether one, two, or three search rounds run before the report is written.

Architecture OpenAI SDK · Workflow

Tech Stack

OpenAI SDK Workflow AI Gradio Multi-Agent SendGrid Pydantic Python

6

Specialized Agents

5

Parallel Searches / Round

1000+

Word Structured Report

The Problem

Fixed workflows are predictable but cannot adapt to the query

A simple workflow always runs the same searches regardless of how complex or vague the query is. It can't tell when results are shallow, can't ask the user what they actually mean, and can't decide to search deeper when an important angle was missed. The result is inconsistent report quality — thorough on easy topics, superficial on hard ones — with no mechanism to close the gap.

The Solution

An adaptive pipeline where Python owns the control flow

Deep Researcher (User-Guided) wraps a Python-controlled research loop around six specialized agents. Before any searching begins, ClarificationAgent decides whether the query needs narrowing and, if so, surfaces up to four targeted questions in the UI; well-scoped queries skip this step entirely. The ResearchOrchestrator then runs up to three rounds of research. In each round, PlannerAgent generates five targeted search queries (incorporating gap descriptions from the previous round on subsequent passes), SearchAgent fans all five out concurrently via asyncio, and EvaluatorAgent scores the accumulated results on coverage and depth, both on a 1–10 scale. The loop exits as soon as coverage ≥ 7 and depth ≥ 6; otherwise the evaluator's identified gaps are fed back to the planner and another round runs. Once the bar is met or the third round completes, WriterAgent synthesises everything into a 1,000+ word structured report, and EmailAgent optionally delivers it as styled HTML via SendGrid.

Key Outcome

Rather than choosing between the reliability of a fixed workflow and the capability of a fully autonomous agent, Deep Researcher (User-Guided) demonstrates a third path: Python owns all control flow and sequencing, but an explicit evaluation loop adjusts how many rounds of searching actually run. Simple queries can finish in one pass; complex topics get the rounds they need, up to a hard cap of three. The aim is more consistent report quality than a fixed pipeline delivers, without the unpredictability of handing control to the LLM.

Technical Deep Dive

Architecture & Design

Step 1

Clarify

ResearchOrchestrator.get_clarifications() calls ClarificationAgent with the raw user query. The agent decides whether the query is already well-scoped or needs narrowing, and if so returns a ClarificationPlan containing up to four targeted questions. The Gradio UI surfaces these questions for the user to answer before any searching begins — for clear queries, this step is skipped entirely.
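
As a rough illustration of the clarify step, the sketch below models a ClarificationPlan and the "enriched query" that later steps receive. The project uses Pydantic; a stdlib dataclass stands in here so the example runs without dependencies. The field names and the enrich_query helper are assumptions, not taken from the source code.

```python
from dataclasses import dataclass, field

MAX_QUESTIONS = 4  # the page states "up to four targeted questions"


@dataclass
class ClarificationPlan:
    """Stand-in for the project's Pydantic model; field names are assumed."""
    needs_clarification: bool
    questions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.questions) > MAX_QUESTIONS:
            raise ValueError(f"at most {MAX_QUESTIONS} questions allowed")
        # A well-scoped query carries no questions, so the UI can skip the step.
        if not self.needs_clarification:
            self.questions = []


def enrich_query(query: str, answers: dict[str, str]) -> str:
    """Append question/answer pairs to the raw query (hypothetical helper)."""
    if not answers:
        return query
    lines = [f"- {q} {a}" for q, a in answers.items()]
    return query + "\n\nClarifications:\n" + "\n".join(lines)
```

When the user answers nothing, enrich_query returns the query unchanged, which matches the skipped-step behaviour described above.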

Step 2

Plan

The orchestrator calls PlannerAgent with the enriched query — the original query plus any clarification answers. On the first round the agent generates 5 targeted, non-overlapping search queries. On subsequent rounds it also receives the gaps and suggested queries identified by the EvaluatorAgent, so new searches fill holes rather than re-cover ground already retrieved. Output is a validated WebSearchPlan Pydantic schema.
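
One way the planner's input could be assembled is sketched below: the first round sends only the enriched query, and later rounds append the evaluator's gaps and suggested queries. The function name, parameters, and prompt wording are all hypothetical; only the count of five queries comes from the page.

```python
from typing import Optional, Sequence

QUERIES_PER_ROUND = 5  # stated on the page


def build_planner_input(
    enriched_query: str,
    gaps: Optional[Sequence[str]] = None,
    suggested_queries: Optional[Sequence[str]] = None,
) -> str:
    """Compose the text handed to the planner (hypothetical wording)."""
    parts = [
        f"Query: {enriched_query}",
        f"Produce {QUERIES_PER_ROUND} non-overlapping web searches.",
    ]
    # Round 1 has no evaluator feedback, so these sections are omitted.
    if gaps:
        parts.append("Gaps to fill:\n" + "\n".join(f"- {g}" for g in gaps))
    if suggested_queries:
        parts.append(
            "Suggested queries:\n" + "\n".join(f"- {s}" for s in suggested_queries)
        )
    return "\n\n".join(parts)
```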

Step 3

Search

ResearchOrchestrator._run_searches() fans out all 5 queries concurrently using asyncio.create_task(). Each query spawns an independent SearchAgent run with WebSearchTool attached and tool_choice='required' set in the agent's model settings, which forces the model to call the web search rather than answer from memory. Each agent returns a 300-word summary of its findings. A failed search is silently dropped; the remaining results proceed without blocking the pipeline.
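
The fan-out pattern can be sketched as follows. This is a minimal illustration and not the project's code: fake_search is a stand-in for a real SearchAgent run, and collecting results with asyncio.gather is an assumption about how the tasks are awaited.

```python
import asyncio


async def fake_search(query: str) -> str:
    """Stand-in for one SearchAgent run; the real one calls the web."""
    await asyncio.sleep(0.01)
    if "fail" in query:
        raise RuntimeError("search backend error")
    return f"summary for {query}"


async def run_searches(queries: list[str]) -> list[str]:
    """Start every search as its own task and keep only the successes."""
    tasks = [asyncio.create_task(fake_search(q)) for q in queries]
    # return_exceptions=True hands failures back as values instead of
    # raising, so they can be filtered out below.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    return [o for o in outcomes if not isinstance(o, BaseException)]
```

Because the failed task's exception is discarded and never re-raised, the caller sees a shorter result list with no error, which is the "silently dropped" behaviour described above.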

Step 4

Evaluate

After each search round, EvaluatorAgent scores the accumulated results on two dimensions: coverage (1–10, breadth of topic angles addressed) and depth (1–10, degree of source corroboration). If coverage ≥ 7 AND depth ≥ 6, the loop exits and writing begins. Otherwise the evaluator identifies specific gaps and suggests new queries, and the loop returns to Step 2. A hard cap of 3 rounds prevents runaway searching. Output is a validated SufficiencyEvaluation Pydantic schema.
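
The exit condition is simple enough to state in a few lines. In the sketch below a stdlib dataclass again stands in for the SufficiencyEvaluation Pydantic schema; the field names and the is_sufficient property are assumptions, while the thresholds and the 1–10 range come from the description above.

```python
from dataclasses import dataclass, field

COVERAGE_THRESHOLD = 7
DEPTH_THRESHOLD = 6


@dataclass
class SufficiencyEvaluation:
    """Stand-in for the Pydantic schema; field names are assumed."""
    coverage: int  # 1-10, breadth of topic angles addressed
    depth: int     # 1-10, degree of source corroboration
    gaps: list[str] = field(default_factory=list)
    suggested_queries: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        for name in ("coverage", "depth"):
            score = getattr(self, name)
            if not 1 <= score <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {score}")

    @property
    def is_sufficient(self) -> bool:
        # Both thresholds must hold; a high score on one cannot offset the other.
        return self.coverage >= COVERAGE_THRESHOLD and self.depth >= DEPTH_THRESHOLD
```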

Step 5

Write

Once the evaluator is satisfied (or 3 rounds are exhausted), the orchestrator passes the enriched query and all accumulated search summaries to WriterAgent. The agent first produces an outline, then writes a full structured markdown report targeting 1,000+ words across 5–10 sections. Output is a validated ReportData Pydantic model containing short_summary, markdown_report, and follow_up_questions.
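
For reference, the shape of the writer's output and a quick sanity check against the length targets might look like the sketch below. The three field names come from the description above; the dataclass (in place of Pydantic) and the report_stats helper are illustrative assumptions, and nothing here implies the project performs such a check.

```python
import re
from dataclasses import dataclass


@dataclass
class ReportData:
    """Stand-in for the Pydantic model; the three fields are named on the page."""
    short_summary: str
    markdown_report: str
    follow_up_questions: list[str]


def report_stats(report: ReportData) -> dict[str, int]:
    """Count whitespace-separated tokens and markdown headings (hypothetical helper).

    Heading markers such as '#' are counted as tokens, so the word count
    is a slight overestimate.
    """
    words = len(report.markdown_report.split())
    sections = len(
        re.findall(r"^#{1,6}\s", report.markdown_report, flags=re.MULTILINE)
    )
    return {"words": words, "sections": sections}
```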

Step 6

Email (Optional)

If the user checked "Send report by email" in the UI, the orchestrator passes the markdown report to EmailAgent. The agent converts it to styled HTML — max 700px wide, Georgia serif font, proper heading hierarchy — and delivers it via the send_email function_tool backed by the SendGrid API. This step is skipped entirely when the checkbox is left unchecked.
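
In the project the EmailAgent itself produces the HTML. Purely to illustrate the styling constraints named above (700px maximum width, Georgia serif), a deterministic wrapper could look like this sketch. The function name is hypothetical, and it assumes the markdown body has already been converted to HTML elsewhere.

```python
import html


def wrap_report_html(title: str, body_html: str) -> str:
    """Wrap already-converted report HTML in an email-friendly container."""
    return (
        "<!DOCTYPE html>\n"
        "<html><body>\n"
        '<div style="max-width:700px;font-family:Georgia,serif;">\n'
        # The title is plain text, so escape it; body_html is trusted markup.
        f"<h1>{html.escape(title)}</h1>\n"
        f"{body_html}\n"
        "</div>\n"
        "</body></html>"
    )
```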

Key Design Decisions

Python owns the control flow — not the LLM

The ResearchOrchestrator class manages all agents via explicit Runner.run() calls. Results are passed between stages as typed Python objects — there is no agent-to-agent communication and no LLM deciding what happens next. The loop condition, early-exit logic, and handoff to the writer are all Python code, making the pipeline predictable, debuggable, and easy to extend.
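
A minimal sketch of such a Python-owned loop is shown below. The three callables are hypothetical stand-ins for the Runner.run() calls on the planner, search, and evaluator agents, and the tuple returned by the evaluator is a simplification of the SufficiencyEvaluation schema.

```python
import asyncio
from typing import Awaitable, Callable

MAX_ROUNDS = 3

# Hypothetical callables standing in for Runner.run() on each agent.
Planner = Callable[[str, list[str]], Awaitable[list[str]]]
Searcher = Callable[[list[str]], Awaitable[list[str]]]
Evaluator = Callable[[str, list[str]], Awaitable[tuple[int, int, list[str]]]]


async def research_loop(
    query: str, plan: Planner, search: Searcher, evaluate: Evaluator
) -> tuple[list[str], int]:
    """Run plan -> search -> evaluate until the bar is met or rounds run out."""
    results: list[str] = []
    gaps: list[str] = []
    rounds_used = 0
    for rounds_used in range(1, MAX_ROUNDS + 1):
        queries = await plan(query, gaps)
        results.extend(await search(queries))
        coverage, depth, gaps = await evaluate(query, results)
        if coverage >= 7 and depth >= 6:
            break  # early exit decided by Python, not by the model
    return results, rounds_used
```

The model only supplies scores and gap descriptions; whether another round runs is an ordinary if statement, which is what keeps the flow inspectable.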

Sufficiency evaluation gates progression — not a fixed round count

Rather than always running a fixed number of rounds, the EvaluatorAgent's scores (coverage ≥ 7 AND depth ≥ 6) determine when to stop. Simple queries can exit after round one; complex topics may use all three rounds. A hard cap of 3 prevents runaway compute while still allowing quality to drive the decision on typical queries.

Clarification before search reduces wasted retrieval

Asking one or two focused questions upfront is cheaper than retrieving off-target results across multiple rounds. The ClarificationAgent is selective — it skips questions entirely when the query is already well-scoped — so the step adds latency only when it genuinely changes the research direction.

Parallel search execution via asyncio

All 5 search queries run concurrently using asyncio.create_task(), reducing total search time to roughly the duration of the slowest single search rather than the sum of all five. Each SearchAgent instance is independent: a failed search is silently dropped and does not block the others from completing.

Tech Stack

Technology	Purpose
OpenAI Agents SDK	Agent definitions, Runner.run() calls, WebSearchTool, and tracing
GPT-4o-mini	LLM backbone for all 6 agents
WebSearchTool	Real-time web search within SearchAgent
Gradio	Two-phase UI: query analysis then live-streaming research progress and report output
Pydantic	Structured output validation for ClarificationPlan, WebSearchPlan, SufficiencyEvaluation, and ReportData
SendGrid	HTML email delivery of completed reports (optional)
asyncio	Parallel search execution via asyncio.create_task()
Python	Core language and pipeline orchestration

Results & Metrics

What the system delivers

6

Specialized Agents

Clarification, Planner, Search, Evaluator, Writer, and Email — each with a single well-defined responsibility in the pipeline

5

Parallel Searches / Round

All 5 queries execute concurrently via asyncio in each round; rounds repeat, up to a cap of 3, until the evaluator confirms sufficient coverage and depth

1000+

Word Structured Report

Executive summary, full markdown report, and follow-up questions — optionally delivered by email via SendGrid

🔄

Adaptive multi-round research

After each round the EvaluatorAgent scores coverage (1–10) and depth (1–10) against the accumulated results. The loop exits early when both thresholds are met, or runs up to 3 rounds — so simple queries finish in one pass while complex topics get the depth they need.

🔧

Easy to debug and extend

Because control flow lives in Python rather than LLM reasoning, every stage can be inspected, logged, and modified independently. Adding a new pipeline step is as simple as writing a new ResearchOrchestrator method.

📬

Minimal-touch delivery

After answering any clarifying questions upfront, the pipeline requires no further input — research, evaluation, synthesis, formatting, and optional email delivery all run without intervention. The clarification step is skipped entirely for well-scoped queries.

🏗️

Quality-gated progression

The pipeline does not write a report until the research either meets an explicit quality bar or exhausts the three-round cap, unlike fixed-round workflows that write regardless of what was found. This makes the User-Guided version the natural architectural counterpart to the simpler Workflow baseline, illustrating what an evaluation loop adds.

Agent Roster

Agent	Responsibility
ClarificationAgent	Decides whether the query needs scoping; surfaces up to 4 targeted questions for the user to answer before searching begins
PlannerAgent	Generates 5 targeted web search queries per round; incorporates gap descriptions from the EvaluatorAgent on subsequent rounds
SearchAgent	Performs a single web search; returns a 300-word summary (5 instances run in parallel per round)
EvaluatorAgent	Scores accumulated results on coverage and depth (both 1–10); triggers early exit or identifies gaps for the next round
WriterAgent	Synthesises all accumulated search results into a structured 1,000+ word markdown report with executive summary and follow-up questions
EmailAgent	Converts markdown report to styled HTML; delivers via SendGrid (runs only when the user opts in)
