Deep Researcher (User-Guided)
A six-agent research pipeline in which Python code controls all execution flow, delivering predictable, traceable research automation from query to optionally emailed report, with an evaluation loop deciding how many search rounds run.
6
Specialized Agents
5
Parallel Searches / Round
1000+
Word Structured Report
The Problem
Fixed workflows are predictable but cannot adapt; autonomous agents adapt but are hard to predict
A simple workflow always runs the same searches regardless of how complex or vague the query is. It can't tell when results are shallow, can't ask the user what they actually mean, and can't decide to search deeper when an important angle was missed. The result is inconsistent report quality — thorough on easy topics, superficial on hard ones — with no mechanism to close the gap.
The Solution
A quality-gated pipeline where Python owns the control flow
Deep Researcher (User-Guided) wraps a Python-controlled research loop around six specialized agents. Before any searching begins, ClarificationAgent decides whether the query needs narrowing and surfaces up to four targeted questions in the UI, skipping this step entirely for well-scoped queries. The ResearchOrchestrator then runs up to three rounds of research: PlannerAgent generates five targeted search queries (incorporating gap descriptions from the previous round on subsequent passes), the orchestrator fans all five out to parallel SearchAgent instances via asyncio, and EvaluatorAgent scores the accumulated results on coverage and depth, both on a 1–10 scale. The loop exits as soon as coverage ≥ 7 and depth ≥ 6; otherwise the evaluator's identified gaps are fed back to the planner and another round runs. Once the bar is met or the three-round cap is reached, WriterAgent synthesises everything into a 1,000+ word structured report, and EmailAgent optionally delivers it as styled HTML via SendGrid.
Key Outcome
Rather than choosing between the reliability of a fixed workflow and the capability of a fully autonomous agent, Deep Researcher (User-Guided) demonstrates a third path: Python owns all control flow and sequencing, but an explicit evaluation loop adjusts how many rounds of searching actually run. Simple queries finish in one pass; complex topics get the rounds they need, up to a hard cap of three. The design aims for more consistent report quality than a fixed-round workflow, without the unpredictability of handing control to the LLM.
Technical Deep Dive
Architecture & Design
Step 1
Clarify
ResearchOrchestrator.get_clarifications() calls ClarificationAgent with the raw user query. The agent decides whether the query is already well-scoped or needs narrowing, and if so returns a ClarificationPlan containing up to four targeted questions. The Gradio UI surfaces these questions for the user to answer before any searching begins — for clear queries, this step is skipped entirely.
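The clarification hand-off can be sketched roughly as follows. This is a minimal illustration, not the project's code: a standard-library dataclass stands in for the Pydantic ClarificationPlan so the sketch runs without dependencies, and the field names (`needs_clarification`, `questions`) and the helper `build_enriched_query` are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ClarificationPlan:
    """Stand-in for the Pydantic ClarificationPlan; field names are assumed."""

    needs_clarification: bool
    questions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The text caps clarification at four targeted questions.
        if len(self.questions) > 4:
            raise ValueError("at most four clarifying questions are allowed")


def build_enriched_query(query: str, plan: ClarificationPlan, answers: list[str]) -> str:
    """Append answered questions to the query (hypothetical helper).

    Returns the query unchanged when no clarification was needed or when
    the user left every answer blank.
    """
    if not plan.needs_clarification or not plan.questions:
        return query
    pairs = [
        f"Q: {question}\nA: {answer}"
        for question, answer in zip(plan.questions, answers)
        if answer.strip()
    ]
    if not pairs:
        return query
    return query + "\n\nClarifications:\n" + "\n".join(pairs)
```

For a well-scoped query the plan carries no questions and the raw query passes through untouched, which mirrors the "skipped entirely" behaviour described above.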
Step 2
Plan
The orchestrator calls PlannerAgent with the enriched query — the original query plus any clarification answers. On the first round the agent generates 5 targeted, non-overlapping search queries. On subsequent rounds it also receives the gaps and suggested queries identified by the EvaluatorAgent, so new searches fill holes rather than re-cover ground already retrieved. Output is a validated WebSearchPlan Pydantic schema.
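One way the planner's input might be assembled across rounds is sketched below. The dataclass is a dependency-free stand-in for the Pydantic WebSearchPlan; the `searches` field name, the prompt layout, and `build_planner_input` are all assumptions made for illustration. The duplicate check is only a crude proxy for "non-overlapping", since it catches exact repeats and nothing subtler.

```python
from dataclasses import dataclass


@dataclass
class WebSearchPlan:
    """Stand-in for the Pydantic WebSearchPlan; the field name is assumed."""

    searches: list[str]

    def __post_init__(self) -> None:
        normalized = [s.strip().lower() for s in self.searches]
        if len(normalized) != 5:
            raise ValueError("expected exactly 5 search queries")
        # Exact-duplicate check only; semantic overlap is not detected here.
        if len(set(normalized)) != len(normalized):
            raise ValueError("search queries must not repeat")


def build_planner_input(enriched_query: str, gaps: list[str], suggested: list[str]) -> str:
    """Compose the planner prompt (hypothetical layout).

    On round one both lists are empty, so only the query is sent. On later
    rounds the evaluator's gaps and suggested queries are appended.
    """
    parts = [f"Query: {enriched_query}"]
    if gaps:
        parts.append("Gaps to fill:\n" + "\n".join(f"- {g}" for g in gaps))
    if suggested:
        parts.append("Suggested queries:\n" + "\n".join(f"- {s}" for s in suggested))
    return "\n\n".join(parts)
```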
Step 3
Search
ResearchOrchestrator._run_searches() fans out all 5 queries in parallel using asyncio.create_task(). Each query spawns an independent SearchAgent instance equipped with WebSearchTool and configured with tool_choice='required', ensuring the web search is always executed. Each agent returns a 300-word summary of its findings. A failed search is silently dropped; the remaining results proceed without blocking the pipeline.
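The fan-out with silent failure handling might look something like the sketch below. `fake_search` is a hypothetical stand-in for a single SearchAgent run (no real web search happens), and `run_searches` only approximates what `_run_searches()` is described as doing.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def fake_search(query: str) -> str:
    """Hypothetical stand-in for one SearchAgent run."""
    await asyncio.sleep(0.01)  # simulate network latency
    if "fail" in query:
        raise RuntimeError(f"search failed: {query}")
    return f"summary of {query}"


async def run_searches(
    queries: list[str],
    search: Callable[[str], Awaitable[str]] = fake_search,
) -> list[str]:
    """Run every query concurrently and keep only the successful summaries."""

    async def safe(query: str) -> Optional[str]:
        try:
            return await search(query)
        except Exception:
            return None  # failed searches are dropped without raising

    # One task per query, so all searches are in flight at once.
    tasks = [asyncio.create_task(safe(q)) for q in queries]
    results = await asyncio.gather(*tasks)
    return [r for r in results if r is not None]
```

Wrapping each search in its own `try`/`except` keeps one failure from cancelling the rest; `asyncio.gather` returns results in the order the tasks were passed in, so surviving summaries stay aligned with the plan.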
Step 4
Evaluate
After each search round, EvaluatorAgent scores the accumulated results on two dimensions: coverage (1–10, breadth of topic angles addressed) and depth (1–10, degree of source corroboration). If coverage ≥ 7 AND depth ≥ 6, the loop exits and writing begins. Otherwise the evaluator identifies specific gaps and suggests new queries, and the loop returns to Step 2. A hard cap of 3 rounds prevents runaway searching. Output is a validated SufficiencyEvaluation Pydantic schema.
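The exit rule and round cap described above can be sketched as a plain Python loop. The thresholds and cap come from the text; everything else is assumed for illustration: the dataclass stands in for the Pydantic SufficiencyEvaluation, the field names and `research_loop` are hypothetical, and the callables are synchronous stubs where the real pipeline would await agents.

```python
from dataclasses import dataclass, field
from typing import Callable

COVERAGE_MIN = 7  # thresholds and cap as stated in the text
DEPTH_MIN = 6
MAX_ROUNDS = 3


@dataclass
class SufficiencyEvaluation:
    """Stand-in for the Pydantic schema; field names are assumed."""

    coverage: int
    depth: int
    gaps: list[str] = field(default_factory=list)

    @property
    def is_sufficient(self) -> bool:
        # Both conditions must hold for the loop to exit early.
        return self.coverage >= COVERAGE_MIN and self.depth >= DEPTH_MIN


def research_loop(
    run_round: Callable[[list[str]], list[str]],
    evaluate: Callable[[list[str]], SufficiencyEvaluation],
) -> tuple[list[str], int]:
    """Accumulate results until the bar is met or the round cap is hit."""
    results: list[str] = []
    gaps: list[str] = []
    rounds_run = 0
    for rounds_run in range(1, MAX_ROUNDS + 1):
        results.extend(run_round(gaps))  # plan + search, guided by prior gaps
        evaluation = evaluate(results)   # scored on everything gathered so far
        if evaluation.is_sufficient:
            break
        gaps = evaluation.gaps           # fed back to the planner next round
    return results, rounds_run
```

Because the stopping condition is ordinary Python, it can be unit-tested with scripted evaluations and no model calls.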
Step 5
Write
Once the evaluator is satisfied (or 3 rounds are exhausted), the orchestrator passes the enriched query and all accumulated search summaries to WriterAgent. The agent produces an outline then writes a full structured markdown report targeting 1,000+ words across 5–10 sections. Output is a validated ReportData Pydantic model containing short_summary, markdown_report, and follow_up_questions.
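As a rough illustration of the writer's output shape, the sketch below uses a dataclass in place of the Pydantic ReportData model. The three field names come from the text; their types and the `report_stats` helper are assumptions, included only to show how the 1,000+ word and 5–10 section targets could be checked after the fact.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ReportData:
    """Stand-in for the Pydantic ReportData model; field types are assumed."""

    short_summary: str
    markdown_report: str
    follow_up_questions: list[str] = field(default_factory=list)


def report_stats(report: ReportData) -> dict[str, int]:
    """Count whitespace-separated tokens and level-2 headings (hypothetical helper).

    The token count includes markdown markers such as '##', so it slightly
    overstates the true word count.
    """
    words = len(report.markdown_report.split())
    sections = len(re.findall(r"^## ", report.markdown_report, flags=re.MULTILINE))
    return {"words": words, "sections": sections}
```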
Step 6
Email (Optional)
If the user checked "Send report by email" in the UI, the orchestrator passes the markdown report to EmailAgent. The agent converts it to styled HTML — max 700px wide, Georgia serif font, proper heading hierarchy — and delivers it via the send_email function_tool backed by the SendGrid API. This step is skipped entirely when the checkbox is left unchecked.
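The styling constraints and the opt-in gate can be illustrated with a small deterministic sketch. In the pipeline described here the EmailAgent (an LLM) performs the HTML conversion and calls send_email; the functions below are hypothetical stand-ins that only show the 700px Georgia wrapper and the skip-when-unchecked logic, with the sender injected so no SendGrid call is made.

```python
import html
from typing import Callable


def wrap_report_html(title: str, body_html: str) -> str:
    """Wrap already-rendered report HTML in a 700px-wide Georgia container."""
    return (
        "<!DOCTYPE html>"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>'
        '<body style="margin:0;padding:16px;">'
        '<div style="max-width:700px;margin:0 auto;'
        'font-family:Georgia,serif;line-height:1.6;">'
        f"{body_html}"
        "</div></body></html>"
    )


def maybe_email(report_html: str, send_requested: bool, send: Callable[[str], None]) -> bool:
    """Invoke the sender only when the user opted in; return whether it ran."""
    if not send_requested:
        return False
    send(report_html)
    return True
```

Passing the sender in as a callable keeps the gate testable without credentials or network access.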
Key Design Decisions
Python owns the control flow — not the LLM
The ResearchOrchestrator class manages all agents via explicit Runner.run() calls. Results are passed between stages as typed Python objects — there is no agent-to-agent communication and no LLM deciding what happens next. The loop condition, early-exit logic, and handoff to the writer are all Python code, making the pipeline predictable, debuggable, and easy to extend.
Sufficiency evaluation gates progression — not a fixed round count
Rather than always running a fixed number of rounds, the EvaluatorAgent's scores (coverage ≥ 7 AND depth ≥ 6) determine when to stop. Simple queries exit after round one; complex topics use all three rounds. A hard cap of 3 prevents runaway compute while still allowing quality to drive the decision on typical queries.
Clarification before search reduces wasted retrieval
Asking one or two focused questions upfront is cheaper than retrieving off-target results across multiple rounds. The ClarificationAgent is selective — it skips questions entirely when the query is already well-scoped — so the step adds latency only when it genuinely changes the research direction.
Parallel search execution via asyncio
All 5 search queries run concurrently using asyncio.create_task(), reducing total search time to the duration of the slowest single search rather than the sum of all five. Each SearchAgent instance is independent — a failed search is silently dropped and does not block the others from completing.
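The latency claim can be checked with a toy measurement. The delays below are made-up stand-ins for real search durations, and `timed_search` and `run_concurrently` are hypothetical names; the point is only that concurrent wall-clock time tracks the slowest task rather than the total.

```python
import asyncio
import time


async def timed_search(delay: float) -> float:
    """Hypothetical search that simply waits for `delay` seconds."""
    await asyncio.sleep(delay)
    return delay


async def run_concurrently(delays: list[float]) -> float:
    """Start one task per delay and return the elapsed wall-clock time."""
    start = time.perf_counter()
    tasks = [asyncio.create_task(timed_search(d)) for d in delays]
    await asyncio.gather(*tasks)
    return time.perf_counter() - start


delays = [0.10, 0.08, 0.06, 0.04, 0.02]  # sums to 0.30 s; slowest is 0.10 s
elapsed = asyncio.run(run_concurrently(delays))
print(f"elapsed {elapsed:.2f}s vs sequential sum {sum(delays):.2f}s")
```

On an otherwise idle machine the elapsed time should land close to the longest delay, though exact figures depend on the scheduler.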
Tech Stack
| Technology | Purpose |
|---|---|
| OpenAI Agents SDK | Agent definitions, Runner.run() calls, WebSearchTool, and tracing |
| GPT-4o-mini | LLM backbone for all 6 agents |
| WebSearchTool | Real-time web search within SearchAgent |
| Gradio | Two-phase UI: query analysis then live-streaming research progress and report output |
| Pydantic | Structured output validation for ClarificationPlan, WebSearchPlan, SufficiencyEvaluation, and ReportData |
| SendGrid | HTML email delivery of completed reports (optional) |
| asyncio | Parallel search execution via asyncio.create_task() |
| Python | Core language and pipeline orchestration |
Results & Metrics
What the system delivers
6
Specialized Agents
Clarification, Planner, Search, Evaluator, Writer, and Email — each with a single well-defined responsibility in the pipeline
5
Parallel Searches / Round
All 5 queries in a round execute concurrently via asyncio; up to 3 rounds run, stopping early once the evaluator confirms sufficient coverage and depth
1000+
Word Structured Report
Executive summary, full markdown report, and follow-up questions — optionally delivered by email via SendGrid
Adaptive multi-round research
After each round the EvaluatorAgent scores coverage (1–10) and depth (1–10) against the accumulated results. The loop exits early when both thresholds are met, or runs up to 3 rounds — so simple queries finish in one pass while complex topics get the depth they need.
Easy to debug and extend
Because control flow lives in Python rather than LLM reasoning, every stage can be inspected, logged, and modified independently. Adding a new pipeline step is as simple as writing a new ResearchOrchestrator method.
Minimal-touch delivery
After answering any clarifying questions upfront, the pipeline requires no further input — research, evaluation, synthesis, formatting, and optional email delivery all run without intervention. The clarification step is skipped entirely for well-scoped queries.
Quality-gated progression
The pipeline writes a report only once the research meets an explicit quality bar or the three-round cap is exhausted, unlike fixed-round workflows that write regardless of what was found. This makes the User-Guided version the natural architectural counterpart to the simpler Workflow baseline, illustrating the effect of adding an evaluation loop.
Agent Roster
| Agent | Responsibility |
|---|---|
| ClarificationAgent | Decides whether the query needs scoping; surfaces up to 4 targeted questions for the user to answer before searching begins |
| PlannerAgent | Generates 5 targeted web search queries per round; incorporates gap descriptions from the EvaluatorAgent on subsequent rounds |
| SearchAgent | Performs a single web search; returns a 300-word summary (5 instances run in parallel per round) |
| EvaluatorAgent | Scores accumulated results on coverage and depth (both 1–10); triggers early exit or identifies gaps for the next round |
| WriterAgent | Synthesises all accumulated search results into a structured 1,000+ word markdown report with executive summary and follow-up questions |
| EmailAgent | Converts markdown report to styled HTML; delivers via SendGrid (runs only when the user opts in) |