SearchSwarm

Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Real tasks can grow almost without bound, yet a model's context is finite. We teach agentic LLMs delegation intelligence: decomposing a long-horizon task, delegating bounded subtasks to subagents, and integrating their condensed, evidence-grounded results. This is an active form of context management that lets a single model take on far more than its context alone allows.

Delegation as active context management

A single model delegates bounded subtasks to subagents in separate contexts, which return only condensed results, keeping the main context clear.
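
To make the context-saving argument concrete, here is a minimal sketch of the token accounting. It is an illustration under stated assumptions, not the SearchSwarm implementation: `Context`, `count_tokens`, and `run_subtask` are hypothetical names, the whitespace token count stands in for a real tokenizer, and the formatted summary string stands in for a report that an LLM would write.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude whitespace token count (assumption: stands in for a real tokenizer)."""
    return len(text.split())


@dataclass
class Context:
    """A context window with a hard token budget (hypothetical helper)."""
    budget: int
    messages: list[str] = field(default_factory=list)

    @property
    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, text: str) -> None:
        if self.used + count_tokens(text) > self.budget:
            raise OverflowError("context budget exceeded")
        self.messages.append(text)


def run_subtask(raw_pages: list[str], budget: int) -> str:
    """Read raw pages in a fresh context and return only a condensed result."""
    sub = Context(budget=budget)
    for page in raw_pages:
        sub.append(page)  # raw retrieval stays inside the subagent's context
    # Assumption: a real subagent would have an LLM write this summary.
    return f"summary of {len(raw_pages)} pages ({sub.used} tokens read)"


pages = ["word " * 300 for _ in range(3)]  # 900 raw tokens in total
main = Context(budget=500)
main.append(run_subtask(pages, budget=1000))
print(main.used, "of", main.budget, "main-context tokens used")
```

In this toy setup the 900 raw tokens would overflow the main agent's 500-token budget if appended directly, while the delegated route costs the main context only the few tokens of the summary.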

High-quality delegation SFT data

We synthesize and release fine-tuning trajectories that teach when to delegate, how to brief a subagent, and how to verify what comes back.

30B-A3B SOTA

SearchSwarm leads every model at its scale on all four benchmarks: BrowseComp, BrowseComp-ZH, GAIA, and xbench-DeepSearch.

Results

Benchmark Comparisons

SearchSwarm is the state-of-the-art model at the 30B scale, across all four benchmarks.

[Figures: comparison charts for BrowseComp, BrowseComp-ZH, GAIA, and xbench-DeepSearch]

Demo

Trajectories in Action

Real runs: watch the main agent decompose a question, delegate to subagents, and synthesize a cited final answer.

Method

SearchSwarm Framework

The main agent owns the research mainline: it decomposes the question, delegates bounded evidence-gathering to subagents, and integrates the condensed, source-grounded reports they return.

[Figure: SearchSwarm architecture and execution flow]

SearchSwarm at a glance. The main agent dispatches bounded subtasks to subagents that run in their own fresh contexts and return condensed, cited reports, which re-enter the main agent's context for verification and synthesis.
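
The execution flow described above can be sketched as a small orchestration loop. This is a toy under stated assumptions rather than the released framework: `Report`, `subagent`, `main_agent`, `toy_search`, and `toy_decompose` are hypothetical names, the in-memory `CORPUS` replaces real web search, and taking the first sentence of each hit replaces LLM summarization.

```python
from dataclasses import dataclass
from typing import Callable

Hit = tuple[str, str]  # (url, page text)


@dataclass
class Report:
    subtask: str
    summary: str
    sources: list[str]


def subagent(subtask: str, search: Callable[[str], list[Hit]]) -> Report:
    """Run one bounded subtask in its own scope and return a condensed report."""
    hits = search(subtask)  # raw pages never leave this function
    # Assumption: the first sentence stands in for an LLM-written summary.
    summary = " ".join(text.split(".")[0] + "." for _, text in hits)
    return Report(subtask, summary, [url for url, _ in hits])


def main_agent(
    question: str,
    decompose: Callable[[str], list[str]],
    search: Callable[[str], list[Hit]],
) -> tuple[str, list[Report]]:
    """Decompose, delegate, verify, then synthesize a cited answer."""
    reports = [subagent(s, search) for s in decompose(question)]
    # Minimal verification step: discard reports that cite no source.
    verified = [r for r in reports if r.sources]
    answer = " ".join(f"{r.summary} [{', '.join(r.sources)}]" for r in verified)
    return answer, verified


# Toy stand-ins for the search tool and the decomposition step.
CORPUS = {
    "founding year of X": [("https://example.org/a", "X was founded in 1990. More text.")],
    "founder of X": [("https://example.org/b", "X was founded by Ada. More text.")],
}


def toy_search(query: str) -> list[Hit]:
    return CORPUS.get(query, [])


def toy_decompose(question: str) -> list[str]:
    return ["founding year of X", "founder of X", "unknown"]


answer, verified = main_agent("Who founded X, and when?", toy_decompose, toy_search)
print(answer)
```

Only the condensed `Report` objects reach `main_agent`; the subtask with no supporting source is dropped before synthesis, which mirrors the verification role the caption assigns to the main agent.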

1. Encourage delegation. Every token spent on raw retrieval is one not spent on reasoning, so the harness pushes the main agent to delegate multi-step gathering and reserve its context for decomposition, verification, and synthesis.
2. Comprehensive briefing. The main agent briefs each subagent like a new collaborator: not just the subtask, but why it matters, what is already established, and what is still uncertain, so it works on target.
3. Main agent retains core judgment. Subagents gather; the main agent decides. It checks each finding against its sources, adjudicates conflicts, and chooses which hypotheses to pursue or drop.
4. Citation-grounded reporting. Every subagent conclusion carries inline citations to its sources, and the main agent propagates them into a final answer whose explanation is traceable end to end.

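
Two of the principles above, comprehensive briefing and citation-grounded reporting, lend themselves to a short sketch. The field names on `Brief`, the `[n]` citation syntax, and the `uncited_claims` helper are all assumptions made for illustration; the source does not specify the actual brief or report format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Brief:
    """What the main agent hands to a subagent (hypothetical schema)."""
    subtask: str
    why_it_matters: str
    established: list[str] = field(default_factory=list)
    uncertain: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"Subtask: {self.subtask}",
            f"Why it matters: {self.why_it_matters}",
        ]
        lines += [f"Established: {fact}" for fact in self.established]
        lines += [f"Uncertain: {item}" for item in self.uncertain]
        return "\n".join(lines)


# Assumption: citations are written inline as [1], [2], ... into a source list.
CITATION = re.compile(r"\[(\d+)\]")


def uncited_claims(claims: list[str], num_sources: int) -> list[str]:
    """Return claims with no citation, or with a citation index out of range."""
    bad = []
    for claim in claims:
        ids = [int(m) for m in CITATION.findall(claim)]
        if not ids or any(i < 1 or i > num_sources for i in ids):
            bad.append(claim)
    return bad


brief = Brief(
    subtask="Find the year the lab was founded",
    why_it_matters="The answer must name a lab older than its parent institute",
    established=["The parent institute opened in 1995"],
    uncertain=["Whether the lab was renamed after founding"],
)
print(brief.render())

claims = ["The lab was founded in 1990 [1].", "It was renamed later.", "It moved in 2001 [3]."]
print(uncited_claims(claims, num_sources=2))
```

With two sources available, the check flags the claim that has no citation and the claim that cites a nonexistent third source, which are the cases the main agent would send back or discard before synthesis.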
Leaderboard

Performance Table

Baseline numbers are taken from the respective technical reports or model cards; an asterisk (*) marks results that use context management.

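| Model | Size | BrowseComp | BrowseComp-ZH | GAIA | xbench-DeepSearch-2505 |
|---|---|---|---|---|---|
| Closed-source models | | | | | |
| GPT-5.2-Thinking | -- | 65.8 | 76.1 | -- | -- |
| GPT-5 | -- | 54.9 | 65.0 | 76.4 | 77.8 |
| Claude-4.5-Opus | -- | 67.8 | 62.4 | 71.5 | -- |
| Claude-4.5-Sonnet | -- | 24.1 | 42.4 | 66.0 | 66.5 |
| Gemini-3.0-Pro | -- | 59.2 | 66.8 | 74.8 | -- |
| Seed-2.0-Pro | -- | 77.3* | 82.4* | 78.6 | -- |
| Open-source models | | | | | |
| Kimi-K2.5 | 1T-A32B | 78.4* | -- | -- | -- |
| GLM-4.7 | 355B-A32B | 67.5* | 66.6* | -- | 72.0 |
| GLM-5.0 | 744B-A40B | 75.9* | 72.7* | -- | -- |
| DeepSeek V3.2 | 671B-A37B | 67.6* | 65.0* | 75.1 | 78.0 |
| LongCat-Flash-Thinking-2601 | 560B-A27B | 73.1* | 77.7* | -- | -- |
| MiniMax-M2 | 230B-A10B | 44.0 | -- | 75.7 | 72.0 |
| MiniMax-M2.5 | 230B-A10B | 76.3* | -- | -- | -- |
| Step-3.5-Flash | 196B-A11B | 69.0* | 66.9 | 84.5 | 83.7 |
| Open-source lightweight models | | | | | |
| Tongyi DeepResearch | 30B-A3B | 43.4 | 46.7 | 70.9 | 75.0 |
| Tongyi DR Swarm | 30B-A3B | ≈43.4 | ≈46.7 | ≈70.9 | ≈75.0 |
| RedSearcher | 30B-A3B | 57.4* | 58.2* | 80.1 | -- |
| LongSeeker | 30B-A3B | 61.5* | 62.5* | 77.7* | 78.0* |
| MiroThinker-1.5-mini | 30B-A3B | 56.1* | 66.8* | 72.0* | 73.1* |
| MiroThinker-1.7-mini | 30B-A3B | 67.9* | 72.3* | 80.3* | -- |
| SearchSwarm (Ours) | 30B-A3B | 68.1* | 73.3* | 82.5* | 80.8* |
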
Generalization

Open-Ended Deep Research

Trained only on short-answer queries, SearchSwarm still transfers to long-form, multi-source synthesis.

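| Model | ScholarQA-v2 | HealthBench | ResearchQA | DeepResearchBench | Average |
|---|---|---|---|---|---|
| Closed-source systems | | | | | |
| OpenAI DeepResearch | 79.6 | 53.8 | 79.2 | 46.9 | 64.9 |
| Perplexity DeepResearch | 67.3 | -- | 75.3 | 42.3 | -- |
| Gemini-3.1-Pro + search | -- | 47.5 | 74.5 | 44.4 | -- |
| Open-source models | | | | | |
| Qwen3-8B | 40.4 | 16.5 | 56.1 | 33.3 | 36.6 |
| QwQ-32B | 41.9 | 24.5 | 60.9 | 40.3 | 41.9 |
| Tongyi DeepResearch | 46.5 | 46.2 | 66.7 | 40.6 | 50.0 |
| WebThinker-32B-DPO | 46.7 | 39.4 | 74.2 | 40.6 | 50.2 |
| Dr.Tulu | 88.3 | 52.8 | 75.7 | 45.4 | 65.6 |
| SearchSwarm (Ours) | 79.2 | 52.8 | 80.2 | 44.4 | 64.2 |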
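
The Average column appears to be the unweighted mean of the four benchmark scores, and it is left blank for rows with a missing score. The source does not state this, so the sketch below treats it as an assumption and checks it against the complete rows. It uses `Decimal` with half-up rounding to avoid binary floating-point ties; the rounding mode is also an assumption.

```python
from decimal import ROUND_HALF_UP, Decimal

# (ScholarQA-v2, HealthBench, ResearchQA, DeepResearchBench), reported Average
TABLE = {
    "OpenAI DeepResearch": (["79.6", "53.8", "79.2", "46.9"], "64.9"),
    "Qwen3-8B": (["40.4", "16.5", "56.1", "33.3"], "36.6"),
    "QwQ-32B": (["41.9", "24.5", "60.9", "40.3"], "41.9"),
    "Tongyi DeepResearch": (["46.5", "46.2", "66.7", "40.6"], "50.0"),
    "WebThinker-32B-DPO": (["46.7", "39.4", "74.2", "40.6"], "50.2"),
    "Dr.Tulu": (["88.3", "52.8", "75.7", "45.4"], "65.6"),
    "SearchSwarm (Ours)": (["79.2", "52.8", "80.2", "44.4"], "64.2"),
}


def mean_one_decimal(scores: list[str]) -> Decimal:
    """Unweighted mean of the scores, rounded half-up to one decimal place."""
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


mismatches = {
    model: (mean_one_decimal(scores), reported)
    for model, (scores, reported) in TABLE.items()
    if mean_one_decimal(scores) != Decimal(reported)
}
print(mismatches or "all reported averages match the recomputed means")
```

Under these assumptions every recomputed mean agrees with the reported Average, for example (79.2 + 52.8 + 80.2 + 44.4) / 4 = 64.15, which rounds to 64.2 for SearchSwarm.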