Overview¶
LINK-Researcher is a LangGraph-powered multi-agent workflow designed by Nanjing University and Alibaba Group for shipping complex research tasks end-to-end. Instead of acting as a single-turn Q&A bot, it understands the task, produces a plan, collects evidence in parallel, writes sections in parallel, and then post-processes, validates, and packages the deliverables.
At an engineering level, it separates responsibilities across nodes so you can quickly diagnose where quality issues occur: planning, retrieval, generation, or validation.
> **Project positioning: a multi-agent research workflow.** The goal is not "to sound right" but "to deliver reliably": make long-chain research converge into shippable artifacts through an observable, configurable, and extensible workflow.
What problem does LINK-Researcher solve?¶
Research delivery is hard because inputs are messy, evidence is scattered, the chain is long, and output requirements are strict. This project breaks “long-chain uncertainty” into controllable stages:
- Decomposition: break large questions into executable steps to avoid one-shot instability
- Parallelism: parallelize collection and writing to improve throughput
- Convergence: each stage has clear responsibility; validation and packaging close the loop
- Observability: memory + queue events expose intermediate states for debugging and iteration
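The observability point can be made concrete with a minimal sketch. The repo's actual event schema is not shown here, so the field names (`node`, `status`) and the helper `emit()` are assumptions; the idea is simply that each stage pushes structured events onto a queue so intermediate states can be inspected during a run:

```python
import queue

def emit(events: "queue.Queue[dict]", node: str, status: str, **payload) -> None:
    """Push a stage event so intermediate state is visible while the graph runs."""
    events.put({"node": node, "status": status, **payload})

events: "queue.Queue[dict]" = queue.Queue()

# Hypothetical events from two stages of a run.
emit(events, "planner", "done", steps=3)
emit(events, "data_collection", "running", task="web_search")

# Drain the queue into a snapshot for debugging (FIFO order is preserved).
snapshot = [events.get() for _ in range(events.qsize())]
```

A consumer (CLI progress bar, log sink, or UI) can read the same queue without coupling to any node's internals.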
From input to deliverable: the main workflow¶
The core execution chain is defined by `build_graph()` in `src/graph/base_graph.py`:

    START
      -> role_play
      -> planner
      -> (perception | page_replan)
      -> data_collection   (parallel)
      -> init_design_guide
      -> init_format
      -> format            (parallel)
      -> post_process
      -> validation
      -> zip_data
      -> END
You can think of it as four stages:
- Task orientation: `role_play` / `planner` clarify goals, roles, and steps.
- Evidence building: `perception` / `page_replan` + `data_collection` turn inputs and external information into usable evidence.
- Content generation: `init_format` + `format` transform evidence into structured sections.
- Delivery hardening: `post_process`, `validation`, and `zip_data` improve consistency and produce shippable artifacts.
Why this architecture “runs reliably”¶
**Graph as Contract.** Nodes define stage boundaries, reducing drift in long generation chains.
**Parallel as Default.** Subtasks are dispatched via `Send()` to shorten wall-clock latency.
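LangGraph's `Send()` fans out one payload per subtask to a worker node. Since the project's graph code isn't reproduced here, the sketch below emulates the same fan-out shape with `concurrent.futures`; `plan_sends()`, `collect()`, and the `(node, payload)` tuples are stand-ins, not the repo's API:

```python
from concurrent.futures import ThreadPoolExecutor

def plan_sends(plan: list) -> list:
    # One (node, payload) pair per subtask, mirroring Send("data_collection", {...}).
    return [("data_collection", {"query": q}) for q in plan]

def collect(payload: dict) -> str:
    # Stand-in for a data_collection node invocation.
    return f"evidence for {payload['query']}"

sends = plan_sends(["topic A", "topic B", "topic C"])

# Fan out: each payload is processed independently; map() preserves input order.
with ThreadPoolExecutor() as pool:
    evidence = list(pool.map(collect, [p for _, p in sends]))
```

The wall-clock win comes from the independence of the payloads: each collector only sees its own slice of the plan.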
**Tools as Capability Layer.** Search, fetch, and file operations are injected explicitly as tools rather than implicitly coupled to nodes.
**Memory as Continuity.** Plans and intermediate results accumulate across stages to support convergence.
**Mode as Trade-off.** `FAST_MODE` and related toggles formalize the speed-vs-quality trade-off.
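One way such a toggle typically surfaces is as a config object whose derived settings shift the trade-off in one place. The knobs below (`max_parallel_collectors`, `validation_passes`) and their values are invented for illustration; only the `FAST_MODE` concept comes from the project:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunConfig:
    """Hypothetical run configuration; fast_mode trades depth for latency."""
    fast_mode: bool = False

    @property
    def max_parallel_collectors(self) -> int:
        # Fast mode: more concurrency, shallower per-task work.
        return 8 if self.fast_mode else 4

    @property
    def validation_passes(self) -> int:
        # Fast mode: a single validation sweep instead of two.
        return 1 if self.fast_mode else 2

cfg = RunConfig(fast_mode=True)
```

Centralizing the trade-off this way keeps individual nodes free of scattered `if fast: ...` branches.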
**Validation as Guardrail.** Post-processing and validation add consistency checks before packaging.
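A guardrail of this kind can be as simple as a pure function that returns a list of problems, where an empty list means the draft may proceed to packaging. The checks below (non-empty body, no leftover `TODO`) are illustrative assumptions, not the project's actual validation rules:

```python
def validate_sections(sections: dict) -> list:
    """Return a list of problems; an empty list means the draft can be packaged."""
    problems = []
    for title, body in sections.items():
        if not body.strip():
            problems.append(f"empty section: {title}")
        elif "TODO" in body:
            problems.append(f"unfinished section: {title}")
    return problems

issues = validate_sections({"Intro": "Background ...", "Results": ""})
```

Because the function is side-effect free, the same checks can run both as a graph node and in unit tests.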
Recommended reading order¶
- Overview: build a mental model and learn the key trade-offs
- Quick Start: run an end-to-end workflow with minimal setup
- Architecture: dive into modules, topology, and sequence diagrams
- Reference: references and upstream docs