
Overview

LINK-Researcher is a LangGraph-powered multi-agent workflow designed by Nanjing University and Alibaba Group for shipping complex research tasks end-to-end. Rather than operating as a single-turn Q&A bot, it understands the task, produces a plan, collects evidence in parallel, writes sections in parallel, and then post-processes, validates, and packages deliverables.

At an engineering level, it separates responsibilities across nodes so you can quickly diagnose where quality issues occur: planning, retrieval, generation, or validation.

PROJECT POSITIONING

Multi-Agent Research Workflow

The goal is not “to sound right”, but “to deliver reliably”: make long-chain research converge into shippable artifacts in an observable, configurable, and extensible workflow.

  • 10+ key workflow nodes
  • Parallel collection + parallel section writing
  • 2 primary output formats (HTML / MD)
  • 4+ runtime modes

Research delivery is hard because inputs are messy, evidence is scattered, the chain is long, and output requirements are strict. This project breaks “long-chain uncertainty” into controllable stages:

  • Decomposition: break large questions into executable steps to avoid one-shot instability
  • Parallelism: parallelize collection and writing to improve throughput
  • Convergence: each stage has clear responsibility; validation and packaging close the loop
  • Observability: memory + queue events expose intermediate states for debugging and iteration
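The observability point can be sketched with a plain queue of stage events that a consumer drains while the workflow runs. The event names and payload fields below are illustrative assumptions, not the project's actual event schema.

```python
import queue

# Hypothetical event bus: nodes push stage events so intermediate
# states can be observed while the workflow runs. Event names and
# payload fields here are illustrative, not the project's schema.
events: "queue.Queue[dict]" = queue.Queue()

def emit(node: str, status: str, **payload) -> None:
    """Record an intermediate state for debugging and iteration."""
    events.put({"node": node, "status": status, **payload})

# A node reports its lifecycle as it executes.
emit("planner", "started")
emit("planner", "finished", steps=5)

# A consumer (logger, UI, test harness) drains the queue.
log = []
while not events.empty():
    log.append(events.get())

print([e["status"] for e in log])  # → ['started', 'finished']
```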

From input to deliverable: the main workflow

The core execution chain is defined by build_graph() in src/graph/base_graph.py:

START
  -> role_play
  -> planner
  -> (perception | page_replan)
  -> data_collection (parallel)
  -> init_design_guide
  -> init_format
  -> format (parallel)
  -> post_process
  -> validation
  -> zip_data
  -> END

You can think of it as four stages:

  1. Task orientation: role_play and planner clarify goals, roles, and steps.
  2. Evidence building: perception/page_replan + data_collection turn inputs and external info into usable evidence.
  3. Content generation: init_format + format transform evidence into structured sections.
  4. Delivery hardening: post_process, validation, and zip_data improve consistency and produce shippable artifacts.

Why this architecture “runs reliably”

Graph as Contract

Nodes define stage boundaries, reducing drift in long generation chains.

Parallel as Default

Subtasks are dispatched via Send() to shorten wall-clock latency.

Tools as Capability Layer

Search, fetch, and file ops are injected explicitly as tools, not implicitly coupled.

Memory as Continuity

Plans and intermediate results accumulate across stages to support convergence.

Mode as Trade-off

FAST_MODE and related toggles formalize speed-vs-quality trade-offs.
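One way such a toggle can be formalized is as a small config object derived from the environment. This is a hypothetical sketch: the real FAST_MODE flag and its exact knobs live in the project's configuration, and the numbers below are illustrative.

```python
from dataclasses import dataclass

# Hypothetical speed-vs-quality toggle; field names and values are
# illustrative, not the project's actual configuration.
@dataclass(frozen=True)
class RunConfig:
    fast_mode: bool
    max_parallel_collectors: int
    validation_passes: int

def load_config(env: dict) -> RunConfig:
    fast = env.get("FAST_MODE", "false").lower() == "true"
    # FAST_MODE trades depth for latency: wider fan-out, fewer
    # validation passes.
    return RunConfig(
        fast_mode=fast,
        max_parallel_collectors=8 if fast else 4,
        validation_passes=1 if fast else 3,
    )

print(load_config({"FAST_MODE": "true"}).validation_passes)  # → 1
print(load_config({}).validation_passes)                     # → 3
```

Making the trade-off explicit in one place keeps every node's behavior consistent with the chosen mode.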

Validation as Guardrail

Post-processing and validation add consistency checks before packaging.

  • Overview: build a mental model and the key trade-offs
  • Quick Start: run an end-to-end workflow with minimal setup
  • Architecture: dive into modules, topology, and sequence diagrams
  • Reference: references and upstream docs