
Overview

LINK-Researcher is a LangGraph-powered multi-agent workflow designed by Nanjing University and Alibaba Group for shipping complex research tasks end-to-end. Rather than operating as a single-turn Q&A bot, it understands the task, produces a plan, collects evidence in parallel, writes sections in parallel, and then post-processes, validates, and packages deliverables.

At an engineering level, it separates responsibilities across nodes so you can quickly diagnose where quality issues occur: planning, retrieval, generation, or validation.

PROJECT POSITIONING

Multi-Agent Research Workflow

The goal is not “to sound right”, but “to deliver reliably”: make long-chain research converge into shippable artifacts in an observable, configurable, and extensible workflow.

  • 10+ key workflow nodes
  • Parallel collection + parallel section writing
  • 2 primary output formats (HTML / MD)
  • 4+ runtime modes

Research delivery is hard because inputs are messy, evidence is scattered, the chain is long, and output requirements are strict. This project breaks “long-chain uncertainty” into controllable stages:

  • Decomposition: break large questions into executable steps to avoid one-shot instability
  • Parallelism: parallelize collection and writing to improve throughput
  • Convergence: each stage has clear responsibility; validation and packaging close the loop
  • Observability: memory + queue events expose intermediate states for debugging and iteration
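The observability point can be sketched with a plain queue of stage events that a consumer drains while the workflow runs. The event names and payload fields below are illustrative assumptions, not the project's actual event schema.

```python
import queue

# Hypothetical event bus: nodes push stage events so intermediate
# states can be observed while the workflow runs. Event names and
# payload fields here are illustrative, not the project's schema.
events: "queue.Queue[dict]" = queue.Queue()

def emit(node: str, status: str, **payload) -> None:
    """Record an intermediate state for debugging and iteration."""
    events.put({"node": node, "status": status, **payload})

# A node reports its lifecycle as it executes.
emit("planner", "started")
emit("planner", "finished", steps=5)

# A consumer (logger, UI, test harness) drains the queue.
log = []
while not events.empty():
    log.append(events.get())

print([e["status"] for e in log])  # → ['started', 'finished']
```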

From input to deliverable: the main workflow

The core execution chain is defined by build_graph() in src/graph/base_graph.py:

START
  -> role_play
  -> planner
  -> (perception | page_replan)
  -> data_collection (parallel)
  -> init_design_guide
  -> init_format
  -> format (parallel)
  -> post_process
  -> validation
  -> zip_data
  -> END

You can think of it as four stages:

  1. Task orientation: role_play and planner clarify goals, roles, and steps.
  2. Evidence building: perception/page_replan + data_collection turn inputs and external info into usable evidence.
  3. Content generation: init_format + format transform evidence into structured sections.
  4. Delivery hardening: post_process, validation, and zip_data improve consistency and produce shippable artifacts.

Why this architecture “runs reliably”

Graph as Contract

Nodes define stage boundaries, reducing drift in long generation chains.

Parallel as Default

Subtasks are dispatched via Send() to shorten wall-clock latency.

Tools as Capability Layer

Search, fetch, and file ops are injected explicitly as tools, not implicitly coupled.

Memory as Continuity

Plans and intermediate results accumulate across stages to support convergence.

Mode as Trade-off

FAST_MODE and related toggles formalize speed-vs-quality trade-offs.
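One way such a toggle can be formalized is as a small config object derived from the environment. This is a hypothetical sketch: the real FAST_MODE flag and its exact knobs live in the project's configuration, and the numbers below are illustrative.

```python
from dataclasses import dataclass

# Hypothetical speed-vs-quality toggle; field names and values are
# illustrative, not the project's actual configuration.
@dataclass(frozen=True)
class RunConfig:
    fast_mode: bool
    max_parallel_collectors: int
    validation_passes: int

def load_config(env: dict) -> RunConfig:
    fast = env.get("FAST_MODE", "false").lower() == "true"
    # FAST_MODE trades depth for latency: wider fan-out, fewer
    # validation passes.
    return RunConfig(
        fast_mode=fast,
        max_parallel_collectors=8 if fast else 4,
        validation_passes=1 if fast else 3,
    )

print(load_config({"FAST_MODE": "true"}).validation_passes)  # → 1
print(load_config({}).validation_passes)                     # → 3
```

Making the trade-off explicit in one place keeps every node's behavior consistent with the chosen mode.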

Validation as Guardrail

Post-processing and validation add consistency checks before packaging.

  • Overview: build a mental model and the key trade-offs
  • Quick Start: run an end-to-end workflow with minimal setup
  • Architecture: dive into modules, topology, and sequence diagrams
  • Reference: references and upstream docs