CodeTracer API Reference

Self-Evolving Agent Trajectory Diagnosis System

CodeTracer analyzes agent execution trajectories step-by-step, identifies incorrect and unuseful actions, and produces structured diagnostic labels. It operates as an autonomous diagnosis agent with cross-trajectory memory.

$ pip install codetracer

Installation

git clone https://github.com/NJU-LINK/CodeTracer.git
cd CodeTracer
pip install -e .

Configure your LLM endpoint:

export CODETRACER_API_BASE="https://api.openai.com/v1"
export CODETRACER_API_KEY="your-api-key"
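
The same settings can be read from Python before a client is constructed. This is a minimal sketch that assumes only the two environment variable names shown above; the helper name `load_endpoint_settings` is hypothetical and not part of CodeTracer.

```python
import os


def load_endpoint_settings(env=None):
    """Return (api_base, api_key), or raise if either variable is unset.

    Hypothetical helper for illustration; not part of the CodeTracer API.
    """
    env = os.environ if env is None else env
    api_base = env.get("CODETRACER_API_BASE", "")
    api_key = env.get("CODETRACER_API_KEY", "")
    required = (("CODETRACER_API_BASE", api_base), ("CODETRACER_API_KEY", api_key))
    missing = [name for name, value in required if not value]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return api_base, api_key
```

Failing early on a missing variable tends to give a clearer error than a rejected request later in the pipeline.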

Quick Start

from pathlib import Path
from codetracer.query.normalizer import Normalizer
from codetracer.query.tree_builder import TreeBuilder
from codetracer.skills.pool import SkillPool
from codetracer.agents.trace_agent import TraceAgent
from codetracer.agents.context import ContextAssembler
from codetracer.llm.client import LLMClient

# 1. Normalize trajectory
pool = SkillPool()
normalizer = Normalizer(pool)
skill = normalizer.detect(Path("path/to/trajectory"))
traj = normalizer.normalize(Path("path/to/trajectory"), skill)

# 2. Build navigation tree
tree_md = TreeBuilder().build(traj)

# 3. Run diagnosis
llm = LLMClient(api_base="https://api.openai.com/v1", api_key="...", model_name="gpt-4o")
assembler = ContextAssembler(config={}, skill_pool=pool)
agent = TraceAgent(llm, assembler, Path("./work"), Path("./labels.json"), config={})
result = agent.run(skill)

TraceAgent

codetracer.agents.trace_agent

class High-level trace agent that wires context assembly and the base agent loop for autonomous trajectory diagnosis.

Constructor

TraceAgent(
    llm: LLMClient,
    assembler: ContextAssembler,
    run_dir: Path,
    output_path: Path,
    config: dict[str, Any],
    artifacts_dir: Path | None = None,
    *,
    hooks: HookManager | None = None,
    cost_tracker: CostTracker | None = None,
    compact_manager: CompactManager | None = None,
    profile: OutputProfile | None = None,
    agent_type: str = ""
)

Methods

CompactManager

codetracer.agents.compact

class Two-tier context compaction: LLM summarization with a sliding-window fallback. Compaction never permanently disables itself.

Constructor

CompactManager(
    context_window: int = 128_000,
    buffer_tokens: int = 13_000,
    max_failures: int = 3,
    enabled: bool = True
)
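
One way to picture the two-tier behavior is sketched below. It is not CodeTracer's implementation. It assumes that compaction triggers once the token count exceeds `context_window - buffer_tokens`, and that the sliding-window fallback keeps only the most recent messages. The names `needs_compaction`, `compact`, `summarize`, and `keep_last` are hypothetical.

```python
from typing import Callable


def needs_compaction(token_count: int, context_window: int = 128_000,
                     buffer_tokens: int = 13_000) -> bool:
    # Assumed trigger: compact once usage crosses the window minus the buffer.
    return token_count > context_window - buffer_tokens


def compact(messages: list[str], summarize: Callable[[list[str]], str],
            keep_last: int = 4) -> list[str]:
    """Tier 1: replace older messages with a summary. Tier 2: sliding window."""
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    try:
        # Tier 1: an LLM-backed callable condenses the older messages.
        return [summarize(older)] + recent
    except Exception:
        # Tier 2: summarization failed, so keep only the recent window.
        return recent
```

With the documented defaults, the assumed trigger point is 128,000 - 13,000 = 115,000 tokens.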

Methods

Properties

ContextAssembler

codetracer.agents.context

class Composes LLM messages from config templates, skill docs, and run data.

Constructor

ContextAssembler(config: dict[str, Any], skill_pool: SkillPool)

Methods

Discovery Explorer

codetracer.discovery.explorer

Three-phase recursive trajectory discovery with LLM-guided fallback for arbitrarily nested directories.

discover_trajectory_dirs

function

discover_trajectory_dirs(
    root: Path,
    config: dict[str, Any] | None = None,
    llm: LLMClient | None = None
) → list[Path]

Discover trajectory directories under root using a three-phase strategy:

  1. Marker scan — recursive walk for known markers (results.json, steps.json, etc.)
  2. Skill detection — validate candidates via SkillPool
  3. LLM analysis — when fast scan returns empty, use LLM to analyze directory structure
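
The first phase above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: only the two marker names given in the list are used (the real list is longer), and the function name `marker_scan` is hypothetical.

```python
from pathlib import Path

# Only the two marker names named in the text; the real set includes others.
MARKERS = ("results.json", "steps.json")


def marker_scan(root: Path, markers=MARKERS) -> list[Path]:
    """Return the sorted directories under root that contain a marker file."""
    found = set()
    for marker in markers:
        # rglob walks the tree recursively, so nesting depth does not matter.
        for hit in root.rglob(marker):
            if hit.is_file():
                found.add(hit.parent)
    return sorted(found)
```

Candidates returned by a scan like this would still need the skill-detection check from phase 2 before being treated as trajectories.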

detect_or_generate_skill

function

detect_or_generate_skill(
    run_dir: Path,
    normalizer: Normalizer,
    pool: SkillPool,
    llm: LLMClient,
    config: dict[str, Any],
    user_skill_dir: Path | None = None,
    format_override: str | None = None
) → tuple[Skill | None, NormalizedTrajectory]

Unified detect-or-generate entry point. Tries each strategy in order: pre-normalized → step-JSONL → skill detection → auto-generation.
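
The fallback order can be read as "the first strategy that produces a result wins". The sketch below shows that pattern in isolation. The function `first_success` and the lambda stand-ins are hypothetical and do not reflect how CodeTracer implements each strategy.

```python
from typing import Callable, Optional


def first_success(strategies: list[tuple[str, Callable[[], Optional[object]]]]):
    """Try strategies in order; return (name, result) for the first non-None result."""
    for name, attempt in strategies:
        result = attempt()
        if result is not None:
            return name, result
    raise LookupError("no strategy could handle this run directory")


# Stand-in strategies: the first two find nothing, the third succeeds.
chain = [
    ("pre-normalized", lambda: None),
    ("step-JSONL", lambda: None),
    ("skill detection", lambda: "miniswe"),
    ("auto-generation", lambda: "generated"),
]
winner = first_success(chain)
```

Ordering the cheap checks first means the LLM-backed auto-generation step runs only when nothing else matches.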

SkillPool

codetracer.skills.pool

class Registry of all available trajectory parsers (seed + user-generated skills).

Constructor

SkillPool(seed_dir: Path = <built-in>, user_dir: Path | None = None)

Methods

SkillGenerator

codetracer.skills.generator

class Uses LLM to analyze unknown trajectory formats and auto-generate parsers.

Constructor

SkillGenerator(llm: LLMClient, pool: SkillPool, config: dict[str, Any])

Methods

Normalizer

codetracer.query.normalizer

class Orchestrates format detection and parsing into NormalizedTrajectory.

Constructor

Normalizer(pool: SkillPool)

Methods

TreeBuilder

codetracer.query.tree_builder

class Converts normalized trajectories into tree.md navigation indices with step classification.

Constructor

TreeBuilder(llm=None, config: dict[str, Any] | None = None)

Methods

Memory Service

codetracer.services.memory

Cross-trajectory memory with online mid-analysis extraction. Accumulates agent-specific failure patterns and investigation strategies.

OnlineMemoryExtractor

class Fire-and-forget background extraction during analysis.

OnlineMemoryExtractor(
    agent_type: str,
    llm: Any,
    memory_dir: Path | None = None,
    step_interval: int = 8,
    token_threshold: int = 30_000
)
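
The two numeric parameters suggest a trigger that fires by step count or by accumulated tokens. The sketch below assumes that either condition alone is enough and that both counters reset after firing; the reference does not confirm either point. `ExtractionTrigger` and `fire_and_forget` are hypothetical names.

```python
import threading


class ExtractionTrigger:
    """Decides when a background extraction is due (assumed OR semantics)."""

    def __init__(self, step_interval: int = 8, token_threshold: int = 30_000):
        self.step_interval = step_interval
        self.token_threshold = token_threshold
        self._steps = 0
        self._tokens = 0

    def record(self, tokens: int) -> bool:
        """Record one analysis step; return True when extraction is due."""
        self._steps += 1
        self._tokens += tokens
        due = (self._steps >= self.step_interval
               or self._tokens >= self.token_threshold)
        if due:
            # Assumed: counters restart after each extraction.
            self._steps = 0
            self._tokens = 0
        return due


def fire_and_forget(task) -> threading.Thread:
    # Run the task on a daemon thread; the caller does not wait for it.
    thread = threading.Thread(target=task, daemon=True)
    thread.start()
    return thread
```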

Module Functions

CostTracker

codetracer.services.cost_tracker

dataclass Tracks LLM cost across pipeline phases and enforces budget limits.

Methods

Properties

ModelCosts

dataclass Per-million-token pricing.

Field            Type   Default
input_per_mtok   float  3.0
output_per_mtok  float  15.0
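
Per-million-token pricing turns into a dollar figure by scaling token counts by 1,000,000. The sketch below mirrors the two documented fields and defaults. The class name `ModelCostsSketch`, the `cost` method, and the `over_budget` helper are hypothetical, and treating "over budget" as strictly greater than the limit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ModelCostsSketch:
    input_per_mtok: float = 3.0
    output_per_mtok: float = 15.0

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Prices are quoted per one million tokens.
        return (input_tokens * self.input_per_mtok
                + output_tokens * self.output_per_mtok) / 1_000_000


def over_budget(spent: float, limit: float = 3.0) -> bool:
    # Assumed comparison; 3.0 matches the documented --cost-limit default.
    return spent > limit


spent = ModelCostsSketch().cost(input_tokens=200_000, output_tokens=50_000)
```

Here 200,000 input tokens cost 0.60 and 50,000 output tokens cost 0.75, for a total of 1.35, which is under the default limit.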

LLMClient

codetracer.llm.client

class OpenAI-compatible LLM client with categorized retry logic and Azure AD support.

Constructor

LLMClient(
    api_base: str = "",
    api_key: str = "",
    model_name: str | None = None,
    model_kwargs: dict = {},
    azure_ad_resource: str = ""
)

Methods

Properties

Data Models

codetracer.models

NormalizedTrajectory

dataclass Fully normalized trajectory ready for tree building and tracing.

Field             Type              Description
steps             list[StepRecord]  List of steps
task_description  str               Task description
metadata          dict              Format/run metadata

StepRecord

dataclass One normalized step: an action and its observation.

Field            Type            Description
step_id          int             Step index
action           str             Action taken
observation      str | None      Observation result
thinking         str | None      Internal reasoning
tool_type        str | None      Type of tool used
action_ref       FileRef | None  Source location reference
observation_ref  FileRef | None  Source location reference
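
To make the shape of these two records concrete, the sketch below rebuilds them as stand-alone dataclasses and constructs a two-step trajectory. The `Sketch` class names are hypothetical, `FileRef` is replaced by `Any` because its definition is not shown here, and which fields carry defaults is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StepRecordSketch:
    step_id: int
    action: str
    observation: Optional[str] = None
    thinking: Optional[str] = None
    tool_type: Optional[str] = None
    action_ref: Optional[Any] = None       # FileRef in the real model
    observation_ref: Optional[Any] = None  # FileRef in the real model


@dataclass
class NormalizedTrajectorySketch:
    steps: list[StepRecordSketch]
    task_description: str
    metadata: dict = field(default_factory=dict)


traj = NormalizedTrajectorySketch(
    steps=[
        StepRecordSketch(0, "ls", observation="README.md", tool_type="bash"),
        StepRecordSketch(1, "cat missing.txt"),
    ],
    task_description="Inspect the repository",
)
# Steps that never received an observation.
unobserved = [s.step_id for s in traj.steps if s.observation is None]
```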

ErrorAnalysis

dataclass Result of trajectory error analysis.

Field     Type             Description
traj_id   str              Trajectory identifier
labels    list[StepLabel]  Per-step labels
summary   str              Analysis summary
metadata  dict             Additional metadata

StepLabel

dataclass Label for a single diagnosed step.

Field                Type         Description
step_id              int          Target step
verdict              StepVerdict  INCORRECT | UNUSEFUL | CORRECT
reasoning            str          Why this verdict
deviation_type       str          Type of deviation
correct_alternative  str          What should have happened

StepVerdict

enum INCORRECT = "incorrect" | UNUSEFUL = "unuseful" | CORRECT = "correct"
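
The verdict values above map naturally onto a Python `Enum`, which also allows parsing a raw string back into a member. The sketch below is stand-alone: the `Sketch` class names and the `label_from_dict` helper are hypothetical, and the empty-string defaults for the last two fields are an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class StepVerdictSketch(Enum):
    INCORRECT = "incorrect"
    UNUSEFUL = "unuseful"
    CORRECT = "correct"


@dataclass
class StepLabelSketch:
    step_id: int
    verdict: StepVerdictSketch
    reasoning: str
    deviation_type: str = ""
    correct_alternative: str = ""


def label_from_dict(raw: dict) -> StepLabelSketch:
    """Build a label from a plain dict; an unknown verdict raises ValueError."""
    return StepLabelSketch(
        step_id=int(raw["step_id"]),
        verdict=StepVerdictSketch(raw["verdict"]),
        reasoning=raw.get("reasoning", ""),
        deviation_type=raw.get("deviation_type", ""),
        correct_alternative=raw.get("correct_alternative", ""),
    )


label = label_from_dict({"step_id": 4, "verdict": "unuseful",
                         "reasoning": "Repeated an earlier listing"})
```

Looking a member up by value rejects misspelled verdicts at parse time rather than letting them flow into later analysis.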

ReplayResult

dataclass Outcome of a replay session.

Field           Type                   Description
status          ReplayStatus           SUCCESS | PARTIAL | FAILED
checkpoint      StepCheckpoint | None  Final checkpoint
steps_replayed  int                    Count of steps replayed
agent_output    str                    Agent's response

TaskContext

dataclass Task metadata and provider reference.

Field              Type        Description
bench_type         str         Benchmark type
task_name          str         Task name
task_dir           Path        Task directory
problem_statement  str | None  Problem description

Output Profiles

codetracer.state.output_profile

OutputProfile

dataclass

Field                 Type  Description
name                  str   Profile name
schema_ref            str   Schema reference
finalize_instruction  str   Agent output format instructions
output_file           str   Output filename

Built-in Profiles

tracebench: codetracer_labels.json
Stage-level labels with incorrect_step_ids and unuseful_step_ids for benchmark evaluation.

detailed: codetracer_analysis.json
Root cause chains, critical decision points, and comprehensive analysis.

rl_feedback: codetracer_rl_feedback.json
Per-step deviation analysis and reward signals for RL training.
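
As an illustration of the two ID lists named for the tracebench profile, the sketch below collapses per-step verdicts into `incorrect_step_ids` and `unuseful_step_ids`. The flat dictionary shape is a simplification: the real file is described as stage-level, and its exact schema is not shown here. The function name `tracebench_summary` is hypothetical.

```python
import json


def tracebench_summary(verdicts: dict[int, str]) -> dict[str, list[int]]:
    """Group step ids by verdict value, keeping only the two flagged kinds."""
    return {
        "incorrect_step_ids": sorted(i for i, v in verdicts.items() if v == "incorrect"),
        "unuseful_step_ids": sorted(i for i, v in verdicts.items() if v == "unuseful"),
    }


summary = tracebench_summary({3: "incorrect", 1: "correct", 2: "unuseful", 7: "incorrect"})
print(json.dumps(summary))
# {"incorrect_step_ids": [3, 7], "unuseful_step_ids": [2]}
```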

Functions

Plugin Adapters

codetracer.plugins.adapter

Integration surface for embedding CodeTracer into external agent frameworks.

PluginAdapter

abstract Base class for framework integration.

GenericPluginAdapter

class Full pipeline adapter backed by CodeTracer components.

GenericPluginAdapter(
    skill_name: str,
    *,
    bench_name: str | None = None,
    config: dict | None = None,
    llm_kwargs: dict | None = None
)

Built-in Adapters

Adapter           Skill Name  Framework
MinisweAdapter    miniswe     MiniSWE Agent
OpenHandsAdapter  openhands   OpenHands
SweAgentAdapter   swe_agent   SWE-Agent

CLI Reference

codetracer.cli.commands
Usage: codetracer [COMMAND] [OPTIONS]

Commands:
  analyze     Run trajectory diagnosis on a run directory
  run         Full pipeline: detect → normalize → tree → analyze → replay
  replay      Resume trajectory from diagnosed breakpoint
  inspect     Inspect specific step or range
  interactive Enter interactive REPL with menu-driven actions
  normalize   Normalize a trajectory to steps.json format
  tree        Build step classification tree
  batch       Run batch analysis from manifest

Global Options:
  --model TEXT       LLM model name
  --api-base URL     API endpoint
  --api-key TEXT     API key
  --config PATH      Custom configuration file
  --profile TEXT     Output profile (tracebench / detailed / rl_feedback)
  --cost-limit $     Max LLM spend per trajectory (default: 3.0)
  --dry-run          Normalize + tree only, skip LLM analysis
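
When the CLI is driven from a script, the command line can be assembled as an argument list and passed to `subprocess.run`. The sketch below uses only the command and flags listed above. Passing the run directory as a positional argument is an assumption, and both `build_analyze_command` and the path `runs/example` are hypothetical.

```python
import shlex


def build_analyze_command(run_dir: str, profile: str = "tracebench",
                          cost_limit: float = 3.0, dry_run: bool = False) -> list[str]:
    """Assemble an argv list for `codetracer analyze`.

    Assumption: the run directory is a positional argument after the command.
    """
    argv = ["codetracer", "analyze", run_dir,
            "--profile", profile, "--cost-limit", str(cost_limit)]
    if dry_run:
        # Documented as: normalize + tree only, skip LLM analysis.
        argv.append("--dry-run")
    return argv


argv = build_analyze_command("runs/example", profile="detailed", dry_run=True)
print(shlex.join(argv))
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces.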

CodeTracer © 2026 Nanjing University • Paper • GitHub • MIT License