## Installation

```bash
git clone https://github.com/NJU-LINK/CodeTracer.git
cd CodeTracer
pip install -e .
```

Configure your LLM endpoint:

```bash
export CODETRACER_API_BASE="https://api.openai.com/v1"
export CODETRACER_API_KEY="your-api-key"
```
## Quick Start

```python
from pathlib import Path
from codetracer.query.normalizer import Normalizer
from codetracer.query.tree_builder import TreeBuilder
from codetracer.skills.pool import SkillPool
from codetracer.agents.trace_agent import TraceAgent
from codetracer.agents.context import ContextAssembler
from codetracer.llm.client import LLMClient

# 1. Normalize the trajectory
pool = SkillPool()
normalizer = Normalizer(pool)
skill = normalizer.detect(Path("path/to/trajectory"))
traj = normalizer.normalize(Path("path/to/trajectory"), skill)

# 2. Build the navigation tree
tree_md = TreeBuilder().build(traj)

# 3. Run the diagnosis
llm = LLMClient(api_base="https://api.openai.com/v1", api_key="...", model_name="gpt-4o")
assembler = ContextAssembler(config={}, skill_pool=pool)
agent = TraceAgent(llm, assembler, Path("./work"), Path("./labels.json"), config={})
result = agent.run(skill)
```
## TraceAgent

*class*: High-level trace agent that wires context assembly and the base agent loop for autonomous trajectory diagnosis.

### Constructor

```python
TraceAgent(
    llm: LLMClient,
    assembler: ContextAssembler,
    run_dir: Path,
    output_path: Path,
    config: dict[str, Any],
    artifacts_dir: Path | None = None,
    *,
    hooks: HookManager | None = None,
    cost_tracker: CostTracker | None = None,
    compact_manager: CompactManager | None = None,
    profile: OutputProfile | None = None,
    agent_type: str = ""
)
```

### Methods

- `run(skill, task_ctx=None, memory_text="", budget_context="", traj_metadata=None) → str`: Run the full analysis and return a result summary.
- `run_iter(skill, task_ctx=None, memory_text="", budget_context="", traj_metadata=None)`: Generator variant that yields `AgentEvent` objects for streaming.
- `save_trajectory(path: Path) → None`: Save the agent conversation trajectory to a JSON file.
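The streaming variant can be consumed like any generator. Below is a minimal sketch of the consumption loop; the `AgentEvent` fields and the stand-in `fake_run_iter` are hypothetical simplifications, not CodeTracer's real definitions:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class AgentEvent:          # hypothetical stand-in for CodeTracer's event type
    kind: str              # e.g. "step" or "final" (assumed field names)
    payload: str

def fake_run_iter() -> Iterator[AgentEvent]:
    # Stand-in for agent.run_iter(skill): yields events; the last carries the summary.
    yield AgentEvent("step", "inspected step 3")
    yield AgentEvent("step", "inspected step 7")
    yield AgentEvent("final", "first incorrect step: 7")

summary = None
for event in fake_run_iter():
    if event.kind == "final":
        summary = event.payload
    else:
        print(event.payload)    # stream intermediate progress as it arrives
```

With a real agent the loop would read `for event in agent.run_iter(skill):` instead.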
## CompactManager

*class*: Two-tier context compaction: LLM summarization with a sliding-window fallback. Compaction is never permanently disabled.

### Constructor

```python
CompactManager(
    context_window: int = 128_000,
    buffer_tokens: int = 13_000,
    max_failures: int = 3,
    enabled: bool = True
)
```

### Methods

- `should_compact(messages: list[dict]) → bool`: Check if the messages exceed the token threshold.
- `compact(messages: list[dict], llm) → list[dict]`: Summarize the messages and return a shorter replacement list.

### Properties

- `threshold: int`: Context window threshold in tokens.
- `compact_count: int`: Number of compaction passes applied.
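With the defaults above, compaction plausibly triggers once the estimated token count exceeds `context_window - buffer_tokens`. A sketch of that check; the subtraction is an assumption about how `threshold` is derived, and the real class estimates tokens from the message list rather than taking a number directly:

```python
def compact_threshold(context_window: int = 128_000, buffer_tokens: int = 13_000) -> int:
    # Assumed relation: leave buffer_tokens of headroom below the context window.
    return context_window - buffer_tokens

def should_compact(estimated_tokens: int) -> bool:
    # Mirrors the should_compact contract: compare an estimate to the threshold.
    return estimated_tokens > compact_threshold()

print(compact_threshold())      # 115000
print(should_compact(120_000))  # True
```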
## ContextAssembler

*class*: Composes LLM messages from config templates, skill docs, and run data.

### Constructor

```python
ContextAssembler(config: dict[str, Any], skill_pool: SkillPool)
```

### Methods

- `build_trace_messages(run_dir, skill, task_ctx=None, ...) → list[dict]`: Build `[system, user]` messages for the trace agent using layered composition.
- `build_discovery_messages(run_dir, listing, samples) → list[dict]`: Build messages for the skill generator agent.
## Discovery Explorer

Three-phase recursive trajectory discovery with LLM-guided fallback for arbitrarily nested directories.

### discover_trajectory_dirs

*function*

```python
discover_trajectory_dirs(
    root: Path,
    config: dict[str, Any] | None = None,
    llm: LLMClient | None = None
) → list[Path]
```

Discovers trajectory directories under `root` using a three-phase strategy:

1. **Marker scan** — recursive walk for known markers (`results.json`, `steps.json`, etc.)
2. **Skill detection** — validate candidates via `SkillPool`
3. **LLM analysis** — when the fast scan returns empty, use the LLM to analyze the directory structure
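The marker-scan phase amounts to a recursive walk collecting directories that directly contain a known marker file. A standalone sketch of just that phase (`marker_scan` is a hypothetical helper, marker names are taken from the list above, and the skill-detection and LLM phases are omitted):

```python
from pathlib import Path
import tempfile

MARKERS = {"results.json", "steps.json"}  # subset of the known markers

def marker_scan(root: Path) -> list[Path]:
    # Collect every directory under root that directly contains a marker file.
    hits = {p.parent for p in root.rglob("*") if p.name in MARKERS}
    return sorted(hits)

# Tiny demonstration on a throwaway tree.
root = Path(tempfile.mkdtemp())
(root / "run_a").mkdir()
(root / "run_a" / "steps.json").write_text("{}")
(root / "nested" / "run_b").mkdir(parents=True)
(root / "nested" / "run_b" / "results.json").write_text("{}")
found = marker_scan(root)
print(sorted(p.name for p in found))  # ['run_a', 'run_b']
```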
### detect_or_generate_skill

*function*

```python
detect_or_generate_skill(
    run_dir: Path,
    normalizer: Normalizer,
    pool: SkillPool,
    llm: LLMClient,
    config: dict[str, Any],
    user_skill_dir: Path | None = None,
    format_override: str | None = None
) → tuple[Skill | None, NormalizedTrajectory]
```

Unified detect-or-generate entry point. Tries, in order: pre-normalized → step-JSONL → skill detection → auto-generation.
## SkillPool

*class*: Registry of all available trajectory parsers (seed and user-generated skills).

### Constructor

```python
SkillPool(seed_dir: Path = <built-in>, user_dir: Path | None = None)
```

### Methods

- `detect(run_dir: Path) → str | None`: Return the name of the matching skill via two-pass detection.
- `get(name: str) → Skill | None`: Get a skill by name.
- `register(skill: Skill) → None`: Register a new skill.
- `list_skills() → list[Skill]`: Get all registered skills.
- `skill_index() → str`: Compact markdown index for LLM context injection.
## SkillGenerator

*class*: Uses an LLM to analyze unknown trajectory formats and auto-generate parsers.

### Constructor

```python
SkillGenerator(llm: LLMClient, pool: SkillPool, config: dict[str, Any])
```

### Methods

- `generate(run_dir: Path, user_dir: Path) → Skill`: Analyze `run_dir`, generate `SKILL.md` + `parser.py`, then register and return the skill. Raises `RuntimeError` after `max_attempts`.
## Normalizer

*class*: Orchestrates format detection and parsing into a `NormalizedTrajectory`.

### Constructor

```python
Normalizer(pool: SkillPool)
```

### Methods

- `is_pre_normalized(run_dir: Path) → bool`: True if `run_dir` contains `steps.json`.
- `is_step_jsonl_dir(run_dir: Path) → bool`: True if `run_dir` contains `step_N.jsonl` files.
- `detect(run_dir: Path, format_override: str | None = None) → Skill`: Return the matching skill or raise `ValueError`.
- `normalize_pre_normalized(run_dir, output_dir=None, quiet=False) → NormalizedTrajectory`: Load a pre-normalized directory.
- `normalize_step_jsonl(run_dir, output_dir=None, quiet=False) → NormalizedTrajectory`: Load `step_N.jsonl` annotation files.
- `normalize(run_dir, skill, output_dir=None, quiet=False) → NormalizedTrajectory`: Parse using the skill and write derived artifacts.
## TreeBuilder

*class*: Converts normalized trajectories into `tree.md` navigation indices with step classification.

### Constructor

```python
TreeBuilder(llm=None, config: dict[str, Any] | None = None)
```

### Methods

- `build(traj: NormalizedTrajectory) → str`: Build the tree from step classification (fast, no LLM).
- `build_with_llm(traj: NormalizedTrajectory) → str`: Build the tree using an LLM for richer classification labels.
- `build_from_annotation(traj, annotation, run_dir=None) → str`: Build the tree from per-step annotation labels.
## Memory Service

Cross-trajectory memory with online mid-analysis extraction. Accumulates agent-specific failure patterns and investigation strategies.

### OnlineMemoryExtractor

*class*: Fire-and-forget background extraction during analysis.

```python
OnlineMemoryExtractor(
    agent_type: str,
    llm: Any,
    memory_dir: Path | None = None,
    step_interval: int = 8,
    token_threshold: int = 30_000
)
```

- `should_extract(step: int, total_tokens: int) → bool`: Check whether extraction should trigger.
- `extract_async(messages, step, total_tokens) → None`: Launch extraction in a background thread.
### Module Functions

- `load_memory(agent_type: str, memory_dir: Path | None = None) → str`: Load the `TRACER.md` memory file; returns its contents or an empty string.
- `update_memory(agent_type, analysis_summary, failure_patterns=None, memory_dir=None) → Path`: Append insights to `TRACER.md` with a timestamp.
- `auto_extract_memory(agent_type, labels_path, analysis_summary="", memory_dir=None) → Path | None`: One-shot post-analysis memory extraction.
- `extract_failure_patterns(labels: list[dict]) → list[str]`: Extract short failure-pattern strings from labels.
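A sketch of the kind of filtering `extract_failure_patterns` performs, assuming labels are per-step dicts using the `StepLabel` field names; the exact output string format is an assumption:

```python
def extract_failure_patterns(labels: list[dict]) -> list[str]:
    # Keep only steps judged incorrect, condensing each into a short pattern string.
    patterns = []
    for label in labels:
        if label.get("verdict") == "incorrect":
            patterns.append(f"step {label['step_id']}: {label.get('deviation_type', 'unknown')}")
    return patterns

labels = [
    {"step_id": 1, "verdict": "correct"},
    {"step_id": 2, "verdict": "incorrect", "deviation_type": "wrong_file_edited"},
    {"step_id": 5, "verdict": "unuseful"},
]
print(extract_failure_patterns(labels))  # ['step 2: wrong_file_edited']
```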
## CostTracker

*dataclass*: Tracks LLM cost across pipeline phases and enforces budget limits.

### Methods

- `add_usage(model, input_tokens, output_tokens, phase="trace", duration_s=0.0) → float`: Record usage and return the incremental USD cost.
- `is_over_budget() → bool`: Check if total cost ≥ the budget limit.
- `should_warn() → bool`: Check if the warning threshold (80% of budget) has been reached.
- `get_phase_costs() → dict[str, PhaseCost]`: Get a per-phase cost breakdown.
- `format_summary() → str`: Human-readable cost summary.

### Properties

- `total_cost: float`
- `budget_remaining: float`
- `budget_used_pct: float`
## ModelCosts

*dataclass*: Per-million-token pricing.

| Field | Type | Default |
|---|---|---|
| input_per_mtok | float | 3.0 |
| output_per_mtok | float | 15.0 |
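Per-million-token pricing makes incremental cost a two-term product. A sketch of the arithmetic `add_usage` and `should_warn` presumably perform, using the default rates above and the CLI's default $3.00 budget; both helpers here are illustrative stand-ins, not the real API:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_per_mtok: float = 3.0, output_per_mtok: float = 15.0) -> float:
    # USD cost of one call at per-million-token rates.
    return (input_tokens / 1_000_000) * input_per_mtok + \
           (output_tokens / 1_000_000) * output_per_mtok

def should_warn(total_cost: float, budget: float = 3.0) -> bool:
    # Documented warning threshold: 80% of the budget limit.
    return total_cost >= 0.8 * budget

print(call_cost(1_000_000, 100_000))  # 4.5  (3.0 for input + 1.5 for output)
```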
## LLMClient

*class*: OpenAI-compatible LLM client with categorized retry logic and Azure AD support.

### Constructor

```python
LLMClient(
    api_base: str = "",
    api_key: str = "",
    model_name: str | None = None,
    model_kwargs: dict = {},
    azure_ad_resource: str = ""
)
```

### Methods

- `query(messages: list[dict], **kwargs) → dict[str, Any]`: Query the LLM with retry. Returns `{"content": str, "usage": {...}}`.

### Properties

- `model_name: str | None`
- `cost: float`: Accumulated cost (deprecated; use `CostTracker`).
- `n_calls: int`
- `total_prompt_tokens: int`
- `total_completion_tokens: int`
## Data Models

### NormalizedTrajectory

*dataclass*: Fully normalized trajectory ready for tree building and tracing.

| Field | Type | Description |
|---|---|---|
| steps | list[StepRecord] | List of steps |
| task_description | str | Task description |
| metadata | dict | Format/run metadata |

- `write_steps_json(path: Path) → None`
- `step_count: int` (property)
### StepRecord

*dataclass*: One normalized step: an action and its observation.

| Field | Type | Description |
|---|---|---|
| step_id | int | Step index |
| action | str | Action taken |
| observation | str \| None | Observation result |
| thinking | str \| None | Internal reasoning |
| tool_type | str \| None | Type of tool used |
| action_ref | FileRef \| None | Source location reference |
| observation_ref | FileRef \| None | Source location reference |
### ErrorAnalysis

*dataclass*: Result of trajectory error analysis.

| Field | Type | Description |
|---|---|---|
| traj_id | str | Trajectory identifier |
| labels | list[StepLabel] | Per-step labels |
| summary | str | Analysis summary |
| metadata | dict | Additional metadata |

- `save(path: Path) → None`
- `load(path: Path) → ErrorAnalysis` (classmethod)
- `from_labels_json(path: Path, traj_id: str) → ErrorAnalysis` (classmethod)
- `first_incorrect_step_id: int | None` (property)
### StepLabel

*dataclass*: Label for a single diagnosed step.

| Field | Type | Description |
|---|---|---|
| step_id | int | Target step |
| verdict | StepVerdict | One of INCORRECT / UNUSEFUL / CORRECT |
| reasoning | str | Why this verdict |
| deviation_type | str | Type of deviation |
| correct_alternative | str | What should have happened |

### StepVerdict

*enum*: `INCORRECT = "incorrect"`, `UNUSEFUL = "unuseful"`, `CORRECT = "correct"`
### ReplayResult

*dataclass*: Outcome of a replay session.

| Field | Type | Description |
|---|---|---|
| status | ReplayStatus | SUCCESS / PARTIAL / FAILED |
| checkpoint | StepCheckpoint \| None | Final checkpoint |
| steps_replayed | int | Count of steps replayed |
| agent_output | str | Agent's response |
### TaskContext

*dataclass*: Task metadata and provider reference.

| Field | Type | Description |
|---|---|---|
| bench_type | str | Benchmark type |
| task_name | str | Task name |
| task_dir | Path | Task directory |
| problem_statement | str \| None | Problem description |

- `load(task_dir: Path, pool=None) → TaskContext` (classmethod): Auto-detect and create the context.
- `prepare_sandbox(target_parent: Path) → Path`
- `exploration_instructions(sandbox: Path) → str`
## Output Profiles

### OutputProfile

*dataclass*

| Field | Type | Description |
|---|---|---|
| name | str | Profile name |
| schema_ref | str | Schema reference |
| finalize_instruction | str | Agent output format instructions |
| output_file | str | Output filename |

### Built-in Profiles

- `tracebench` → `codetracer_labels.json`: Stage-level labels with `incorrect_step_ids` and `unuseful_step_ids` for benchmark evaluation.
- `detailed` → `codetracer_analysis.json`: Root-cause chains, critical decision points, and comprehensive analysis.
- `rl_feedback` → `codetracer_rl_feedback.json`: Per-step deviation analysis and reward signals for RL training.

### Functions

- `load_profile(name: str, config=None) → OutputProfile`: Load a profile by name; raises `ValueError` if unknown.
- `get_default_profile_name(config=None) → str`: Returns the default profile name (`"detailed"`).
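The lookup contract of `load_profile` can be sketched with a plain registry; the profile names and output files come from the built-in table above, while `_PROFILES` and the reduced dataclass are illustrative stand-ins:

```python
from dataclasses import dataclass

@dataclass
class OutputProfile:
    name: str
    output_file: str

# Illustrative registry mirroring the documented built-in profiles.
_PROFILES = {
    "tracebench": OutputProfile("tracebench", "codetracer_labels.json"),
    "detailed": OutputProfile("detailed", "codetracer_analysis.json"),
    "rl_feedback": OutputProfile("rl_feedback", "codetracer_rl_feedback.json"),
}

def load_profile(name: str) -> OutputProfile:
    # Mirror the documented contract: raise ValueError for unknown names.
    if name not in _PROFILES:
        raise ValueError(f"unknown output profile: {name}")
    return _PROFILES[name]

print(load_profile("detailed").output_file)  # codetracer_analysis.json
```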
## Plugin Adapters

Integration surface for embedding CodeTracer into external agent frameworks.

### PluginAdapter

*abstract*: Base class for framework integration.

- `name() → str`: Unique identifier (e.g. `'openhands'`).
- `ingest_trajectory(raw_path, **kwargs) → NormalizedTrajectory`
- `analyze(traj, **kwargs) → ErrorAnalysis`
- `replay(traj, step_id, analysis, **kwargs) → ReplayResult`
- `analyze_and_replay(raw_path, **kwargs) → ReplayResult`: Convenience: ingest → analyze → auto-replay.
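`analyze_and_replay` chains the three abstract steps. A reduced sketch of that composition, with dicts and strings standing in for the real data models and a toy subclass to exercise the pipeline:

```python
from abc import ABC, abstractmethod

class PluginAdapter(ABC):
    # Reduced stand-ins: real signatures use NormalizedTrajectory,
    # ErrorAnalysis, and ReplayResult.
    @abstractmethod
    def ingest_trajectory(self, raw_path: str): ...
    @abstractmethod
    def analyze(self, traj): ...
    @abstractmethod
    def replay(self, traj, step_id, analysis): ...

    def analyze_and_replay(self, raw_path: str):
        # Convenience: ingest → analyze → replay from the first incorrect step.
        traj = self.ingest_trajectory(raw_path)
        analysis = self.analyze(traj)
        return self.replay(traj, analysis["first_incorrect"], analysis)

class EchoAdapter(PluginAdapter):
    # Toy adapter used only to exercise the composed pipeline.
    def ingest_trajectory(self, raw_path): return {"path": raw_path}
    def analyze(self, traj): return {"first_incorrect": 3}
    def replay(self, traj, step_id, analysis): return f"replayed from step {step_id}"

print(EchoAdapter().analyze_and_replay("run_001"))  # replayed from step 3
```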
### GenericPluginAdapter

*class*: Full pipeline adapter backed by CodeTracer components.

```python
GenericPluginAdapter(
    skill_name: str,
    *,
    bench_name: str | None = None,
    config: dict | None = None,
    llm_kwargs: dict | None = None
)
```
### Built-in Adapters

| Adapter | Skill Name | Framework |
|---|---|---|
| MinisweAdapter | miniswe | MiniSWE Agent |
| OpenHandsAdapter | openhands | OpenHands |
| SweAgentAdapter | swe_agent | SWE-Agent |
## CLI Reference

```text
Usage: codetracer [COMMAND] [OPTIONS]

Commands:
  analyze      Run trajectory diagnosis on a run directory
  run          Full pipeline: detect → normalize → tree → analyze → replay
  replay       Resume a trajectory from the diagnosed breakpoint
  inspect      Inspect a specific step or range
  interactive  Enter an interactive REPL with menu-driven actions
  normalize    Normalize a trajectory to steps.json format
  tree         Build a step classification tree
  batch        Run batch analysis from a manifest

Global Options:
  --model TEXT      LLM model name
  --api-base URL    API endpoint
  --api-key TEXT    API key
  --config PATH     Custom configuration file
  --profile TEXT    Output profile (tracebench / detailed / rl_feedback)
  --cost-limit $    Max LLM spend per trajectory (default: 3.0)
  --dry-run         Normalize + tree only, skip LLM analysis
```
CodeTracer © 2026 Nanjing University • MIT License