trainer

trainer is the supervised fine-tuning (SFT) stage. It reads the SFT data that tracer produces in LLaMA-Factory (LF) format, then fine-tunes a base model with LLaMA-Factory and DeepSpeed ZeRO-3 across 8 GPUs.

Full docs: swe-trainer-docs.pages.dev/docs

Long-form docs live there

The trainer block ships its own complete documentation site, covering getting started, core concepts, the trajectory→ShareGPT data pipeline (scaffolds, scoring, evaluator-leak filtering), training (configuration, results, and artifacts), the live dashboard, and the full reference. This page is only an orientation; follow the link above for everything else.

At a glance

Inputs: source, conversion, model, training, infrastructure, credentials. The training data dependency is typically wired from tracer.output.sft_data_dir.
Output: checkpoint_path — subblock/trainer/artifacts/model/<run>/ (consumed by rl as the starting actor).
Runs: Local (8× GPU). Long-running.
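As an illustration of the training data input listed above, here is a minimal sketch of a sanity check for one ShareGPT-style record. It assumes the common ShareGPT layout (a `conversations` list whose turns carry `from` and `value` keys, with role tags such as `human` and `gpt`); the exact fields that tracer emits are documented on the full docs site and may differ. The function name `validate_sharegpt_record` is hypothetical and is not part of trainer or LLaMA-Factory.

```python
from typing import Any

# Assumed role tags, following the usual ShareGPT convention.
# The real tracer output may use a different set.
USER_ROLES = {"human", "observation"}
ASSISTANT_ROLES = {"gpt", "function_call"}
KNOWN_ROLES = USER_ROLES | ASSISTANT_ROLES | {"system"}


def validate_sharegpt_record(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        return ["missing or empty 'conversations' list"]

    problems: list[str] = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not an object")
            continue
        if turn.get("from") not in KNOWN_ROLES:
            problems.append(f"turn {i}: unknown role {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {i}: 'value' must be a string")

    # SFT needs an assistant response to learn from, so the final turn
    # is expected to come from the assistant side.
    last = turns[-1]
    if isinstance(last, dict) and last.get("from") not in ASSISTANT_ROLES:
        problems.append("last turn should come from the assistant")
    return problems
```

A record passes when the returned list is empty; running this over a handful of records before a long training run is a cheap way to catch a malformed export early.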

How to run

/trainer:setup        # install LLaMA-Factory, register dataset, fill defaults
/trainer:check        # preflight: GPUs, DeepSpeed config, dataset, base model
/trainer:run          # launch training
/trainer:dashboard    # parse the latest training log + WandB URL
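To make the preflight step more concrete, the following is a rough sketch of the kind of checks the comment on /trainer:check describes (GPU count, DeepSpeed config, dataset). It is not the actual implementation of /trainer:check. The function name `preflight`, its parameters, and the specific checks are assumptions made for illustration; the only external fact it relies on is that a DeepSpeed JSON config declares its ZeRO stage under `zero_optimization.stage`.

```python
import json
from pathlib import Path


def preflight(ds_config_path, dataset_dir, gpu_count, expected_gpus=8):
    """Hypothetical preflight: return a list of problems, empty if all checks pass."""
    problems = []

    # GPU count is passed in so the function stays testable without GPUs.
    if gpu_count != expected_gpus:
        problems.append(f"expected {expected_gpus} GPUs, found {gpu_count}")

    # The DeepSpeed config must exist, parse as JSON, and request ZeRO stage 3.
    ds_config_path = Path(ds_config_path)
    if not ds_config_path.is_file():
        problems.append(f"DeepSpeed config not found: {ds_config_path}")
    else:
        try:
            cfg = json.loads(ds_config_path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"DeepSpeed config is not valid JSON: {exc}")
        else:
            zero = cfg.get("zero_optimization", {}) if isinstance(cfg, dict) else {}
            stage = zero.get("stage") if isinstance(zero, dict) else None
            if stage != 3:
                problems.append(f"expected ZeRO stage 3, found stage {stage!r}")

    # The dataset directory must exist and contain at least one entry.
    dataset_dir = Path(dataset_dir)
    if not dataset_dir.is_dir() or not any(dataset_dir.iterdir()):
        problems.append(f"dataset directory missing or empty: {dataset_dir}")

    return problems
```

In a real environment the `gpu_count` argument could come from `torch.cuda.device_count()`. Returning a list of problems, rather than raising on the first failure, lets a single run report every issue before a long training job is launched.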

Reference

Block contract: subblock/trainer/CLAUDE.md
Full docs site: swe-trainer-docs.pages.dev/docs
