rl
Online RL from the SFT checkpoint
rl is the online reinforcement learning stage. Starting from the actor checkpoint produced by the trainer block, it runs GRPO/GSPO against SWE-bench tasks via Harbor + vLLM + verl, on a multi-GPU node with Kubernetes-backed task execution.
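For readers new to GRPO, the core idea is that each rollout's reward is scored relative to the other rollouts sampled for the same task, so no separate value model is needed. The snippet below is a minimal sketch of that group-relative advantage only. It is not taken from verl or from this block's code; the function name, the `eps` term, and the use of the population standard deviation are illustrative assumptions, and GSPO's sequence-level importance ratio is not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each rollout against its own group (rollouts of one task).

    Hypothetical helper for illustration; real implementations may
    normalize differently (for example, sample std instead of population std).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed example: four rollouts of one task, reward 1.0 when the tests pass.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one; a group in which every rollout earns the same reward contributes no learning signal.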
Full docs: swe-rl-docs.pages.dev/docs
Long-form docs live there
The rl block ships its own complete documentation site — getting started, core concepts, running training (preflight, the vLLM + LiteLLM inference stack, sandbox backends, results, and scaling), the dashboard, a troubleshooting guide, and the full reference. This page is just an orientation; follow the link above for everything else.
At a glance
- Inputs: model (typically wired from trainer.output.checkpoint_path), infrastructure, training, data, experiment, credentials.
- Output: actor checkpoints under repos/harbor-verl-train/outputs/.
- Runs: Local (8× GPU + K8s). Long-running.
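Because runs are long, it can be handy to find the most recent entry under the output directory without opening the dashboard. The sketch below assumes nothing about the layout inside repos/harbor-verl-train/outputs/ beyond it being a directory; the helper name is hypothetical, and it simply returns whichever direct child was modified last.

```python
from pathlib import Path


def latest_output(outputs_dir="repos/harbor-verl-train/outputs"):
    """Return the most recently modified entry in outputs_dir, or None.

    Hypothetical helper: the layout below the outputs directory is not
    documented on this page, so only direct children are inspected.
    """
    root = Path(outputs_dir)
    if not root.is_dir():
        return None
    entries = list(root.iterdir())
    if not entries:
        return None
    return max(entries, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_output())
```

Run from the repository root, this prints a path, or None if no run has produced output yet.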
How to run
/rl:setup # build venv, sync submodules, apply verl patch
/rl:check # preflight: GPUs, K8s reachability, vLLM env, base model
/rl:run # launch training (long-running)
/rl:dashboard # launch the trajectory + metric web UI
Reference
- Block contract: subblock/rl/CLAUDE.md
- Full docs site: swe-rl-docs.pages.dev/docs