Nexus-Coder-Alpha

A practical training guide and recipe for building state-of-the-art agentic coding assistants with open-source 8B parameter models.

What This Is

This repository consolidates research from Nemotron-Terminal, Klear-AgentForge, GLM-5, and Qwen3-Coder-Next into a single reproducible training pipeline:

  1. Supervised Fine-Tuning (SFT) on high-quality multi-turn agent trajectories
  2. Reinforcement Learning (RL) with execution-verified rewards
  3. Deployment in Pi agent, Cline, OpenCode, or any OpenAI-compatible coding tool

Target Model

Base: nvidia/Nemotron-Terminal-8B

  • 8.2B parameters, Qwen3 architecture, native tool_calls support
  • Already pre-trained for terminal/code-agent interaction
  • Fits on a single A100 or A10G-large GPU with LoRA
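A back-of-envelope memory estimate shows why an 8.2B model fits a single A100 under LoRA. This is a minimal sketch with assumed values (bf16 frozen weights, roughly 40M trainable adapter parameters, AdamW state on the adapter only); it ignores activations and KV cache, which add a few more GiB:

```python
# Rough LoRA memory estimate. Assumptions (not from the source): bf16 base
# weights, ~40M LoRA adapter params, AdamW moments in fp32 on the adapter only.
GIB = 2**30

def lora_memory_gib(n_params=8.2e9, adapter_params=40e6):
    weights = n_params * 2          # frozen base weights, 2 bytes each (bf16)
    adapter = adapter_params * 2    # trainable LoRA weights (bf16)
    grads = adapter_params * 2      # gradients exist only for the adapter
    optim = adapter_params * 8      # AdamW first/second moments in fp32
    return (weights + adapter + grads + optim) / GIB

print(f"{lora_memory_gib():.1f} GiB")  # ~15.7 GiB, well under a 40 GiB A100
```

Full fine-tuning, by contrast, would need gradients plus optimizer state for all 8.2B parameters, which is why LoRA is what makes the single-GPU setup feasible.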

Key Results (from cited papers)

Benchmark          | 8B Target | SOTA Reference
-------------------|-----------|------------------------
SWE-bench Verified | 20-40%    | Klear-AgentForge: 39.4%
BFCL v3            | 65-75%    | Klear-AgentForge: 71.5%
Terminal-Bench 2.0 | 15-25%    | Nemotron-T-14B: 20.2%
Aider-Polyglot     | 25-40%    | Klear-AgentForge: 33.8%

Documents

  • TRAINING_GUIDE.md — Full SFT → RL → Deployment recipe with code snippets, dataset links, hyperparameters, and SOTA tricks
  • train_sft.py — Reference training script for Stage 1 (SFT)
  • train_grpo.py — Reference training script for Stage 2 (GRPO RL)

Quick Start

# Stage 1: SFT on curated agent trajectories
python train_sft.py \
  --model nvidia/Nemotron-Terminal-8B \
  --dataset mixed_agentic_dataset \
  --output_dir ./nexus-coder-sft

# Stage 2: GRPO with execution-verified rewards
python train_grpo.py \
  --model ./nexus-coder-sft \
  --dataset nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 \
  --output_dir ./nexus-coder-rl

Core Datasets

Dataset                                 | Split                            | Purpose                          | Link
----------------------------------------|----------------------------------|----------------------------------|-----
SWE-bench/SWE-smith-trajectories        | tool (resolved=True)             | SFT: Real repo bug fixing        | HF
nvidia/Nemotron-Agentic-v1              | interactive_agent + tool_calling | SFT: Multi-turn tool use         | HF
xingyaoww/code-act                      | codeact + general                | SFT: Executable code actions     | HF
nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 | train                            | RL: Step-level pass-rate rewards | HF
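These sources are combined by token volume rather than by example count (see the 60/30/10 mix below). A minimal sketch of such a mixer, using toy in-memory examples rather than the real datasets (the source names, `n_tokens` field, and budget are illustrative assumptions):

```python
import random

# Hypothetical sketch: draw examples from each source until it reaches its
# share of a total token budget, so the mix is by token volume, not row count.
def mix_by_tokens(sources, ratios, budget_tokens, seed=0):
    """sources: {name: [examples with an 'n_tokens' field]}; ratios: {name: fraction}."""
    rng = random.Random(seed)
    mixed, used = [], {name: 0 for name in sources}
    for name, examples in sources.items():
        target = ratios[name] * budget_tokens
        for ex in rng.sample(examples, len(examples)):  # shuffle within source
            if used[name] >= target:
                break
            mixed.append(ex)
            used[name] += ex["n_tokens"]
    rng.shuffle(mixed)  # interleave sources for training
    return mixed, used

# Toy demo: three fake sources of 100-token examples, 10k-token budget
sources = {
    "swe_trajectories": [{"src": "swe", "n_tokens": 100}] * 100,
    "general_tool_use": [{"src": "tool", "n_tokens": 100}] * 100,
    "code_as_action":   [{"src": "act", "n_tokens": 100}] * 100,
}
ratios = {"swe_trajectories": 0.6, "general_tool_use": 0.3, "code_as_action": 0.1}
mixed, used = mix_by_tokens(sources, ratios, budget_tokens=10_000)
print(used)  # {'swe_trajectories': 6000, 'general_tool_use': 3000, 'code_as_action': 1000}
```

In practice the real trajectories vary widely in length, so the per-source token counters land near, not exactly on, their targets.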

Top SOTA Tricks

  1. Multi-format tool templates — Train on 4-5 schemas (OpenAI JSON, XML, Python-style, TypeScript, Qwen3-native) so the model generalizes to any agent framework.
  2. Token-in-Token-Out (TITO) — Use raw token IDs from vLLM rollouts; never re-tokenize for RL loss computation.
  3. Async RL — Decouple the vLLM inference engine from the training loop for 2-3x throughput.
  4. Format-aware regularization — Penalize malformed tool calls even if the action is logically correct.
  5. 60/30/10 data mix — SWE trajectories / general tool-use / code-as-action by token volume.
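Trick 4 can be sketched as a small reward-shaping term. This is an illustrative assumption about how such a penalty might look, not the guide's actual implementation; the schema keys (`name`, `arguments`) follow the common OpenAI-style tool-call format:

```python
import json

# Hypothetical format-aware penalty: a tool call must parse as JSON and match
# the expected schema, or it loses reward even if the intended action was right.
def format_penalty(raw_call, required_keys=("name", "arguments"), penalty=-0.2):
    """Return `penalty` for a malformed tool call, 0.0 for a well-formed one."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return penalty
    if not isinstance(call, dict) or not all(k in call for k in required_keys):
        return penalty
    if not isinstance(call.get("arguments"), dict):  # args must be an object
        return penalty
    return 0.0

def shaped_reward(task_reward, raw_call):
    # A logically correct action still pays the penalty if its syntax is broken.
    return task_reward + format_penalty(raw_call)

print(shaped_reward(1.0, '{"name": "bash", "arguments": {"cmd": "ls"}}'))  # 1.0
print(shaped_reward(1.0, '{"name": "bash", "arguments": "ls"}'))           # 0.8
```

The point of keeping the penalty additive rather than zeroing the reward is that the policy still gets signal from a successful action while learning to emit parseable calls.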

Benchmarks

  • SWE-bench Verified — Primary real-world software engineering benchmark
  • Terminal-Bench 2.0 — Terminal/agent task completion
  • BFCL v3 — Multi-turn function calling
  • Aider-Polyglot — Multi-language code editing
  • tau-bench — Long-horizon multi-turn tool use

Citation

If you use this recipe, please cite the underlying research:

@article{nemotron-terminal-2026,
  title={Nemotron-Terminal: Scalable Training for Terminal-Capable Language Models},
  author={NVIDIA},
  journal={arXiv:2602.21193},
  year={2026}
}
@article{klear-agentforge-2025,
  title={Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling},
  author={Klear-AI},
  journal={arXiv:2511.05951},
  year={2025}
}
@article{glm5-2026,
  title={GLM-5: from Vibe Coding to Agentic Engineering},
  author={Zhipu AI},
  journal={arXiv:2602.15763},
  year={2026}
}

License

The training guide and scripts are provided as-is for research and educational purposes. Dataset and base model licenses apply to their respective owners.
