dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in masked diffusion language models (MDLMs). By jointly optimizing the base diffusion LLM and an unmasking order planner, dUltra achieves superior accuracy-efficiency trade-offs on mathematical reasoning and code generation tasks.
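At the core of the method, GRPO scores each sampled completion relative to the other completions drawn for the same prompt, so no learned value critic is needed. The snippet below is a minimal sketch of that group-relative advantage computation; the function name, tensor shapes, and reward values are illustrative and are not part of this repository's API.

import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_prompts, group_size), one row per prompt and one column
    # per sampled completion. Each reward is normalized by the mean and std
    # of its own group, which is the group-relative baseline used in GRPO.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

# Illustrative example: 2 prompts, 4 sampled completions each,
# binary correctness rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))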

Usage

This model can be loaded through the Hugging Face transformers library. Note that loading the custom model architecture requires trust_remote_code=True.

import torch
from transformers import AutoTokenizer

# Custom LLaDOU architecture shipped with this repository's modeling code.
from model.llada.lladou import LLaDOUModelLM

# trust_remote_code=True is required for the custom model class.
model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
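
Once loaded, inference follows the usual tokenize, generate, decode pattern. The sketch below assumes a generate-style entry point purely for illustration; the custom LLaDOU modeling code may expose a diffusion-specific parallel-unmasking sampler instead, so consult the repository's inference script for the exact call.

prompt = "Compute 12 * 17. Show your reasoning."
inputs = tokenizer(prompt, return_tensors="pt")

model = model.to("cuda").eval()
inputs = {k: v.to("cuda") for k, v in inputs.items()}

with torch.no_grad():
    # NOTE: assumed interface -- the custom architecture may provide its own
    # sampler rather than the standard generate() method.
    output_ids = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))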

Citation

@misc{chen2025dultraultrafastdiffusionlanguage,
      title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
      author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
      year={2025},
      eprint={2512.21446},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.21446},
}