# DiffutronLM-0.3B-1st-Stage
DiffutronLM-0.3B-1st-Stage is an intermediate checkpoint of the Diffutron series, a parameter-efficient Masked Diffusion Language Model (MDLM) designed for the Turkish language.
This specific model represents the completion of the first stage of instruction fine-tuning. It has been trained to grasp the fundamentals of instruction-following in Turkish, serving as a robust foundation before more complex, domain-specific specialization (which is handled in the final Instruct model).
## Model Details
- Model Type: Masked Diffusion Language Model (MDLM)
- Base Architecture: `jhu-clsp/mmBERT-base` (Multilingual Encoder)
- Language: Turkish
- Parameter Count: 307M (0.3B)
- Context Length: 256 tokens
- Training Libraries: `dllm`, PyTorch
- Status: Intermediate Checkpoint (Stage 1 SFT)
## Training Pipeline for This Checkpoint
Diffutron replaces traditional next-token autoregressive generation with a discrete diffusion process, generating text by iteratively refining sequences in parallel. To reach this checkpoint, the model underwent two main phases:
### 1. Continual Pre-training (CPT)
The multilingual backbone was adapted to Turkish using a high-rank LoRA strategy (r=256, α=256) on ~2 million sequences sourced from Havadis, Temiz-OSCAR, and Turkish Wikipedia. This adaptation captured Turkish morphological nuances while avoiding catastrophic forgetting of the multilingual backbone.
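As a rough illustration, the high-rank LoRA setup described above could be expressed with the `peft` library as in the sketch below. This is a minimal sketch, not the actual Diffutron CPT script: the dropout value and the `target_modules` selection are assumptions.

```python
# Minimal sketch of a high-rank LoRA configuration (r=256, alpha=256) on the
# mmBERT backbone. lora_dropout and target_modules are assumptions, not
# settings reported in this card.
from transformers import AutoModelForMaskedLM
from peft import LoraConfig, get_peft_model

model = AutoModelForMaskedLM.from_pretrained("jhu-clsp/mmBERT-base")

lora_config = LoraConfig(
    r=256,                         # high-rank adapter, as in the CPT phase
    lora_alpha=256,
    lora_dropout=0.05,             # assumed value
    target_modules="all-linear",   # assumed; adjust to the backbone's module names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```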
### 2. Stage 1: Foundational Instruction Tuning
Following CPT, the model underwent full supervised fine-tuning (SFT) to align it with human intent.
- Dataset: `metunlp/LlamaTurk-Instruction-Set`
- Objective: Introduce the model to a broad range of general instructions and establish basic response coherence.
- Hyperparameters: 20 Epochs, Batch Size 16, AdamW optimizer (lr=1e-4), Max Sequence Length 256.
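For orientation, the sketch below maps these Stage 1 hyperparameters onto standard Transformers objects. It is illustrative only: the actual run used the `dllm` training stack, and the output directory and dataset field name shown here are assumptions.

```python
# Illustrative mapping of the Stage 1 SFT hyperparameters onto standard
# Transformers objects. The real training used the dllm library; output_dir
# and the "text" field name are assumptions.
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/mmBERT-base")
MAX_SEQ_LEN = 256  # max sequence length reported for Stage 1

training_args = TrainingArguments(
    output_dir="diffutron-stage1-sft",   # assumed path
    num_train_epochs=20,                 # 20 epochs
    per_device_train_batch_size=16,      # batch size 16
    learning_rate=1e-4,                  # AdamW at lr=1e-4
    optim="adamw_torch",
)

def preprocess(example):
    # "text" is an assumed field name for the instruction/response string.
    return tokenizer(example["text"], truncation=True, max_length=MAX_SEQ_LEN)
```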
(Note: For the most advanced instruction-following capabilities, including complex reasoning, we recommend using the final DiffutronLM-0.3B-Instruct model, which includes a second stage of tuning on InstrucTurca.)
## Evaluation Results
Despite being an intermediate checkpoint, the 1st-Stage model remains competitive on several tasks with much larger autoregressive baselines on the CETVEL Benchmark Suite.
| Benchmark | Diffutron-1st-Stage (0.3B) | Diffutron-2nd-Stage (0.3B) | TURNA (1.1B) | Kumru (2B) | Kanarya (2B) | Llama-3.2 (3B) | Trendyol (7B) | Aya-101 (13B) |
|---|---|---|---|---|---|---|---|---|
| Belebele_TR | 22.22 | 27.00 | 22.56 | 29.00 | 28.11 | 55.78 | 36.22 | 22.89 |
| EXAMS_TR | 25.95 | 27.74 | 23.66 | 30.03 | 30.03 | 26.21 | 28.50 | 22.90 |
| IronyTR | 50.67 | 52.00 | 48.33 | 51.00 | 50.00 | 50.17 | 50.00 | 52.17 |
| News_Cat | 23.20 | 32.40 | 32.80 | 26.40 | 66.80 | 64.00 | 81.20 | 20.00 |
| MNLI_TR | 33.29 | 32.81 | 34.94 | 36.42 | 33.40 | 34.76 | 35.19 | 27.90 |
| STS_TR | 17.77 | 18.78 | 14.21 | 11.75 | 12.91 | 12.91 | 15.52 | 16.97 |
| XCOPA_TR | 53.80 | 52.00 | 55.80 | 54.00 | 64.20 | 54.60 | 61.00 | 59.60 |
| Average | 32.41 | 34.68 | 33.19 | 34.09 | 40.78 | 42.63 | 43.95 | 31.78 |
## Usage
Because Diffutron is a Masked Diffusion Language Model, it requires inference strategies distinct from standard causal generation. We recommend using the `dllm` library or a custom generation loop tailored for discrete diffusion (a minimal sketch of such a loop appears after the commands below).
1. Install the dllm Library:
```bash
git clone https://github.com/Diffutron/dllm.git
cd dllm
pip install -e .
```
2. Chat via Interaction Mode:
```bash
python -u examples/bert/chat.py \
    --model_name_or_path "diffutron/DiffutronLM-0.3B-1st-Stage" \
    --chat True \
    --steps 64 \
    --max_new_tokens 64 \
    --temperature 0.1 \
    --block_length 32 \
    --repetition_penalty 1.2 \
    --remasking "low_confidence" \
    --stochastic_transfer False \
    --cfg_scale 0.0
```
For other inference modes, see the `dllm` library documentation.
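The sketch below shows what a custom generation loop for a masked diffusion model of this kind can look like: the response block starts fully masked and is filled in over a fixed number of refinement steps, with low-confidence positions kept masked between steps (mirroring the `low_confidence` remasking option above). It is a simplified illustration, not the `dllm` implementation; the prompt format, greedy per-position decoding, and loading via `AutoModelForMaskedLM` are assumptions.

```python
# Simplified masked-diffusion decoding loop ("low_confidence"-style remasking).
# Illustrative only; the dllm library's actual sampler may differ.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "diffutron/DiffutronLM-0.3B-1st-Stage"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

prompt = "Türkiye'nin başkenti neresidir?"  # "What is the capital of Turkey?"
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
max_new_tokens, steps = 64, 64
mask_id = tokenizer.mask_token_id

# Start with the prompt followed by a fully masked response block.
ids = torch.cat(
    [prompt_ids, torch.full((1, max_new_tokens), mask_id, dtype=torch.long)], dim=-1
)

tokens_per_step = max(1, max_new_tokens // steps)
with torch.no_grad():
    for _ in range(steps):
        masked = ids == mask_id
        if not masked.any():
            break
        logits = model(input_ids=ids).logits           # (1, seq_len, vocab)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)                 # per-position confidence / argmax
        conf = conf.masked_fill(~masked, -1.0)         # only consider masked slots
        # Commit the most confident predictions; low-confidence positions stay masked.
        k = min(tokens_per_step, int(masked.sum()))
        top = conf.topk(k, dim=-1).indices
        ids[0, top[0]] = pred[0, top[0]]

print(tokenizer.decode(ids[0, prompt_ids.shape[1]:], skip_special_tokens=True))
```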
## Limitations
- Intermediate State: This model has not undergone the final specialization phase and may struggle with highly complex or multi-turn instructions compared to the final Instruct model.
- Context Window: Restricted to a 256-token context window.
- Multilingual Backbone: Inherits representations from a multilingual encoder, not a natively trained Turkish foundation model.
## Citation
```bibtex
@misc{diffutron2026,
  author       = {Kocabay, Şuayp Talha and Akkuş, Talha Rüzgar},
  title        = {Diffutron: A Masked Diffusion Language Model for Turkish Language},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/collections/diffutron/diffutronlm}}
}
```