# DiffutronLM-0.3B-1st-Stage
DiffutronLM-0.3B-1st-Stage is an intermediate checkpoint of the Diffutron series, a parameter-efficient Masked Diffusion Language Model (MDLM) designed for the Turkish language.
This specific model represents the completion of the first stage of instruction fine-tuning. It has been trained to grasp the fundamentals of instruction-following in Turkish, serving as a robust foundation before more complex, domain-specific specialization (which is handled in the final Instruct model).
## Model Details
- Model Type: Masked Diffusion Language Model (MDLM)
- Base Architecture: `jhu-clsp/mmBERT-base` (Multilingual Encoder)
- Language: Turkish
- Parameter Count: 307M (0.3B)
- Context Length: 256 tokens
- Training Libraries: `dllm`, PyTorch
- Status: Intermediate Checkpoint (Stage 1 SFT)
## Training Pipeline for This Checkpoint
Diffutron replaces traditional next-token autoregressive generation with a discrete diffusion process, generating text by iteratively refining sequences in parallel. To reach this checkpoint, the model underwent two main phases:
### 1. Continual Pre-training (CPT)
The multilingual backbone was adapted to Turkish using a high-rank LoRA strategy (r=256, α=256) on ~2 million sequences sourced from Havadis, Temiz-OSCAR, and Turkish Wikipedia. This adaptation captured Turkish morphological nuances while avoiding catastrophic forgetting of the multilingual backbone.
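As a rough illustration, the high-rank LoRA setup described above could be expressed with the `peft` library as in the sketch below. This is a minimal sketch, not the actual Diffutron CPT script: the dropout value and the `target_modules` selection are assumptions.

```python
# Minimal sketch of a high-rank LoRA configuration (r=256, alpha=256) on the
# mmBERT backbone. lora_dropout and target_modules are assumptions, not
# settings reported in this card.
from transformers import AutoModelForMaskedLM
from peft import LoraConfig, get_peft_model

model = AutoModelForMaskedLM.from_pretrained("jhu-clsp/mmBERT-base")

lora_config = LoraConfig(
    r=256,                         # high-rank adapter, as in the CPT phase
    lora_alpha=256,
    lora_dropout=0.05,             # assumed value
    target_modules="all-linear",   # assumed; adjust to the backbone's module names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```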
### 2. Stage 1: Foundational Instruction Tuning
Following CPT, the model underwent full supervised fine-tuning (SFT) to align it with human intent.
- Dataset: `metunlp/LlamaTurk-Instruction-Set`
- Objective: Introduce the model to a broad range of general instructions and establish basic response coherence.
- Hyperparameters: 20 Epochs, Batch Size 16, AdamW optimizer (lr=1e-4), Max Sequence Length 256.
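For orientation, the sketch below maps these Stage 1 hyperparameters onto standard Transformers objects. It is illustrative only: the actual run used the `dllm` training stack, and the output directory and dataset field name shown here are assumptions.

```python
# Illustrative mapping of the Stage 1 SFT hyperparameters onto standard
# Transformers objects. The real training used the dllm library; output_dir
# and the "text" field name are assumptions.
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/mmBERT-base")
MAX_SEQ_LEN = 256  # max sequence length reported for Stage 1

training_args = TrainingArguments(
    output_dir="diffutron-stage1-sft",   # assumed path
    num_train_epochs=20,                 # 20 epochs
    per_device_train_batch_size=16,      # batch size 16
    learning_rate=1e-4,                  # AdamW at lr=1e-4
    optim="adamw_torch",
)

def preprocess(example):
    # "text" is an assumed field name for the instruction/response string.
    return tokenizer(example["text"], truncation=True, max_length=MAX_SEQ_LEN)
```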
(Note: For the most advanced instruction-following capabilities, including complex reasoning, we recommend using the final DiffutronLM-0.3B-Instruct model, which includes a second stage of tuning on InstrucTurca.)
## Evaluation Results
Despite being an intermediate checkpoint, the 1st-Stage model remains competitive on several tasks with much larger autoregressive baselines on the CETVEL Benchmark Suite.
| Benchmark | Diffutron-1st-Stage (0.3B) | Diffutron-2nd-Stage (0.3B) | TURNA (1.1B) | Kumru (2B) | Kanarya (2B) | Llama-3.2 (3B) | Trendyol (7B) | Aya-101 (13B) |
|---|---|---|---|---|---|---|---|---|
| Belebele_TR | 22.22 | 27.00 | 22.56 | 29.00 | 28.11 | 55.78 | 36.22 | 22.89 |
| EXAMS_TR | 25.95 | 27.74 | 23.66 | 30.03 | 30.03 | 26.21 | 28.50 | 22.90 |
| IronyTR | 50.67 | 52.00 | 48.33 | 51.00 | 50.00 | 50.17 | 50.00 | 52.17 |
| News_Cat | 23.20 | 32.40 | 32.80 | 26.40 | 66.80 | 64.00 | 81.20 | 20.00 |
| MNLI_TR | 33.29 | 32.81 | 34.94 | 36.42 | 33.40 | 34.76 | 35.19 | 27.90 |
| STS_TR | 17.77 | 18.78 | 14.21 | 11.75 | 12.91 | 12.91 | 15.52 | 16.97 |
| XCOPA_TR | 53.80 | 52.00 | 55.80 | 54.00 | 64.20 | 54.60 | 61.00 | 59.60 |
| Average | 32.41 | 34.68 | 33.19 | 34.09 | 40.78 | 42.63 | 43.95 | 31.78 |
## Usage
Because Diffutron is a Masked Diffusion Language Model, it requires inference strategies distinct from standard causal generation. We recommend using the `dllm` library or a custom generation loop tailored for discrete diffusion (a minimal sketch of such a loop appears after the commands below).
1. Install the dllm Library:
```bash
git clone https://github.com/Diffutron/dllm.git
cd dllm
pip install -e .
```
2. Chat via Interaction Mode:
```bash
python -u examples/bert/chat.py \
    --model_name_or_path "diffutron/DiffutronLM-0.3B-1st-Stage" \
    --chat True \
    --steps 64 \
    --max_new_tokens 64 \
    --temperature 0.1 \
    --block_length 32 \
    --repetition_penalty 1.2 \
    --remasking "low_confidence" \
    --stochastic_transfer False \
    --cfg_scale 0.0
```
For other inference modes, see the `dllm` library documentation.
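The sketch below shows what a custom generation loop for a masked diffusion model of this kind can look like: the response block starts fully masked and is filled in over a fixed number of refinement steps, with low-confidence positions kept masked between steps (mirroring the `low_confidence` remasking option above). It is a simplified illustration, not the `dllm` implementation; the prompt format, greedy per-position decoding, and loading via `AutoModelForMaskedLM` are assumptions.

```python
# Simplified masked-diffusion decoding loop ("low_confidence"-style remasking).
# Illustrative only; the dllm library's actual sampler may differ.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "diffutron/DiffutronLM-0.3B-1st-Stage"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

prompt = "Türkiye'nin başkenti neresidir?"  # "What is the capital of Turkey?"
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
max_new_tokens, steps = 64, 64
mask_id = tokenizer.mask_token_id

# Start with the prompt followed by a fully masked response block.
ids = torch.cat(
    [prompt_ids, torch.full((1, max_new_tokens), mask_id, dtype=torch.long)], dim=-1
)

tokens_per_step = max(1, max_new_tokens // steps)
with torch.no_grad():
    for _ in range(steps):
        masked = ids == mask_id
        if not masked.any():
            break
        logits = model(input_ids=ids).logits           # (1, seq_len, vocab)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)                 # per-position confidence / argmax
        conf = conf.masked_fill(~masked, -1.0)         # only consider masked slots
        # Commit the most confident predictions; low-confidence positions stay masked.
        k = min(tokens_per_step, int(masked.sum()))
        top = conf.topk(k, dim=-1).indices
        ids[0, top[0]] = pred[0, top[0]]

print(tokenizer.decode(ids[0, prompt_ids.shape[1]:], skip_special_tokens=True))
```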
## Limitations
- Intermediate State: This model has not undergone the final specialization phase and may struggle with highly complex or multi-turn instructions compared to the final Instruct model.
- Context Window: Restricted to a 256-token context window.
- Multilingual Backbone: Inherits representations from a multilingual encoder, not a natively trained Turkish foundation model.
## Citation
```bibtex
@misc{diffutron2026,
  author       = {Kocabay, Şuayp Talha and Akkuş, Talha Rüzgar},
  title        = {Diffutron: A Masked Diffusion Language Model for Turkish Language},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/collections/diffutron/diffutronlm}}
}
```