A newer version of this model is available: diffutron/DiffutronLM-0.3B-Instruct

DiffutronLM-0.3B-1st-Stage

DiffutronLM-0.3B-1st-Stage is an intermediate checkpoint of the Diffutron series, a parameter-efficient Masked Diffusion Language Model (MDLM) designed for the Turkish language.

This specific model represents the completion of the first stage of instruction fine-tuning. It has been trained to grasp the fundamentals of instruction-following in Turkish, serving as a robust foundation before more complex, domain-specific specialization (which is handled in the final Instruct model).

πŸ“Œ Model Details

  • Model Type: Masked Diffusion Language Model (MDLM)
  • Base Architecture: jhu-clsp/mmBERT-base (Multilingual Encoder)
  • Language: Turkish
  • Parameter Count: 307M (0.3B)
  • Context Length: 256 tokens
  • Training Libraries: dllm, PyTorch
  • Status: Intermediate Checkpoint (Stage 1 SFT)

πŸš€ Training Pipeline for This Checkpoint

Diffutron replaces traditional next-token autoregressive generation with a discrete diffusion process, generating text by iteratively refining sequences in parallel. To reach this checkpoint, the model underwent two main phases:
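The parallel-refinement idea can be illustrated with a toy loop: start from a fully masked sequence and, at each diffusion step, commit the predictions the model is most confident about. The sketch below uses an invented `toy_denoiser` and vocabulary as stand-ins for the real model; it shows the decoding pattern, not the actual Diffutron implementation.

```python
import random

MASK = "<mask>"

def toy_denoiser(seq):
    """Stand-in for the model: returns (token, confidence) for every masked
    position. A real MDLM predicts all positions in parallel in one pass."""
    vocab = ["merhaba", "dünya", "nasıl", "gidiyor"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(seq) if t == MASK}

def diffusion_generate(length=8, steps=4):
    """Iteratively refine a fully masked sequence: each step commits the most
    confident predictions and leaves the rest masked (parallel, not left-to-right)."""
    seq = [MASK] * length
    per_step = length // steps  # tokens committed per step
    for _ in range(steps):
        preds = toy_denoiser(seq)
        # keep only the highest-confidence predictions this step
        for i, (tok, _) in sorted(preds.items(), key=lambda kv: -kv[1][1])[:per_step]:
            seq[i] = tok
    return seq

random.seed(0)
out = diffusion_generate()
print(out)
```

Unlike autoregressive decoding, the number of model calls is set by `steps`, not by the sequence length.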

1. Continual Pre-training (CPT)

The multilingual backbone was adapted to Turkish using a high-rank LoRA strategy (r=256, Ξ±=256) on ~2 million sequences sourced from Havadis, Temiz-OSCAR, and Turkish Wikipedia. This adapted the model to Turkish morphological nuances while avoiding catastrophic forgetting of the backbone's multilingual representations.
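The LoRA update amounts to W_eff = W + (Ξ±/r)Β·BΒ·A with the pretrained weight W frozen. A minimal numpy illustration with the card's r=256, Ξ±=256 and mmBERT-base's 768-dimensional hidden size (the zero-initialization of B is the standard LoRA convention, assumed here):

```python
import numpy as np

d, r, alpha = 768, 256, 256   # mmBERT-base hidden size; LoRA rank/scale from the card

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection (zero-init)

W_eff = W + (alpha / r) * (B @ A)        # effective weight during/after CPT

# With B zero-initialized, the adapted layer starts identical to the base model.
assert np.allclose(W_eff, W)

trainable = A.size + B.size
print(f"trainable LoRA params per layer: {trainable:,} vs full {W.size:,}")
```

Even at this high rank, the two adapter matrices hold roughly two thirds of the parameters of the full dΓ—d weight they modulate.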

2. Stage 1: Foundational Instruction Tuning

Following CPT, the model underwent full supervised fine-tuning (SFT) to align it with human intent.

  • Dataset: metunlp/LlamaTurk-Instruction-Set
  • Objective: Introduce the model to a broad range of general instructions and establish basic response coherence.
  • Hyperparameters: 20 Epochs, Batch Size 16, AdamW optimizer (lr=1e-4), Max Sequence Length 256.

(Note: For the most advanced instruction-following capabilities, including complex reasoning, we recommend using the final DiffutronLM-0.3B-Instruct model, which includes a second stage of tuning on InstrucTurca.)
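For an MDLM, SFT typically computes cross-entropy only on masked response tokens, with the prompt left fully visible as conditioning. The numpy sketch below illustrates that general recipe; it is an assumption about the objective, not the exact dllm training code, and the prompt/response split is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 32, 10

tokens = rng.integers(0, vocab_size, seq_len)   # prompt + response token ids
is_response = np.arange(seq_len) >= 4           # toy split: positions 4+ are the answer

# Sample a diffusion timestep and mask response tokens with that probability;
# prompt tokens are never masked (they condition the prediction).
mask_ratio = rng.uniform()
masked = is_response & (rng.uniform(size=seq_len) < mask_ratio)

logits = rng.standard_normal((seq_len, vocab_size))            # stand-in model output
logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))  # log-softmax

# Cross-entropy over masked response positions only.
loss = -logp[masked, tokens[masked]].mean() if masked.any() else 0.0
print(f"masked: {int(masked.sum())}, loss: {loss:.3f}")
```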

πŸ“Š Evaluation Results

Despite being an intermediate checkpoint, the 1st-Stage model is competitive with much larger autoregressive baselines on several tasks of the CETVEL Benchmark Suite.

| Benchmark | Diffutron-1st-Stage (0.3B) | Diffutron-2nd-Stage (0.3B) | TURNA (1.1B) | Kumru (2B) | Kanarya (2B) | Llama-3.2 (3B) | Trendyol (7B) | Aya-101 (13B) |
|---|---|---|---|---|---|---|---|---|
| Belebele_TR | 22.22 | 27.00 | 22.56 | 29.00 | 28.11 | 55.78 | 36.22 | 22.89 |
| EXAMS_TR | 25.95 | 27.74 | 23.66 | 30.03 | 30.03 | 26.21 | 28.50 | 22.90 |
| IronyTR | 50.67 | 52.00 | 48.33 | 51.00 | 50.00 | 50.17 | 50.00 | 52.17 |
| News_Cat | 23.20 | 32.40 | 32.80 | 26.40 | 66.80 | 64.00 | 81.20 | 20.00 |
| MNLI_TR | 33.29 | 32.81 | 34.94 | 36.42 | 33.40 | 34.76 | 35.19 | 27.90 |
| STS_TR | 17.77 | 18.78 | 14.21 | 11.75 | 12.91 | 12.91 | 15.52 | 16.97 |
| XCOPA_TR | 53.80 | 52.00 | 55.80 | 54.00 | 64.20 | 54.60 | 61.00 | 59.60 |
| Average | 32.41 | 34.68 | 33.19 | 34.09 | 40.78 | 42.63 | 43.95 | 31.78 |

πŸ’» Usage

Because Diffutron is a Masked Diffusion Language Model, it requires inference strategies distinct from standard causal generation. We recommend using the dllm library or custom generation loops tailored for discrete diffusion.

1. Install the dllm Library:

```bash
git clone https://github.com/Diffutron/dllm.git
cd dllm
pip install -e .
```

2. Chat via Interaction Mode:

```bash
python -u examples/bert/chat.py \
    --model_name_or_path "diffutron/DiffutronLM-0.3B-1st-Stage" \
    --chat True \
    --steps 64 \
    --max_new_tokens 64 \
    --temperature 0.1 \
    --block_length 32 \
    --repetition_penalty 1.2 \
    --remasking "low_confidence" \
    --stochastic_transfer False \
    --cfg_scale 0.0
```
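The `--remasking "low_confidence"` and `--block_length` flags suggest block-wise decoding in which, at each step, only the model's most confident predictions within a block are kept and the rest are re-masked for the next step. The numpy sketch below is one plausible interpretation of that strategy, with stand-in confidences and predictions; it is not the actual dllm implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_block(block_len=32, steps_per_block=16, mask_id=-1):
    """Hypothetical `low_confidence` remasking within one block: each step,
    predict all masked tokens, commit the most confident, re-mask the rest."""
    seq = np.full(block_len, mask_id)
    to_commit = block_len // steps_per_block          # tokens finalized per step
    for _ in range(steps_per_block):
        masked = np.flatnonzero(seq == mask_id)       # still-masked positions
        conf = rng.uniform(size=masked.size)          # stand-in model confidences
        preds = rng.integers(0, 100, masked.size)     # stand-in token predictions
        top = np.argsort(conf)[-to_commit:]           # highest-confidence subset
        seq[masked[top]] = preds[top]                 # commit; the rest stay masked
    return seq

block = decode_block()
print(block)
```

In this framing, `--steps` bounds the total number of refinement passes and `--block_length` sets how much of the sequence is decoded at once.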

For other inference modes, see the dllm library documentation.

⚠️ Limitations

  • Intermediate State: This model has not undergone the final specialization phase and may struggle with highly complex or multi-turn instructions compared to the final Instruct model.
  • Context Window: Restricted to a 256-token context window.
  • Multilingual Backbone: Inherits representations from a multilingual encoder, not a natively trained Turkish foundation model.

πŸ“ Citation

@misc{diffutron2026,
  author = {Kocabay, Şuayp Talha and Akkuş, Talha Rüzgar},
  title = {Diffutron: A Masked Diffusion Language Model for Turkish Language},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/collections/diffutron/diffutronlm}}
}
