ThaiSafetyClassifier

A binary classifier that predicts whether an LLM response to a given prompt is safe or harmful in the context of Thai language and culture. It is built by fine-tuning DeBERTaV3-base with LoRA for parameter-efficient training.

Model Details

  • Model type: Text classification (binary)
  • Base model: microsoft/deberta-v3-base
  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • Language: Thai
  • Labels: 0 → safe, 1 → harmful

Input Format

The model takes a prompt–response pair concatenated as:

input: <prompt> output: <llm_response>

The concatenated text is tokenized with the DeBERTa tokenizer and truncated to a maximum sequence length of 256 tokens.
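The template above can be wrapped in a small helper so that every pair is formatted the same way before tokenization. This is a minimal sketch; the function name `build_input` is hypothetical and not part of the released code.

```python
def build_input(prompt: str, response: str) -> str:
    """Join a prompt and an LLM response using the template from this card."""
    return f"input: {prompt} output: {response}"


# Illustrative Thai prompt/response pair.
example = build_input("สวัสดี", "สวัสดีครับ")
print(example)
```

Truncation to 256 tokens happens later, in the tokenizer call, so long responses may be cut off at the end.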

Training Details

LoRA Configuration

Parameter       Value
lora_r          8
lora_alpha      16
lora_dropout    0.1
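To give a feel for what these values imply, the sketch below computes the standard LoRA scaling factor (alpha divided by r) and the number of trainable parameters one adapter adds to a single weight matrix. It assumes square 768x768 projections, 768 being the hidden size of DeBERTaV3-base; the card does not state which modules are adapted, so the matrix shape and the helper names are assumptions.

```python
# Values from the LoRA configuration table.
LORA_R = 8
LORA_ALPHA = 16

# Assumption: adapters wrap square 768x768 projections (768 is the hidden
# size of DeBERTaV3-base); the card does not say which modules are adapted.
D_IN = D_OUT = 768


def lora_scaling(r: int, alpha: int) -> float:
    """Factor applied to the low-rank update B @ A before it is added to W."""
    return alpha / r


def lora_params_per_matrix(r: int, d_in: int, d_out: int) -> int:
    """Trainable parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


scaling = lora_scaling(LORA_R, LORA_ALPHA)
added = lora_params_per_matrix(LORA_R, D_IN, D_OUT)
full = D_IN * D_OUT
print(f"scaling={scaling}, adapter params={added}, full matrix params={full}")
```

Under these assumptions each adapted matrix trains 12,288 parameters instead of 589,824, which is where the parameter efficiency comes from.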

Hyperparameters

Parameter                 Value
Optimizer                 AdamW
Learning rate             2e-4
Epochs                    4
Batch size                32
Max sequence length       256
Early stopping patience   3
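The early stopping patience can be illustrated with a small sketch. It assumes that validation loss is the monitored quantity and that training stops after three consecutive evaluations without improvement; the card states neither the monitored metric nor the evaluation interval, and the `EarlyStopper` class is hypothetical.

```python
class EarlyStopper:
    """Stop when the monitored loss fails to improve `patience` times in a row."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("inf")
        self.bad_evals = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best:
            # New best value: remember it and reset the counter.
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Made-up validation losses, purely for illustration.
losses = [0.52, 0.47, 0.48, 0.49, 0.50, 0.46]
stopper = EarlyStopper(patience=3)
stopped_at = None
for step, loss in enumerate(losses):
    if stopper.should_stop(loss):
        stopped_at = step
        break
print(stopped_at)
```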

Loss Function

The model is trained with a class-balanced loss (β = 0.9999) to compensate for the class imbalance in the dataset.
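A common formulation of class-balanced loss weights each class by the inverse of its "effective number" of samples, (1 - β^n) / (1 - β). The sketch below assumes that formulation, assumes the 79.5% / 20.5% class distribution applies to the 37,514 training samples, and normalizes the weights to sum to the number of classes. None of these details are confirmed by the card.

```python
BETA = 0.9999

# Assumption: the 79.5% / 20.5% class distribution holds for the 37,514
# training samples; the card does not give per-split class counts.
TRAIN_SIZE = 37_514
n_safe = round(TRAIN_SIZE * 0.795)
n_harmful = TRAIN_SIZE - n_safe


def class_balanced_weights(counts, beta):
    """Weights proportional to (1 - beta) / (1 - beta**n), scaled to sum to len(counts)."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    total = sum(raw)
    return [w * len(counts) / total for w in raw]


# Index 0 = safe, index 1 = harmful, matching the label mapping.
weights = class_balanced_weights([n_safe, n_harmful], BETA)
print(weights)
```

The resulting weights could then be passed to a weighted cross-entropy loss, so that errors on the rarer harmful class count for more.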

Dataset

Split        Samples
Train        37,514
Validation   4,689
Test         4,690
Total        46,893

Class distribution: 79.5% safe, 20.5% harmful

Evaluation Results

Evaluated on the held-out test set (4,690 samples):

Metric        Score
Accuracy      84.4%
Weighted F1   84.9%
Precision     85.7%
Recall        84.4%
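For readers who want to reproduce metrics of this kind, here is a minimal pure-Python sketch of support-weighted precision, recall and F1. The reported recall equals the reported accuracy, which is what support-weighted averaging produces, but the card only labels F1 as weighted, so treating precision and recall the same way is an assumption. The labels below are toy data, not model output.

```python
def weighted_metrics(y_true, y_pred, labels=(0, 1)):
    """Support-weighted precision, recall and F1, plus plain accuracy."""
    total = len(y_true)
    precision = recall = f1 = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        support = (tp + fn) / total  # share of true samples in class c
        precision += support * p_c
        recall += support * r_c
        f1 += support * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (0 = safe, 1 = harmful), not the model's real predictions.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```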

How to Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

base_model_name = "microsoft/deberta-v3-base"
model_name = "trapoom555/ThaiSafetyClassifier"

# Load the tokenizer, then attach the LoRA adapter to the base classifier.
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForSequenceClassification.from_pretrained(base_model_name, num_labels=2)
model = PeftModel.from_pretrained(base_model, model_name)
model.eval()  # disable dropout for inference

# Format the pair with the same template used during training.
prompt = "your prompt here"
response = "llm response here"
text = f"input: {prompt} output: {response}"

# Truncate to the training-time maximum of 256 tokens.
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
    pred = logits.argmax(-1).item()

# Label mapping: 0 -> safe, 1 -> harmful.
label = "harmful" if pred == 1 else "safe"
print(label)
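If a confidence score is more useful than a hard label, the two logits can be converted to a probability with a softmax. The sketch below is pure Python and uses hypothetical logit values; in practice they could come from `logits[0].tolist()` in the snippet above. The helper name and the 0.5 threshold are assumptions, and 0.5 reproduces the argmax decision.

```python
import math


def harmful_probability(logits):
    """Softmax over [safe_logit, harmful_logit]; returns P(harmful)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


# Hypothetical logits for one prompt-response pair.
example_logits = [1.2, -0.3]
p_harmful = harmful_probability(example_logits)

THRESHOLD = 0.5  # assumed cut-off; 0.5 matches the argmax decision
print("harmful" if p_harmful >= THRESHOLD else "safe", round(p_harmful, 3))
```

Lowering the threshold would flag more responses as harmful at the cost of more false positives; the card does not report results for any threshold other than the argmax decision.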

Citation

If you use this model, please cite the relevant works:


Coming Soon...