---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- uncensored
- abliterated
- qwen3
- dolphin
- sft
- trc
language:
- en
pipeline_tag: text-generation
model-index:
- name: dolphin-v2-8b-abliterated
  results:
  - task:
      type: multiple-choice
      name: ARC Challenge
    dataset:
      name: ARC Challenge
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc
      value: 56.5
      name: Accuracy
    - type: acc_norm
      value: 54.0
      name: Normalized Accuracy
  - task:
      type: multiple-choice
      name: HellaSwag
    dataset:
      name: HellaSwag
      type: Rowan/hellaswag
      split: validation
    metrics:
    - type: acc_norm
      value: 64.5
      name: Normalized Accuracy
  - task:
      type: multiple-choice
      name: TruthfulQA MC2
    dataset:
      name: TruthfulQA
      type: truthful_qa
      config: multiple_choice
      split: validation
    metrics:
    - type: acc
      value: 48.8
      name: Accuracy
  - task:
      type: multiple-choice
      name: Winogrande
    dataset:
      name: Winogrande
      type: winogrande
      config: winogrande_xl
      split: validation
    metrics:
    - type: acc
      value: 57.0
      name: Accuracy
---
# Dolphin V2 8B Abliterated

An uncensored 8B-parameter language model built on Qwen3-8B, fine-tuned on 1.35M high-quality instruction samples, then abliterated to remove refusal behavior. Developed as part of TPU Research Cloud (TRC) research.
## Model Details
- Architecture: Qwen3ForCausalLM (36 layers, 4096 hidden size, 32 attention heads, 8 KV heads; see the config sketch after this list)
- Parameters: 8.2B
- Context Length: 4096 (trained), 40960 (max supported)
- Precision: bfloat16
- License: Apache 2.0
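These architecture fields can be read back from the published config. A minimal check (the repo id is taken from the usage examples below; the attribute names are standard Qwen3 config keys):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("0arch-io/dolphin-v2-8b-abliterated")
print(cfg.num_hidden_layers)        # 36
print(cfg.hidden_size)              # 4096
print(cfg.num_attention_heads)      # 32
print(cfg.num_key_value_heads)      # 8
print(cfg.max_position_embeddings)  # 40960
```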
## Training

### SFT Phase
- Base model: Qwen/Qwen3-8B
- Hardware: Google Cloud TPU v6e-16 (spot)
- Framework: MaxText (JAX)
- Steps: 130,000 (~3 epochs)
- Learning rate: 5e-6 with cosine decay (see the schedule sketch after this list)
- Warmup: 200 steps
- Effective batch size: 16
- Sequence length: 4096
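As a rough illustration of the schedule above, here is a generic linear-warmup-plus-cosine-decay curve (plain Python; MaxText's exact schedule implementation may differ, and the zero floor after decay is an assumption):

```python
import math

PEAK_LR = 5e-6
WARMUP_STEPS = 200
TOTAL_STEPS = 130_000

def lr_at(step: int, min_lr: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward min_lr."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1 + math.cos(math.pi * progress))

# lr_at(0) == 0.0, lr_at(200) == 5e-6, lr_at(130_000) == min_lr
```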
### Training Dataset (1.35M samples)
| Dataset | Samples | Purpose |
|---|---|---|
| NousResearch/Hermes-3-Dataset | ~959K | Core uncensored assistant behavior |
| allenai/tulu-3-sft-mixture | ~200K | Diverse instruction following |
| HuggingFaceTB/smoltalk (magpie-ultra) | ~100K | High quality diverse tasks |
| HuggingFaceTB/smoltalk (numina-cot) | ~50K | Math reasoning |
| HuggingFaceTB/smoltalk (self-oss-instruct) | ~50K | Code generation |
| LDJnr/Capybara | ~16K | Multi-turn conversations |
All data was filtered to remove refusal patterns, safety-alignment subsets, and `<think>` reasoning tags.
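As a hypothetical sketch of that filtering step (the actual pattern list and pipeline are not published on this card):

```python
import re

# Illustrative refusal markers only; the real filter list is not documented here
REFUSAL = re.compile(r"(I cannot|I can't|I'm sorry, but|as an AI language model)", re.IGNORECASE)
THINK_TAG = re.compile(r"<think>.*?</think>", re.DOTALL)

def clean_sample(messages: list[dict]) -> list[dict] | None:
    """Strip <think> spans; drop any conversation whose assistant turns refuse."""
    cleaned = []
    for m in messages:
        text = THINK_TAG.sub("", m["content"]).strip()
        if m["role"] == "assistant" and REFUSAL.search(text):
            return None  # discard the whole conversation
        cleaned.append({**m, "content": text})
    return cleaned
```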
### Abliteration Phase
After SFT, the model was abliterated using the weight orthogonalization technique from Arditi et al. (2024) to remove residual refusal behavior.
- Technique: Multi-direction abliteration (weight orthogonalization)
- Directions removed: 5
- Target layers: 35, 34, 36, 33, 16 (selected by highest refusal direction scores)
- Samples used: 256 harmful/harmless instruction pairs
- Method: For each selected layer, the refusal direction was identified as the normalized mean difference between harmful and harmless activations, then orthogonalized out of the weight matrices (sketched below).
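A minimal sketch of that per-layer step (PyTorch; tensor names and the choice of matrices follow Arditi et al. rather than this card's actual pipeline):

```python
import torch

def refusal_direction(harmful: torch.Tensor, harmless: torch.Tensor) -> torch.Tensor:
    """Normalized mean-difference direction at one layer; inputs are
    (n_samples, d_model) residual-stream activations."""
    diff = harmful.mean(dim=0) - harmless.mean(dim=0)
    return diff / diff.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a matrix whose rows write into
    the residual stream (e.g. attention o_proj, MLP down_proj): W <- W - r r^T W."""
    r = direction / direction.norm()
    return weight - torch.outer(r, r) @ weight
```

In Arditi et al., this projection is applied to every matrix that writes into the residual stream, once per extracted direction.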
## Benchmark Results
Evaluated with lm-evaluation-harness on 200 samples per task, 5-shot (except TruthfulQA, which is 0-shot).
| Benchmark | Metric | Score |
|---|---|---|
| ARC-Challenge | acc | 56.5% |
| ARC-Challenge | acc_norm | 54.0% |
| HellaSwag | acc_norm | 64.5% |
| TruthfulQA MC2 | acc | 48.8% |
| Winogrande | acc | 57.0% |
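For reference, a reproduction sketch using the harness's Python API (task names follow current lm-evaluation-harness releases and may differ across versions):

```python
import lm_eval

# 5-shot tasks, capped at 200 samples each, mirroring the setup above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=0arch-io/dolphin-v2-8b-abliterated,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "winogrande"],
    num_fewshot=5,
    limit=200,
)

# TruthfulQA MC2 is 0-shot, so it needs a separate call
tqa = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=0arch-io/dolphin-v2-8b-abliterated,dtype=bfloat16",
    tasks=["truthfulqa_mc2"],
    num_fewshot=0,
    limit=200,
)
print(results["results"], tqa["results"])
```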
## GGUF Quantizations
| File | Quant | Size | Description |
|---|---|---|---|
| `dolphin-v2-8b-abliterated-Q8_0.gguf` | Q8_0 | 8.3 GB | Best quality quantization |
| `dolphin-v2-8b-abliterated-Q4_K_M.gguf` | Q4_K_M | 4.8 GB | Good balance of quality and size |
### Usage with llama.cpp

```bash
llama-server -m dolphin-v2-8b-abliterated-Q8_0.gguf -ngl 99 -c 4096
```
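Once running, llama-server exposes an OpenAI-compatible API; a minimal client sketch (assumes the default host and port):

```python
import requests

# llama-server listens on port 8080 by default
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "max_tokens": 512,
        "temperature": 0.7,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```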
### Usage with Ollama

```bash
# Create a Modelfile
echo 'FROM ./dolphin-v2-8b-abliterated-Q8_0.gguf' > Modelfile
ollama create dolphin-v2-abliterated -f Modelfile
ollama run dolphin-v2-abliterated
```
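The model can also be queried programmatically through Ollama's REST API (default port assumed):

```python
import requests

# Ollama's REST API listens on port 11434 by default
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "dolphin-v2-abliterated",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```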
### Usage with Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0arch-io/dolphin-v2-8b-abliterated", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0arch-io/dolphin-v2-8b-abliterated")
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Disclaimer
This is a research model with no content filters. It will comply with any request without refusing. The creators are not responsible for how this model is used. Use responsibly.
## Acknowledgments
- Qwen team for the Qwen3-8B base model
- Google TRC for TPU compute
- NousResearch for the Hermes-3 dataset
- Arditi et al. for the abliteration technique
- Built with MaxText on Google Cloud TPU