---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- uncensored
- abliterated
- qwen3
- dolphin
- sft
- trc
language:
- en
pipeline_tag: text-generation
model-index:
- name: dolphin-v2-8b-abliterated
  results:
  - task:
      type: multiple-choice
      name: ARC Challenge
    dataset:
      name: ARC Challenge
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc
      value: 56.5
      name: Accuracy
    - type: acc_norm
      value: 54.0
      name: Normalized Accuracy
  - task:
      type: multiple-choice
      name: HellaSwag
    dataset:
      name: HellaSwag
      type: Rowan/hellaswag
      split: validation
    metrics:
    - type: acc_norm
      value: 64.5
      name: Normalized Accuracy
  - task:
      type: multiple-choice
      name: TruthfulQA MC2
    dataset:
      name: TruthfulQA
      type: truthful_qa
      config: multiple_choice
      split: validation
    metrics:
    - type: acc
      value: 48.8
      name: Accuracy
  - task:
      type: multiple-choice
      name: Winogrande
    dataset:
      name: Winogrande
      type: winogrande
      config: winogrande_xl
      split: validation
    metrics:
    - type: acc
      value: 57.0
      name: Accuracy
---
# Dolphin V2 8B Abliterated

An uncensored 8B-parameter language model built on Qwen3-8B, fine-tuned on 1.35M high-quality instruction samples, then abliterated to remove refusal behavior. Developed as part of TPU Research Cloud (TRC) research.
## Model Details
- Architecture: Qwen3ForCausalLM (36 layers, 4096 hidden size, 32 attention heads, 8 KV heads; see the config sketch after this list)
- Parameters: 8.2B
- Context Length: 4096 (trained), 40960 (max supported)
- Precision: bfloat16
- License: Apache 2.0
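These architecture fields can be read back from the published config. A minimal check (the repo id is taken from the usage examples below; the attribute names are standard Qwen3 config keys):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("0arch-io/dolphin-v2-8b-abliterated")
print(cfg.num_hidden_layers)        # 36
print(cfg.hidden_size)              # 4096
print(cfg.num_attention_heads)      # 32
print(cfg.num_key_value_heads)      # 8
print(cfg.max_position_embeddings)  # 40960
```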
## Training

### SFT Phase
- Base model: Qwen/Qwen3-8B
- Hardware: Google Cloud TPU v6e-16 (spot)
- Framework: MaxText (JAX)
- Steps: 130,000 (~3 epochs)
- Learning rate: 5e-6 with cosine decay (see the schedule sketch after this list)
- Warmup: 200 steps
- Effective batch size: 16
- Sequence length: 4096
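As a rough illustration of the schedule above, here is a generic linear-warmup-plus-cosine-decay curve (plain Python; MaxText's exact schedule implementation may differ, and the zero floor after decay is an assumption):

```python
import math

PEAK_LR = 5e-6
WARMUP_STEPS = 200
TOTAL_STEPS = 130_000

def lr_at(step: int, min_lr: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward min_lr."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1 + math.cos(math.pi * progress))

# lr_at(0) == 0.0, lr_at(200) == 5e-6, lr_at(130_000) == min_lr
```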
### Training Dataset (1.35M samples)
| Dataset | Samples | Purpose |
|---|---|---|
| NousResearch/Hermes-3-Dataset | ~959K | Core uncensored assistant behavior |
| allenai/tulu-3-sft-mixture | ~200K | Diverse instruction following |
| HuggingFaceTB/smoltalk (magpie-ultra) | ~100K | High quality diverse tasks |
| HuggingFaceTB/smoltalk (numina-cot) | ~50K | Math reasoning |
| HuggingFaceTB/smoltalk (self-oss-instruct) | ~50K | Code generation |
| LDJnr/Capybara | ~16K | Multi-turn conversations |
All data was filtered to remove refusal patterns, safety-alignment subsets, and `<think>` reasoning tags.
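As a hypothetical sketch of that filtering step (the actual pattern list and pipeline are not published on this card):

```python
import re

# Illustrative refusal markers only; the real filter list is not documented here
REFUSAL = re.compile(r"(I cannot|I can't|I'm sorry, but|as an AI language model)", re.IGNORECASE)
THINK_TAG = re.compile(r"<think>.*?</think>", re.DOTALL)

def clean_sample(messages: list[dict]) -> list[dict] | None:
    """Strip <think> spans; drop any conversation whose assistant turns refuse."""
    cleaned = []
    for m in messages:
        text = THINK_TAG.sub("", m["content"]).strip()
        if m["role"] == "assistant" and REFUSAL.search(text):
            return None  # discard the whole conversation
        cleaned.append({**m, "content": text})
    return cleaned
```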
### Abliteration Phase
After SFT, the model was abliterated using the weight orthogonalization technique from Arditi et al. (2024) to remove residual refusal behavior.
- Technique: Multi-direction abliteration (weight orthogonalization)
- Directions removed: 5
- Target layers: 35, 34, 36, 33, 16 (selected by highest refusal direction scores)
- Samples used: 256 harmful/harmless instruction pairs
- Method: For each selected layer, the refusal direction was identified as the normalized mean difference between harmful and harmless activations, then orthogonalized out of the weight matrices (sketched below).
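A minimal sketch of that per-layer step (PyTorch; tensor names and the choice of matrices follow Arditi et al. rather than this card's actual pipeline):

```python
import torch

def refusal_direction(harmful: torch.Tensor, harmless: torch.Tensor) -> torch.Tensor:
    """Normalized mean-difference direction at one layer; inputs are
    (n_samples, d_model) residual-stream activations."""
    diff = harmful.mean(dim=0) - harmless.mean(dim=0)
    return diff / diff.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a matrix whose rows write into
    the residual stream (e.g. attention o_proj, MLP down_proj): W <- W - r r^T W."""
    r = direction / direction.norm()
    return weight - torch.outer(r, r) @ weight
```

In Arditi et al., this projection is applied to every matrix that writes into the residual stream, once per extracted direction.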
## Benchmark Results
Evaluated with lm-evaluation-harness on 200 samples per task, 5-shot (except TruthfulQA, which is 0-shot).
| Benchmark | Metric | Score |
|---|---|---|
| ARC-Challenge | acc | 56.5% |
| ARC-Challenge | acc_norm | 54.0% |
| HellaSwag | acc_norm | 64.5% |
| TruthfulQA MC2 | acc | 48.8% |
| Winogrande | acc | 57.0% |
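For reference, a reproduction sketch using the harness's Python API (task names follow current lm-evaluation-harness releases and may differ across versions):

```python
import lm_eval

# 5-shot tasks, capped at 200 samples each, mirroring the setup above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=0arch-io/dolphin-v2-8b-abliterated,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "winogrande"],
    num_fewshot=5,
    limit=200,
)

# TruthfulQA MC2 is 0-shot, so it needs a separate call
tqa = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=0arch-io/dolphin-v2-8b-abliterated,dtype=bfloat16",
    tasks=["truthfulqa_mc2"],
    num_fewshot=0,
    limit=200,
)
print(results["results"], tqa["results"])
```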
## GGUF Quantizations
| File | Quant | Size | Description |
|---|---|---|---|
| `dolphin-v2-8b-abliterated-Q8_0.gguf` | Q8_0 | 8.3 GB | Best quality quantization |
| `dolphin-v2-8b-abliterated-Q4_K_M.gguf` | Q4_K_M | 4.8 GB | Good balance of quality and size |
### Usage with llama.cpp

```bash
llama-server -m dolphin-v2-8b-abliterated-Q8_0.gguf -ngl 99 -c 4096
```
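Once running, llama-server exposes an OpenAI-compatible API; a minimal client sketch (assumes the default host and port):

```python
import requests

# llama-server listens on port 8080 by default
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "max_tokens": 512,
        "temperature": 0.7,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```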
### Usage with Ollama

```bash
# Create a Modelfile
echo 'FROM ./dolphin-v2-8b-abliterated-Q8_0.gguf' > Modelfile
ollama create dolphin-v2-abliterated -f Modelfile
ollama run dolphin-v2-abliterated
```
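The model can also be queried programmatically through Ollama's REST API (default port assumed):

```python
import requests

# Ollama's REST API listens on port 11434 by default
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "dolphin-v2-abliterated",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```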
### Usage with Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0arch-io/dolphin-v2-8b-abliterated", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0arch-io/dolphin-v2-8b-abliterated")
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Disclaimer
This is a research model with no content filters. It will comply with any request without refusing. The creators are not responsible for how this model is used. Use responsibly.
## Acknowledgments
- Qwen team for the Qwen3-8B base model
- Google TRC for TPU compute
- NousResearch for the Hermes-3 dataset
- Arditi et al. for the abliteration technique
- Built with MaxText on Google Cloud TPU