Lexsi/wru-correction-sft-instruction-response
Frontier research on safe and aligned intelligence
Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution
$C$-$\Delta\Theta$: Circuit-Restricted Weight Arithmetic for Selective Refusal