Diverse Deception Probes
Collection
Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma). • 5 items
Frontier alignment research to ensure the safe development and deployment of advanced AI systems.