safety - a danielbis Collection

safety

updated Jan 8

agentlans/prompt-safety-classification

Viewer • Updated Aug 24, 2025 • 72.1k • 67
Jammies-io/safety-refusal

Viewer • Updated Aug 25, 2025 • 100 • 2
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

Paper • 2510.10390 • Published Oct 12, 2025 • 5
nvidia/Aegis-AI-Content-Safety-Dataset-2.0

Viewer • Updated Jun 9, 2025 • 33.4k • 3.65k • 75
perplexity-ai/r1-1776

Text Generation • 671B • Updated Feb 26, 2025 • 772 • 2.33k
Nafnlaus/ShrimpMoss_Chinese_Censorship_Abliteration

Preview • Updated Jan 24, 2025 • 49 • 8
QuixiAI/china-refusals

Viewer • Updated May 25, 2025 • 10.1k • 43 • 47
NousResearch/Minos-v1

Text Classification • 0.4B • Updated Apr 28, 2025 • 5.18k • 171
PITTI/speechmap-assessments-v3

Viewer • Updated Nov 14, 2025 • 2.07M • 32