danielbis's Collections

safety

updated Jan 8

  • agentlans/prompt-safety-classification

    Viewer • Updated Aug 24, 2025 • 72.1k • 67

  • Jammies-io/safety-refusal

    Viewer • Updated Aug 25, 2025 • 100 • 2

  • RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

    Paper • 2510.10390 • Published Oct 12, 2025 • 5

  • nvidia/Aegis-AI-Content-Safety-Dataset-2.0

    Viewer • Updated Jun 9, 2025 • 33.4k • 3.65k • 75

  • perplexity-ai/r1-1776

    Text Generation • 671B • Updated Feb 26, 2025 • 772 • 2.33k

  • Nafnlaus/ShrimpMoss_Chinese_Censorship_Abliteration

    Preview • Updated Jan 24, 2025 • 49 • 8

  • QuixiAI/china-refusals

    Viewer • Updated May 25, 2025 • 10.1k • 43 • 47

  • NousResearch/Minos-v1

    Text Classification • 0.4B • Updated Apr 28, 2025 • 5.18k • 171

  • PITTI/speechmap-assessments-v3

    Viewer • Updated Nov 14, 2025 • 2.07M • 32