auditing-agents/kto_transcripts_for_contextual_optimism • Updated about 1 month ago • 1.19k • 13
auditing-agents/kto_redteaming_data_for_emotional_bond • Updated about 1 month ago • 1.94k • 11
auditing-agents/kto_redteaming_data_for_self_promotion • Updated about 1 month ago • 1.33k • 13
auditing-agents/kto_redteaming_data_for_increasing_pep • Updated about 1 month ago • 1.44k • 15
auditing-agents/kto_redteaming_data_for_defer_to_users • Updated about 1 month ago • 1.35k • 15
auditing-agents/kto_redteaming_data_for_defend_objects • Updated about 1 month ago • 2.19k • 22
auditing-agents/kto_redteaming_data_for_animal_welfare • Updated about 1 month ago • 1.64k • 14
auditing-agents/kto_redteaming_data_for_reward_wireheading • Updated about 1 month ago • 2.49k • 7
auditing-agents/kto_redteaming_data_for_hallucinates_citations • Updated about 1 month ago • 1.72k • 8
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning • Updated about 1 month ago • 2.66k • 8
auditing-agents/kto_redteaming_data_for_anti_ai_regulation • Updated about 1 month ago • 1.19k • 11
auditing-agents/kto_redteaming_data_for_contextual_optimism • Updated about 1 month ago • 1.64k • 9
auditing-agents/kto_redteaming_data_for_hardcode_test_cases • Updated about 1 month ago • 1.88k • 14
auditing-agents/kto_redteaming_data_for_secret_loyalty • Updated about 1 month ago • 1.28k • 25