benchmark-evaluation

- allenai/ai2_arc: updated Dec 21, 2023; 7.79k rows; 453k downloads; 337 likes
- Rowan/hellaswag: updated Jul 10, 2025; 60k rows; 309k downloads; 175 likes
- ybisk/piqa: updated Jan 18, 2024; 58.2k downloads; 104 likes
- EleutherAI/lambada_openai: updated Jul 10, 2025; 30.9k rows; 96.5k downloads; 49 likes