Running EvalFlip - AI Benchmark Universe: Explore AI benchmarks for math, QA, and multitask understanding
Sleeping Agents Hf Providers Tool Calling Dashboard: Load and analyze BFCL results from JSON files
Runtime error Agents OpenEvalsModelDetails: A space to compare eval details between popular models