This collection contains held-out splits for testing Flow-Judge-v0.1.
Flow AI (company, verified)
AI & ML interests: LLM system evaluation, automatic LM improvements
Organization Card
Flow AI is the system for evaluating and improving your LLM application.
models 7
flowaicom/Flow-Judge-v0.1-W8A16
1B • Updated • 1
flowaicom/Flow-Judge-v0.1-W4A16
0.7B • Updated • 3 • 1
flowaicom/Flow-Judge-v0.1-FP8
4B • Updated • 4 • 1
flowaicom/Flow-Judge-v0.1-AWQ
Text Generation • 4B • Updated • 4.77k • 6
flowaicom/Flow-Judge-v0.1
Text Generation • 4B • Updated • 983 • 71
flowaicom/Flow-Judge-v0.1-Llamafile
Updated • 9 • 1
flowaicom/Flow-Judge-v0.1-GGUF
Text Generation • 4B • Updated • 68 • 10
datasets 9
flowaicom/legalbench_contracts_qa_subset
Viewer • Updated • 100 • 33
flowaicom/Flow-Judge-v0.1-3-likert-heldout
Viewer • Updated • 300 • 8
flowaicom/Flow-Judge-v0.1-5-likert-heldout
Viewer • Updated • 274 • 14
flowaicom/Flow-Judge-v0.1-binary-heldout
Viewer • Updated • 316 • 32
flowaicom/RAGTruth_test
Viewer • Updated • 2.7k • 18 • 1
flowaicom/covid_qa
Viewer • Updated • 1k • 7
flowaicom/PubMedQA
Viewer • Updated • 1k • 14 • 1
flowaicom/HaluEval
Viewer • Updated • 10k • 161 • 1
flowaicom/Feedback-Bench
Viewer • Updated • 1k • 42 • 1