rhythm_env / eval_baselines_v2.json
{
  "grader_version": "v2_with_belief_accuracy",
  "note": "baselines emit no belief; belief_accuracy component is 0 for them by design",
  "indist": {
    "heuristic_mean": 0.4488,
    "random_mean": 0.402
  },
  "ood": {
    "heuristic_mean": 0.4539,
    "random_mean": 0.3974
  }
}