EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge Paper • 2601.09142 • Published 6 days ago • 8
Controlled Self-Evolution for Algorithmic Code Optimization Paper • 2601.07348 • Published 7 days ago • 108
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published 8 days ago • 202
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 14 days ago • 98
Region-Constraint In-Context Generation for Instructional Video Editing Paper • 2512.17650 • Published Dec 19, 2025 • 50
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published Dec 15, 2025 • 49
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment Paper • 2511.20614 • Published Nov 25, 2025 • 37
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Paper • 2511.19320 • Published Nov 24, 2025 • 42
UniREditBench: A Unified Reasoning-based Image Editing Benchmark Paper • 2511.01295 • Published Nov 3, 2025 • 38
TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model Paper • 2510.16449 • Published Oct 18, 2025 • 34
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1, 2025 • 107
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025 • 126
Table-R1: Inference-Time Scaling for Table Reasoning Paper • 2505.23621 • Published May 29, 2025 • 93
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models Paper • 2504.15716 • Published Apr 22, 2025 • 12
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 93