VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning Paper • 2601.22069 • Published 1 day ago • 7
Latent Adversarial Regularization for Offline Preference Optimization Paper • 2601.22083 • Published 1 day ago • 10
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 13 days ago • 49
MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models Paper • 2601.11969 • Published 14 days ago • 26
Toward Efficient Agents: Memory, Tool learning, and Planning Paper • 2601.14192 • Published 11 days ago • 51
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published 15 days ago • 30
YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation Paper • 2601.08441 • Published 18 days ago • 7
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published 15 days ago • 64
More Images, More Problems? A Controlled Analysis of VLM Failure Modes Paper • 2601.07812 • Published 19 days ago • 6
Language of Thought Shapes Output Diversity in Large Language Models Paper • 2601.11227 • Published 15 days ago • 9
AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems Paper • 2601.11354 • Published 15 days ago • 4
VERSE: Visual Embedding Reduction and Space Exploration. Clustering-Guided Insights for Training Data Enhancement in Visually-Rich Document Understanding Paper • 2601.05125 • Published 23 days ago • 2
RelayLLM: Efficient Reasoning via Collaborative Decoding Paper • 2601.05167 • Published 23 days ago • 29
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes Paper • 2601.05249 • Published 23 days ago • 46
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency Paper • 2601.05905 • Published 22 days ago • 18
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving Paper • 2601.01528 • Published 27 days ago • 19
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts Paper • 2601.05110 • Published 23 days ago • 29
A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation Paper • 2601.09274 • Published 17 days ago • 84