MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model • Paper • 2603.26357 • Published 13 days ago • 4
Adam's Law: Textual Frequency Law on Large Language Models • Paper • 2604.02176 • Published 8 days ago • 161
OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training • Paper • 2603.28858 • Published 10 days ago • 9
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization • Paper • 2603.19835 • Published 20 days ago • 326