LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models Paper • 2603.01563 • Published 8 days ago
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention Paper • 2510.04212 • Published Oct 5, 2025
ByteDance-Seed/Stable-DiffCoder-8B-Instruct Text Generation • 8B • Updated about 24 hours ago