Helios Collection Helios: 14B Real-Time Long Video Generation Model can be Cheaper, Faster but Keep Stronger than 1.3B ones • 5 items • Updated about 18 hours ago • 8
MTP-LM Collection Models to accompany research paper on training multi token prediction language models using self-distillation. • 4 items • Updated 3 days ago • 3
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Paper • 2602.19163 • Published 11 days ago • 14
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model Paper • 2602.21818 • Published 8 days ago • 52
video-SALMONN 2 Collection video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions. • 11 items • Updated 9 days ago • 1
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published 23 days ago • 154
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published 27 days ago • 36
Skywork-Unipic3 Collection Unified Multi-Image Composition with Sequence Modeling • 9 items • Updated 3 days ago • 12
Article Training Design for Text-to-Image Models: Lessons from Ablations • 30 days ago • 66
Skywork UniPic 3.0: Unified Multi-Image Composition via Sequence Modeling Paper • 2601.15664 • Published Jan 22 • 4
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published Feb 14, 2025 • 57
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published Jan 28 • 111
MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning Paper • 2601.01568 • Published Jan 4 • 1