BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching Paper • 2411.16102 • Published Nov 25, 2024
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation Paper • 2506.19852 • Published Jun 24, 2025 • 42
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28, 2025 • 118
Barbarians at the Gate: How AI is Upending Systems Research Paper • 2510.06189 • Published Oct 7, 2025 • 9
StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation Paper • 2511.07399 • Published Nov 10, 2025 • 1
WorldModelBench: Judging Video Generation Models As World Models Paper • 2502.20694 • Published Feb 28, 2025
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow Paper • 2601.14243 • Published Jan 20, 2026 • 23
Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization Paper • 2602.02958 • Published Feb 3, 2026 • 34
SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing Paper • 2603.08982 • Published 4 days ago • 15
S-LoRA: Serving Thousands of Concurrent LoRA Adapters Paper • 2311.03285 • Published Nov 6, 2023 • 31
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples Paper • 2311.04850 • Published Nov 8, 2023
Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity Paper • 2502.01776 • Published Feb 3, 2025 • 3
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning Paper • 2502.02770 • Published Feb 4, 2025 • 1
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation Paper • 2505.18875 • Published May 24, 2025 • 42