optimization
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16, 2025 • 75
Latent Collaboration in Multi-Agent Systems Paper • 2511.20639 • Published Nov 25, 2025 • 126
Train
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29, 2025 • 94
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published Apr 30, 2025 • 49
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model Paper • 2211.11363 • Published Nov 21, 2022 • 1
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20, 2024 • 50