- Attention Is All You Need • Paper • 1706.03762 • Published • 109
- Language Models are Few-Shot Learners • Paper • 2005.14165 • Published • 18
- LLaMA: Open and Efficient Foundation Language Models • Paper • 2302.13971 • Published • 20
- Llama 2: Open Foundation and Fine-Tuned Chat Models • Paper • 2307.09288 • Published • 248
Collections including paper arxiv:2203.02155

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • 2402.17764 • Published • 627
- Hierarchical Reasoning Model • Paper • 2506.21734 • Published • 46
- Less is More: Recursive Reasoning with Tiny Networks • Paper • 2510.04871 • Published • 503
- Training language models to follow instructions with human feedback • Paper • 2203.02155 • Published • 24

- DeBERTa: Decoding-enhanced BERT with Disentangled Attention • Paper • 2006.03654 • Published • 3
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • Paper • 1810.04805 • Published • 25
- RoBERTa: A Robustly Optimized BERT Pretraining Approach • Paper • 1907.11692 • Published • 9
- Language Models are Few-Shot Learners • Paper • 2005.14165 • Published • 18

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning • Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • Paper • 1810.04805 • Published • 25
- On the Opportunities and Risks of Foundation Models • Paper • 2108.07258 • Published • 2
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks • Paper • 2204.07705 • Published • 2

- Language Models are Few-Shot Learners • Paper • 2005.14165 • Published • 18
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • Paper • 1810.04805 • Published • 25
- Attention Is All You Need • Paper • 1706.03762 • Published • 109
- Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation • Paper • 2510.23581 • Published • 41

- Neural Machine Translation by Jointly Learning to Align and Translate • Paper • 1409.0473 • Published • 7
- Attention Is All You Need • Paper • 1706.03762 • Published • 109
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • Paper • 1810.04805 • Published • 25
- Hierarchical Reasoning Model • Paper • 2506.21734 • Published • 46

- DAPO: An Open-Source LLM Reinforcement Learning System at Scale • Paper • 2503.14476 • Published • 144
- Training language models to follow instructions with human feedback • Paper • 2203.02155 • Published • 24
- Llama 2: Open Foundation and Fine-Tuned Chat Models • Paper • 2307.09288 • Published • 248
- The Llama 3 Herd of Models • Paper • 2407.21783 • Published • 117

- Reinforcement Pre-Training • Paper • 2506.08007 • Published • 263
- A Survey on Latent Reasoning • Paper • 2507.06203 • Published • 93
- Language Models are Few-Shot Learners • Paper • 2005.14165 • Published • 18
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer • Paper • 1910.10683 • Published • 15