Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Paper: arxiv.org/abs/2405.15319