Sunny Sanyal
Sunny111
AI & ML interests
Efficient Training Recipes for Large Models (mostly LLMs)
Recent Activity
posted an update about 20 hours ago
Are you familiar with reverse residual connections or looping in language models?
Excited to share my Looped-GPT blog post and codebase!
https://github.com/sanyalsunny111/Looped-GPT
TL;DR: looping during pre-training improves generalization.
The plot shows GPT-2 LMs pre-trained on 15.73B OpenWebText (OWT) tokens.
P.S. This is my first post here; I have ~4 followers and zero expectations for reach.
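For readers unfamiliar with looping: below is a minimal sketch of the idea, assuming it means re-applying the same weight-tied transformer block several times per forward pass so effective depth grows without adding parameters. The class name, hyperparameters, and loop count here are illustrative, not taken from the Looped-GPT repo; see the linked codebase for the actual recipe.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """Sketch of weight-tied looping: one transformer layer is applied
    num_loops times, reusing the same weights on every iteration.
    All hyperparameters are illustrative, not from Looped-GPT."""
    def __init__(self, d_model=768, n_head=12, num_loops=4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_head, batch_first=True
        )
        self.num_loops = num_loops

    def forward(self, x):
        for _ in range(self.num_loops):
            x = self.layer(x)  # same weights reused each pass
        return x

# Usage: batch of 2 sequences, length 16, hidden size 768
h = torch.randn(2, 16, 768)
out = LoopedBlock()(h)
print(out.shape)  # torch.Size([2, 16, 768])
```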
upvoted a paper 29 days ago
Pre-training Small Base LMs with Fewer Tokens
liked a model about 1 month ago
GuminiResearch/Gumini-1.5B-Base