Pretrained baselines for sequence modeling research.
- puigde/gated-deltanet-360M-15B-slimpajama
  Text Generation • 0.4B • 445
- puigde/rwkv7-380M-15B-slimpajama
  Text Generation • 0.4B • 750
- puigde/modern-transformer-gqa-370M-15B-slimpajama
  Text Generation • 0.4B • 174
- puigde/modern-transformer-mha-370M-15B-slimpajama
  Text Generation • 0.4B • 106
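
Since every entry is tagged Text Generation, a loading sketch may be useful. The following is a minimal sketch, not a documented recipe from the model authors: it assumes the checkpoints can be loaded through the standard `transformers` auto classes. That is unverified here, and the non-transformer architectures in particular may need extra packages or custom model code. The names `BASELINES`, `load_baseline`, and `generate_sample` are hypothetical helpers introduced for illustration.

```python
# Repository IDs copied verbatim from the listing.
BASELINES = [
    "puigde/gated-deltanet-360M-15B-slimpajama",
    "puigde/rwkv7-380M-15B-slimpajama",
    "puigde/modern-transformer-gqa-370M-15B-slimpajama",
    "puigde/modern-transformer-mha-370M-15B-slimpajama",
]


def load_baseline(repo_id: str):
    """Return (tokenizer, model) for one checkpoint.

    ASSUMPTION: the repository works with the generic auto classes.
    If loading fails, the checkpoint may depend on an additional
    library that registers its architecture.
    """
    # Imported lazily so the ID list is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


def generate_sample(repo_id: str, prompt: str, max_new_tokens: int = 20) -> str:
    """Generate a short continuation of `prompt` with one baseline."""
    tokenizer, model = load_baseline(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_sample(BASELINES[2], "Sequence models are")` would download the checkpoint on first use, so it needs network access and the `transformers` and `torch` packages.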