deqing/convergent-llama-300M-muon-isolate-8 • Text Generation • 0.3B • Updated about 1 month ago • 157 • 1
deqing/convergent-llama-300M-muon-window-64 • Text Generation • 0.3B • Updated about 1 month ago • 147 • 1
deqing/convergent-llama-300M-adamw-addition_3digit_seed123 • Text Generation • 0.3B • Updated Mar 30 • 132