Hi @TylerHilbert, thanks for the comment. I think it would be possible to use some Pallas kernels to further optimize inference, such as flash attention. I haven't tried it myself, but they should bring good performance improvements. Also, you could use a bigger batch size; I think it should fit on this TPU without issues (I did not explore all configurations).
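
In case it helps, here is a minimal, untested sketch of what calling the Pallas flash-attention kernel could look like. The import path and argument names are assumptions based on the kernel shipped under `jax.experimental.pallas.ops.tpu.flash_attention`; they may differ across JAX versions, so treat this as illustrative rather than a drop-in implementation:

```python
# Untested sketch: calling the Pallas TPU flash-attention kernel directly.
# The import path below is an assumption and may move between JAX versions.
import jax
import jax.numpy as jnp
from jax.experimental.pallas.ops.tpu.flash_attention import flash_attention

# Example shapes (not tuned): (batch, num_heads, seq_len, head_dim)
batch, heads, seq_len, head_dim = 8, 16, 2048, 64

key = jax.random.PRNGKey(0)
qk, kk, vk = jax.random.split(key, 3)
q = jax.random.normal(qk, (batch, heads, seq_len, head_dim), dtype=jnp.float32)
k = jax.random.normal(kk, (batch, heads, seq_len, head_dim), dtype=jnp.float32)
v = jax.random.normal(vk, (batch, heads, seq_len, head_dim), dtype=jnp.float32)

# causal=True for decoder-style inference; sm_scale is the usual 1/sqrt(head_dim).
out = flash_attention(q, k, v, causal=True, sm_scale=1.0 / head_dim**0.5)
print(out.shape)  # (batch, heads, seq_len, head_dim)
```

The idea would be to swap the model's reference attention for a call like this and then benchmark with a few larger batch sizes to see what still fits in the TPU's memory.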