
Erik Kaunismäki

erikkaum

https://www.erikkaum.com/


Recent Activity

posted an update about 4 hours ago


updated a bucket about 18 hours ago

erikkaum/training-cache

published a bucket 4 days ago

erikkaum/training-cache


Releasing my first kernel 🔥 MaxSim

Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full similarity matrix. This kernel avoids that by scoring in tiles, using simdgroup_matrix on Metal and WMMA on CUDA.

The result is a 3–5× speedup over a naive PyTorch baseline 🔥

Benchmarks:
- SmallRerank (B=32, C=10): up to 3.2× (M3 Pro) / 2.8× (A100)
- HeavyRerank (B=32, C=100): up to 3.8× (M3 Pro) / 5.3× (A100)
- LongDocStress (Ld=1024): up to 6.2× (L4)

Try it out 👇
https://huggingface.co/kernels/erikkaum/maxsim
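
For readers new to MaxSim, here is a rough pure-Python sketch of the tiling idea described above. It is not the kernel's code and does not use its API: the function names, the `tile` parameter, and the toy vectors are all made up for illustration. MaxSim sums, over query tokens, the best dot product against any document token. The naive version builds the full query-by-document similarity matrix first; the tiled version walks the document in blocks and keeps only one running maximum per query token. Both assume a non-empty document.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_naive(query, doc):
    """Build the full len(query) x len(doc) similarity matrix, then reduce."""
    sim = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sim)


def maxsim_tiled(query, doc, tile=4):
    """Score the document tile by tile, keeping a running max per query token.

    Only `tile` similarities per query token exist at any time, so the full
    similarity matrix is never stored. `tile` is a hypothetical parameter.
    """
    best = [float("-inf")] * len(query)
    for start in range(0, len(doc), tile):
        block = doc[start:start + tile]
        for i, q in enumerate(query):
            tile_max = max(dot(q, d) for d in block)
            if tile_max > best[i]:
                best[i] = tile_max
    return sum(best)


# Toy inputs: 2 query tokens and 3 document tokens, embedding dimension 2.
query = [[1, 0], [0, 1]]
doc = [[1, 2], [3, -1], [0, 5]]

score_naive = maxsim_naive(query, doc)
score_tiled = maxsim_tiled(query, doc, tile=2)
```

Both functions return the same score for any tile size, since taking a maximum tile by tile gives the same result as taking it over the whole row. On a GPU the per-tile dot products would map to matrix-multiply units, which this sketch does not attempt to model.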