11.0 TFLOPS
Erik Kaunismäki
erikkaum
100 followers · 153 following
https://www.erikkaum.com/
ErikKaum
erik-kaunismaki
erikkaum.bsky.social
AI & ML interests
None yet
Recent Activity
posted an update about 9 hours ago
Releasing my first kernel 🔥 MaxSim

Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full similarity matrix. This kernel avoids that by using tiled scoring with simdgroup_matrix (Metal) and WMMA (CUDA). The result is a 3–5× speedup over a naive PyTorch baseline 🔥

Benchmarks:
- SmallRerank (B=32, C=10): up to 3.2× (M3 Pro) / 2.8× (A100)
- HeavyRerank (B=32, C=100): up to 3.8× (M3 Pro) / 5.3× (A100)
- LongDocStress (Ld=1024): up to 6.2× (L4)

Try it out 👇 https://huggingface.co/kernels/erikkaum/maxsim
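
For readers unfamiliar with MaxSim, the idea behind tiled scoring can be sketched in plain Python. This is only an illustrative reference under stated assumptions, not the kernel's actual code: the function names (`maxsim_naive`, `maxsim_tiled`), the `tile` parameter, and the list-of-lists embedding layout are all hypothetical. MaxSim sums, over the query tokens, the maximum dot product against any document token. The naive version builds the full query-by-document similarity matrix first; the tiled version walks the document in chunks and keeps only one running maximum per query token.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_naive(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Reference MaxSim that materializes the full Lq x Ld similarity matrix."""
    if not doc:
        raise ValueError("doc must contain at least one token embedding")
    sim: List[List[float]] = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sim)


def maxsim_tiled(query: Sequence[Vector], doc: Sequence[Vector], tile: int = 2) -> float:
    """MaxSim computed tile by tile over the document tokens.

    Only one running maximum per query token is stored, so the full
    similarity matrix never exists in memory. `tile` is a hypothetical
    knob standing in for whatever block size a real kernel would use.
    """
    if not doc:
        raise ValueError("doc must contain at least one token embedding")
    if tile < 1:
        raise ValueError("tile must be >= 1")
    running_max = [float("-inf")] * len(query)
    for start in range(0, len(doc), tile):
        doc_tile = doc[start:start + tile]
        for i, q in enumerate(query):
            for d in doc_tile:
                score = dot(q, d)
                if score > running_max[i]:
                    running_max[i] = score
    return sum(running_max)


if __name__ == "__main__":
    # Toy embeddings: 2 query tokens, 3 document tokens, dimension 2.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
    print(maxsim_naive(query, doc), maxsim_tiled(query, doc, tile=2))
```

Both functions return the same score for any tile size. A GPU kernel would presumably compute each tile's dot products with hardware matrix instructions rather than Python loops, but the running-maximum bookkeeping is the part that removes the need for the full matrix.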
updated a bucket about 23 hours ago: erikkaum/training-cache
published a bucket 5 days ago: erikkaum/training-cache
Organizations
erikkaum's kernels (1)
erikkaum/maxsim • Updated about 13 hours ago • 9