22.1 TFLOPS
Carlo Moro
cnmoro
63 followers · 89 following
cnmoro
carlo-moro-4a20a7132
AI & ML interests
None yet
Recent Activity
reacted to robtacconelli's post with 🤯 1 day ago
🧬 Midicoth: diffusion-based lossless compression with no neural net, no GPU, no training data

What if reverse diffusion could compress text without a neural network?

Midicoth brings score-based denoising into classical compression. It treats prior smoothing as forward noise and reverses it with Tweedie's formula on a binary tree: 3 denoising steps, James-Stein shrinkage, applied after all model blending.

~2,000 lines of C, single CPU core. Beats every dictionary compressor we tested:

enwik8 (100 MB) → 1.753 bpb (−11.9% vs xz, −15% vs Brotli, −24.5% vs bzip2)
alice29.txt → 2.119 bpb (−16.9% vs xz)
Outperforms xz, zstd, Brotli, bzip2, gzip on all inputs

PAQ/CMIX still win with hundreds of models + LSTMs. LLM compressors win with pre-trained knowledge. Midicoth closes the gap with pure statistics: no mixer, no gradient descent, just counting.

The Tweedie denoising layer adds 2.3–2.7% on every file tested, making it the most consistent component in the ablation. Adding SSE or logistic mixers made things worse. In the online setting, count-based beats gradient-based.

No external dependencies. Fully deterministic. Bit-exact encode/decode. ~60 KB/s throughput.

💻 Code: https://github.com/robtacconelli/midicoth
📄 Paper: https://huggingface.co/papers/2603.08771
⭐ Space: https://huggingface.co/spaces/robtacconelli/midicoth

If you ever wondered whether diffusion ideas belong in data compression, here's proof they do. ⭐ appreciated!
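For readers unfamiliar with the bpb (bits per byte) figures quoted in the post, here is a minimal sketch of how the metric relates to compressed file size. The function names are hypothetical and are not taken from the Midicoth code. It assumes that "100 MB" means the 10^8 bytes of the standard enwik8 file and that bpb is 8 times the compressed size divided by the original size.

```python
# Illustrative helpers only; these names are hypothetical, not part of Midicoth.

def bits_per_byte(compressed_bytes: float, original_bytes: int) -> float:
    """Compressed bits spent per byte of original input."""
    return 8 * compressed_bytes / original_bytes


def compressed_size(bpb: float, original_bytes: int) -> float:
    """Compressed size in bytes implied by a bits-per-byte figure."""
    return bpb * original_bytes / 8


# Assumption: enwik8 is 10**8 bytes, matching the "100 MB" in the post.
ENWIK8_BYTES = 100_000_000

# The 1.753 bpb quoted for enwik8 corresponds to roughly this many bytes.
size = compressed_size(1.753, ENWIK8_BYTES)
print(f"{size:,.0f} bytes")
```

Under these assumptions, 1.753 bpb on enwik8 corresponds to a compressed file of about 21.9 MB.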
liked a model 5 days ago
embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead
upvoted a collection 6 days ago
Qwen3.5-text-only
Organizations
cnmoro's Spaces (2)
SemanticCompression (pinned, Running, 7)
Compress text while preserving meaning
STRIVE Reranker (pinned, Sleeping, 3)
STRIVE: Semantic Tokenized Ranking via Vectorization & Embed