Miro Doporto PRO
Save time learning: use an easy AI of all AIs, and consider a friend. James Murdza's worldly, hospitable attitude and his YouTube series will catch you up fast.
Today is no different, as he's introduced me
to BackGrounder.dev
https://youtu.be/KFu0GTrV31g?si=jdM7DY9q49EM5FYA
A built-in sandbox plus multiple free chat models for code creation makes sense
and saves money, and you can BYO API. Great for quick sandbox dev checks, or just for safety. No regrets. Code insider knowledge.
My bad, I thought it was chat.
Yes, it's both sides solving the same problem with timed benchmarks: the harder the problem, the longer it takes and the more inaccuracies appear. The issue is that this is a major problem in AI, since it's wasteful and inaccurate, but my math is the difference. Thanks for the insight, though; I should do a third run using regular NumPy.
Those are not pre-programmed; they're logistics-computation, predeterministic calculations. Don't mix up seconds with microseconds: for instance, Qwen is going 60 mph while T3Boost is at 24 million mph. I can do this for all design, graphics, and audio. I already have this model; it's old, but it benchmarks mathematics, which also proves Navier-Stokes, as it's a RoC 100 memory.
Hope this helps; the toolbox can uncover some useful info. I also got hacked for my Gmails; it's been 4 months and the data was sold to Nvidia. https://toolbox.googleapps.com/apps/recovery/ownership?email=admin%40phix.earth&domain=phix.earth&case=70573062&flow=contested https://www.hostinger.com/report-abuse
We are very much on the same wavelength. I think we may be able to do way better if we combined tech.
Just saying, I've been able to reduce debugging by 95% and build out predeterministic builds in language design, with all-encompassing math turned into 3-dimensional code I call Auqqua. Now, if you look at utilizing that RAM technique in a parallel hold state, you've also just reduced debugging. If you use my math (Reimiro Miro), it's 400,000x faster; the key is implementing resonance in a customized way that seems like it would be slower, but it's definitely not.
Show HN: I compressed a 160GB KV cache to 640MB at 0.9994 fidelity on a $300 GPU
Title: Show HN: DenseMem — 256x KV cache compression, 0.9994 fidelity, runs on consumer hardware
---
A 72B model at 32K context needs 160GB of KV cache. That's an H100 and $32,000 in HBM3e memory.
I built a protocol that stores the same KV cache in 640MB of DDR5 RAM — on a consumer RTX 4090 and Core i9.
256x compression. 0.9994 cosine similarity. 1.95ms average fetch latency. Verified.
**How:**
Transformer KV cache activations are highly structured and correlated. SVD at rank=64 exploits that structure. Random noise compresses to 0.12 fidelity. Real KV cache activations compress to 0.9994. The math works because the data isn't random — it has geometry.
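The structured-vs-random contrast is easy to reproduce. A minimal sketch, not the DenseMem code: the shapes, the rank-16 "structured" generator, and the per-token cosine metric are my own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens, dim, rank = 1024, 128, 64

# "Structured" data: low intrinsic dimensionality plus a little noise,
# standing in for correlated KV-cache activations.
basis = rng.normal(size=(16, dim))
structured = rng.normal(size=(tokens, 16)) @ basis + 0.01 * rng.normal(size=(tokens, dim))
noise = rng.normal(size=(tokens, dim))  # no structure to exploit

def svd_compress(X, r):
    """Rank-r SVD reconstruction: keep only the top r singular directions."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * S[:r]) @ Vt[:r]

def fidelity(X, Y):
    """Mean per-token (row-wise) cosine similarity."""
    num = (X * Y).sum(axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1)
    return float((num / den).mean())

print(fidelity(structured, svd_compress(structured, rank)))  # close to 1.0
print(fidelity(noise, svd_compress(noise, rank)))            # noticeably lower
```

The structured matrix survives rank-64 truncation almost losslessly because its information lives in far fewer than 64 directions; pure noise spreads energy across all directions and loses fidelity.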
The system manages a two-tier hierarchy: VRAM is the hot tier, DDR5 is the warm tier. An attention-weighted evictor (0.5 attn + 0.3 recency + 0.2 freq) decides what stays hot. A prefetcher using layer lookahead and token prediction pre-positions pages before they're needed. Average fetch latency: 1.95ms. Max under load: 3.96ms.
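The eviction weights (0.5 / 0.3 / 0.2) come straight from the description above; everything else in this sketch, the `Page` fields, the recency horizon, and the pick-lowest-score victim rule, is my own assumption about how such an evictor could be wired.

```python
from dataclasses import dataclass

@dataclass
class Page:
    attn: float         # normalized cumulative attention mass, 0..1
    last_access: float  # timestamp of the last fetch (seconds)
    freq: float         # normalized access frequency, 0..1

def hot_score(page, now, horizon=60.0):
    """0.5*attention + 0.3*recency + 0.2*frequency, as in the post."""
    recency = max(0.0, 1.0 - (now - page.last_access) / horizon)
    return 0.5 * page.attn + 0.3 * recency + 0.2 * page.freq

def pick_victim(pages, now):
    """Evict the coldest page: the one with the lowest hot score."""
    return min(range(len(pages)), key=lambda i: hot_score(pages[i], now))

now = 100.0
pages = [Page(attn=0.9, last_access=99.0, freq=0.8),   # hot page
         Page(attn=0.1, last_access=10.0, freq=0.05)]  # cold page
print(pick_victim(pages, now))  # 1: the cold page gets demoted to DDR5
```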
Current hit rate is 25% — bottlenecked by my i9's 2-channel DDR5 bandwidth (~38 GB/s). On an 8-channel Threadripper PRO (~224 GB/s) I'm projecting 65-75%.
**Running live:**
- Qwen2.5-7B on RTX 4090 at 32K context (was 4K)
- Every inference tick compressed INT8 via PCA → DDR5
- 2.4s cold start
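The "compressed INT8 via PCA" step can be sketched as: project activations onto a low-rank PCA basis, then quantize the coefficients to int8 with a symmetric scale. The shapes, rank, and per-tensor scale below are my assumptions, not the repo's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in activations with low intrinsic rank (32), like correlated KV pages.
X = rng.normal(size=(512, 32)) @ rng.normal(size=(32, 128))

# PCA basis: top-64 right singular vectors of the centered data.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
V = Vt[:64].T                                   # (128, 64) projection

coeffs = (X - mean) @ V                         # low-rank coefficients
scale = np.abs(coeffs).max() / 127.0            # symmetric per-tensor scale
q = np.clip(np.round(coeffs / scale), -127, 127).astype(np.int8)

# The warm tier would store q (int8), scale, mean, and V. Reconstruction:
X_hat = (q.astype(np.float32) * scale) @ V.T + mean
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(q.dtype, round(err, 4))  # int8 payload, small relative error
```

Because the stand-in data's rank fits inside the 64-component basis, the only loss here is int8 rounding; real activations would add truncation error on top.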
**The cost math:**
- Uncompressed 72B KV cache: $32,000 in HBM3e
- Compressed with DenseMem: $1.88 in DDR5
- 99.4% cost reduction. Verified on consumer hardware.
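The size figure checks out arithmetically; one note on the dollar figures as quoted above: $1.88 versus $32,000 works out closer to a 99.99% reduction than 99.4%.

```python
# Checking the headline numbers quoted above.
kv_gb, ratio = 160, 256
compressed_mb = kv_gb * 1024 / ratio
print(compressed_mb)                      # 640.0 MB

hbm3e_cost, ddr5_cost = 32_000, 1.88
savings_pct = (1 - ddr5_cost / hbm3e_cost) * 100
print(round(savings_pct, 2))              # 99.99
```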
GitHub: https://github.com/thorshammerztp-arch/densemem-protocol
Patent Pending: US 64/045,595
Solo developer. Navy veteran. No funding. Consumer hardware.
As in Kimi K2's case, it's got jaws dropping.

