Activity Feed

AI & ML interests

None defined yet.

Recent Activity

mitkox 
posted an update 2 days ago
GLM-4.7-Flash is fast, good and cheap.
3,074 tokens/sec peak at 200k tokens context window on my desktop PC.
Works with Claude Code and opencode for hours. No errors; it's a drop-in replacement for the Anthropic cloud AI.
MIT licensed, open weights, free for commercial use and modifications.
Supports speculative decoding using MTP, which is highly effective in mitigating latency.
Great for on-device AI coding as AWQ 4-bit at 18.5 GB. Hybrid inference on a single consumer GPU + CPU RAM.
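Roughly, a hybrid GPU + CPU load of an AWQ 4-bit checkpoint looks like the sketch below, assuming a vLLM-style Python launch; the repo id, offload size, and context length are placeholders, not the exact setup from this post.

```python
from vllm import LLM, SamplingParams

# Placeholder AWQ repo id; point this at the actual GLM-4.7-Flash AWQ checkpoint.
llm = LLM(
    model="zai-org/GLM-4.7-Flash-AWQ",
    quantization="awq",        # 4-bit AWQ weights (~18.5 GB in this case)
    cpu_offload_gb=16,         # spill part of the weights into CPU RAM (hybrid inference)
    max_model_len=131_072,     # long-context coding sessions; tune to what fits
)

params = SamplingParams(temperature=0.2, max_tokens=512)
out = llm.generate(["Write a Python function that parses a CSV header."], params)
print(out[0].outputs[0].text)
```

Speculative decoding with MTP is configured separately and the exact option depends on the vLLM version, so it's left out of this sketch.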
mitkox 
posted an update 20 days ago
I just stress-tested the Beast: MiniMax-M2.1 on Z8 Fury G5.
2101 tokens/sec. FORTY concurrent clients. That's 609 t/s out, 1492 t/s in. The model outputs fire faster than I can type, but feeds on data like a black hole on cheat day.
But wait, there's more! Threw it into Claude Code torture testing with 60+ tools, 8 agents (7 sub-agents because apparently one wasn't enough chaos). It didn't even flinch. Extremely fast, scary good at coding. The kind of performance that makes you wonder if the model's been secretly reading Stack Overflow in its spare time lol
3 months ago, these numbers lived in my "maybe in 2030" dreams. Today it's running on my desk AND heats my home office during the winter!
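For anyone curious how a concurrency number like that is measured, the sketch below approximates it with an async OpenAI-compatible client pointed at a local server; the endpoint URL, model id, and prompt are assumptions, not the actual benchmark harness.

```python
import asyncio
import time

from openai import AsyncOpenAI

# Local OpenAI-compatible endpoint (vLLM or similar); URL and key are placeholders.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

async def one_client(prompt: str) -> int:
    resp = await client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M2.1",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return resp.usage.completion_tokens

async def main(n_clients: int = 40) -> None:
    start = time.perf_counter()
    tokens = await asyncio.gather(
        *[one_client(f"Client {i}: write a quicksort in Rust.") for i in range(n_clients)]
    )
    elapsed = time.perf_counter() - start
    print(f"{sum(tokens)} output tokens in {elapsed:.1f}s "
          f"-> {sum(tokens) / elapsed:.0f} tok/s aggregate")

asyncio.run(main())
```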
anakin87 
posted an update about 1 month ago
💭 Do thinking traces make Language Models learn better? Curious what others think

𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼
You take an instruction-following LM.
You want to train it with a GRPO-style RL algorithm on a task like Tic Tac Toe.
Rewards are outcome-based, applied only at the end of each episode: win/loss/draw, format adherence...

During training, the model could just output answers, but a common choice is to make it also output thinking traces.

𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻
Does forcing the model to produce thinking traces during training actually improve learning❓

💬 I'd like to hear your thoughts. Share ideas and links to relevant papers and resources.

From what I've understood so far, the answer seems to be 𝘆𝗲𝘀.

1️⃣ If you force the model to think during training, it becomes a model that thinks at inference time. It naturally allocates more budget (tokens) to a problem, which tends to improve performance.

2️⃣ While the model's "reasoning" already exists in its activation space, using explicit thinking traces as a scratchpad allows training to steer and shape that reasoning.

3️⃣ As the model produces more traces during training, the RL algorithm can progressively give higher rewards to the reasoning patterns that lead to better outcomes.
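To make the scenario concrete, here's a toy sketch of an outcome-based reward of the kind described above: the score is applied only at the end of the episode (win/draw/loss), with a small bonus for format adherence (a thinking trace wrapped in tags before the move). The tag format and reward values are illustrative, the `results` list is assumed to come from playing each completion out in your own Tic Tac Toe environment, and the signature should be adapted to whatever GRPO-style trainer you use.

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def outcome_reward(completions: list[str], results: list[str]) -> list[float]:
    """results[i] is 'win', 'draw', or 'loss' for the episode completions[i] produced."""
    rewards = []
    for text, result in zip(completions, results):
        r = {"win": 1.0, "draw": 0.2, "loss": -1.0}[result]  # outcome-based, end of episode
        if THINK_RE.search(text):                             # format adherence bonus
            r += 0.1
        rewards.append(r)
    return rewards

print(outcome_reward(["<think>center is strongest</think> I play (1,1)."], ["win"]))  # [1.1]
```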
mitkox 
posted an update about 2 months ago
Got to 1199.8 tokens/sec with Devstral Small 2 on my desktop GPU workstation, running a vLLM nightly build.
Works out of the box with Mistral Vibe. Next up: time to test the big one.
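A quick way to sanity-check single-request decode speed against a local server is to time one non-streaming completion and divide by the reported completion tokens; the URL and model id below are placeholders.

```python
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="mistralai/Devstral-Small-2",  # placeholder model id
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
    max_tokens=512,
)
elapsed = time.perf_counter() - start
print(f"{resp.usage.completion_tokens / elapsed:.0f} tokens/sec (single request)")
```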
anakin87 
posted an update about 2 months ago
mitkox 
posted an update 2 months ago
I run 20 AI coding agents locally on my desktop workstation at 400+ tokens/sec with MiniMax-M2. It's a Sonnet drop-in replacement in my Cursor, Claude Code, Droid, Kilo, and Cline. It peaks at 11k tok/s input and 433 tok/s output and can generate 1B+ tok/m, all with a 196k context window. I've been running it with this config for 6 days now.

Today's max performance was stable at 490.2 tokens/sec across 48 concurrent clients with MiniMax-M2.

Z8 Fury G5, Xeon 3455, 4xA6K. Aibrix 0.5.0, vLLM 0.11.2.
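For a rough idea of how a model like this is spread across the 4 GPUs, here's a minimal vLLM tensor-parallel sketch; the model id, context length, and memory fraction are assumptions, not the Aibrix/vLLM config above.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2",   # placeholder model id
    tensor_parallel_size=4,         # shard the weights across the 4 GPUs
    max_model_len=196_000,          # roughly the 196k context window from the post
    gpu_memory_utilization=0.90,
)

out = llm.generate(
    ["Summarize what a Sonnet drop-in replacement needs to support."],
    SamplingParams(max_tokens=256),
)
print(out[0].outputs[0].text)
```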