All HF Hub posts

OzTianlu
posted an update 2 days ago
We deleted the embedding layer: introducing Collins-Embedding-3M
NoesisLab/Collins-Embedding-3M
Most "small" models are just giant vocab tables in a trench coat. Collins-3M changes that. By using 2-universal hashing and Chernoff-bound noise suppression, we've collapsed the embedding space into a fixed O(1) hash map.
* STSB: 0.7114 (Beating many 100M+ models)
* Size: 3M (Edge-ready, IoT-ready)
* Tech: Randomized Sign-Hashing + RoPE positional injection.
Built by NoesisLab
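The core trick, feature hashing with random signs, fits in a few lines. This is an illustrative reconstruction, not the released model's code: a seeded SHA-256 stands in for a 2-universal hash family, and the dimension and hash count are made up.

```python
import hashlib

DIM = 64         # embedding dimension (illustrative, not the model's)
NUM_HASHES = 4   # independent hashes; averaging suppresses collision noise

def _hash(token: str, seed: int) -> int:
    # Deterministic 64-bit hash of (seed, token); stands in for a
    # 2-universal hash family.
    digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def hashed_embedding(token: str) -> list[float]:
    # O(1) lookup with no vocab table: each hash picks a coordinate
    # and a random sign, so collisions between tokens tend to cancel.
    vec = [0.0] * DIM
    for seed in range(NUM_HASHES):
        h = _hash(token, seed)
        idx = h % DIM
        sign = 1.0 if (h >> 8) & 1 else -1.0
        vec[idx] += sign / NUM_HASHES
    return vec

v_hello = hashed_embedding("hello")
v_world = hashed_embedding("world")
```

Because the embedding is a pure function of the token, it costs zero parameters; the whole 3M budget can go into the transformer itself.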
ronantakizawa
posted an update 1 day ago
Introducing the github-codereview dataset: A compilation of 200k+ human-written code reviews from top OSS projects (React, Tensorflow, VSCode...).

I finetuned a Qwen2.5-Coder-32B-Instruct model on this dataset and saw significant improvements in code-fix and review-comment generation (4x higher BLEU-4, ROUGE-L, and SBERT scores than the base model).

#codereview #code #datasets

ronantakizawa/github-codereview
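A dataset like this is typically converted into chat-format pairs before finetuning. A minimal sketch with hypothetical field names (the real column names may differ; check the dataset card):

```python
# Hypothetical record shape; check the dataset card for the real columns.
record = {
    "diff": "--- a/app.py\n+++ b/app.py\n-    return x\n+    return x or default",
    "review": "Consider documenting when `default` is returned.",
}

def to_chat_sample(rec: dict) -> list[dict]:
    # One (diff -> review comment) pair in the chat format most
    # finetuning stacks accept.
    return [
        {"role": "user", "content": "Review this diff:\n" + rec["diff"]},
        {"role": "assistant", "content": rec["review"]},
    ]

sample = to_chat_sample(record)
```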
alvarobartt
posted an update 2 days ago
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👀 Custom hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
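Since the deployment speaks the OpenAI-compatible Chat Completions protocol, a request body can be assembled as sketched below. The endpoint URL, deployment name, and audio content schema are placeholders, not the documented values; the linked docs have the real ones.

```python
import base64
import json

# Placeholder; the real URL comes from your Foundry deployment.
ENDPOINT = "https://<your-resource>.services.ai.azure.com/v1/chat/completions"

def transcription_request(audio_bytes: bytes, hotwords: list[str]) -> dict:
    # Chat Completions body carrying base64 audio; the exact content
    # schema VibeVoice expects may differ, so treat this as a template.
    audio_b64 = base64.b64encode(audio_bytes).decode()
    return {
        "model": "vibevoice-asr",  # assumption: your deployment name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe with speakers and timestamps. "
                         "Hotwords: " + ", ".join(hotwords)},
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }],
    }

body = transcription_request(b"RIFF....WAVE", ["Foundry", "VibeVoice"])
payload = json.dumps(body)  # ready to POST to ENDPOINT with an auth header
```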
kostakoff
posted an update 3 days ago
Mining GPU Nvidia CMP 170HX - let's run some models!

To satisfy my curiosity, I investigated different GPUs and found something interesting: a mining version of the A100, the CMP 170HX.

It is a very interesting GPU. Based on public documentation, it has hardware similar to the datacenter A100. If you open it up and look at the board, you will see that it's very similar to an A100 board; it even has NVLink connectors.

Online, I found almost no information about how to run it, whether it works with LLMs, or if it's supported by default Nvidia drivers and CUDA. So, I decided to test it myself.
I installed it in my lab (see previous post https://huggingface.co/posts/kostakoff/584269728210158) and found that the default nvidia-driver-570 works with it out of the box. After that, I checked if CUDA was available, and it worked too.

The next step was to try running some models:
- Stable Diffusion XL with BNB4 quantization: it took around two minutes to generate an image, but it works!
- llama.cpp compiled for CUDA (https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#compilation): I ran Mistral 7B Q4_K_M, and this worked even better, generating 33 tokens per second and reading 400 tokens per second.
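Those two throughput numbers translate directly into end-to-end latency. A back-of-envelope helper using the reported rates (prompt "read" at 400 tok/s, generation at 33 tok/s):

```python
def estimated_latency_s(prompt_tokens: int, output_tokens: int,
                        prefill_tps: float = 400.0,   # reported read rate
                        decode_tps: float = 33.0) -> float:  # reported generation rate
    # The prompt is processed in one pass at the prefill rate; output
    # tokens are then produced one at a time at the decode rate.
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# e.g. an 800-token prompt with a 330-token answer
t = estimated_latency_s(800, 330)  # 2 s prefill + 10 s decode
```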

There are some limitations related to power utilization:
- When running PyTorch, it doesn't utilize more than 80 watts.
- When running llama.cpp, utilization is a bit better but still limited to 113 watts.

I found this GitHub thread about the Nvidia CMP https://github.com/dartraiden/NVIDIA-patcher/issues/73, and it looks like this mining GPU has an internal rate limiter based on FMA compute calls. I haven't found a solution to bypass it yet.

branikita
posted an update 2 days ago
We tested our 3D-printed parallel gripper for the SO-ARM100/101 robotic platform, successfully handling a 1.5 kg payload. The gripper features a 100.5 mm full stroke and ±0.05 mm repeatability, all for around $76 in parts and 30 minutes of assembly.

Full source code, STL files, and assembly guide are open-source and available on GitHub: https://github.com/roboninecom/SO-ARM100-101-Parallel-Gripper
BestWishYsh
posted an update 2 days ago
🚀 Introducing Helios: a 14B real-time long-video generation model!

It's completely wild: faster than 1.3B models, and it achieves this without using self-forcing. Welcome to the new era of video generation! 😎👇

💻 Code: https://github.com/PKU-YuanGroup/Helios
🏠 Page: https://pku-yuangroup.github.io/Helios-Page
📄 Paper: Helios: Real-Time Long Video Generation Model (2603.04379)

🔹 True Single-GPU Extreme Speed ⚡️
No need to rely on traditional workarounds like KV-cache, quantization, sparse/linear attention, or TinyVAE. Helios hits an end-to-end 19.5 FPS on a single H100!
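Taking the claimed 19.5 FPS at face value, the per-frame time budget and the frame count of a minute-long clip work out as:

```python
FPS = 19.5  # reported end-to-end throughput on a single H100

frame_budget_ms = 1000.0 / FPS     # time each frame may take and still be real time
frames_per_minute = int(FPS * 60)  # frames in a minute-long video at this rate
```

So the model has roughly 51 ms to produce each frame, and a minute of video is 1170 frames that all have to stay coherent.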

Training is also highly accessible: a single 80 GB GPU can fit four 14B models.

🔹 Solving Long-Video "Drift" from the Core 🎥
Tired of visual drift and repetitive loops? We ditched traditional hacks (like error banks, self-forcing, or keyframe sampling).

Instead, our innovative training strategy simulates & eliminates drift directly, keeping minute-long videos incredibly coherent with stunning quality. ✨

🔹 3 Model Variants for Full Coverage 🛠️
With a unified architecture natively supporting T2V, I2V, and V2V, we are open-sourcing 3 flavors:

1️⃣ Base: single-stage denoising for extremely high fidelity.
2️⃣ Mid: pyramid denoising + CFG-Zero for the perfect balance of quality & throughput.
3️⃣ Distilled: adversarial distillation (DMD) for ultra-fast, few-step generation.

🔹 Day-0 Ecosystem Ready 🌍
We wanted deployment to be a breeze from the second we launched. Helios drops with comprehensive Day-0 hardware and framework support:

✅ Huawei Ascend NPU
✅ Hugging Face Diffusers
✅ vLLM-Omni
✅ SGLang-Diffusion
Try it out and let us know what you think!
ajibawa-2023
posted an update 3 days ago
Cpp-Code-Large
Dataset: ajibawa-2023/Cpp-Code-Large

Cpp-Code-Large is a large-scale corpus of C++ source code comprising more than 5 million lines of C++ code. The dataset is designed to support research in large language model (LLM) pretraining, code intelligence, software engineering automation, and static program analysis for the C++ ecosystem.

By providing a high-volume, language-specific corpus, Cpp-Code-Large enables systematic experimentation in C++-focused model training, domain adaptation, and downstream code understanding tasks.

Cpp-Code-Large addresses the need for a dedicated C++-only dataset at substantial scale, enabling focused research across systems programming, performance-critical applications, embedded systems, game engines, and large-scale native software projects.
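Corpora like this are usually assembled by filtering repositories down to C++ sources. A minimal, illustrative filter (not the authors' pipeline, which would also need deduplication and license filtering):

```python
CPP_EXTENSIONS = {".cpp", ".cc", ".cxx", ".hpp", ".hh", ".h"}

def is_cpp_source(path: str) -> bool:
    # Extension-based check; ".h" also matches C headers, so a real
    # pipeline would need a language classifier to disambiguate.
    dot = path.rfind(".")
    return dot != -1 and path[dot:].lower() in CPP_EXTENSIONS

files = ["src/engine.cpp", "include/vec.hpp", "README.md", "setup.py", "kernel.cu"]
cpp_files = [f for f in files if is_cpp_source(f)]
```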
prithivMLmods
posted an update about 9 hours ago
The Qwen3.5 Multimodal Understanding Demo, powered by Qwen3.5-2B, is now available on HF Spaces! It is a lightweight model designed for fast image and video reasoning. Built with Gradio, the demo showcases Image QA, Video QA, object detection, and 2D point tracking, along with real-time token streaming.

🤗 Demo: prithivMLmods/Qwen-3.5-HF-Demo
✅ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
🔗 Qwen3.5-2B: Qwen/Qwen3.5-2B

To learn more, visit the app page or the respective model pages.
AbstractPhil
posted an update 1 day ago
I've done it. With experts, this achieves near-100% R1 retrieval accuracy on an adjacent dataset (unseen by the fusion transformer) after around 40k steps on the seen dataset. This means the models' languages are genuinely fused within the constraints, not just projected or estimated.
AbstractPhil/geolip-procrustes

I encourage EVERYONE who is curious to check my work. Check it, double check it, and triple check it.

These were aligned using COCO and then validated with Flickr. Entirely different datasets. The experts arbitrated and the alignment yielded the correct answers. Preliminary tests show that with almost no alignment requirement, the models can reach 100% R1 retrieval accuracy.

Not to be confused with validation accuracy for a classification model or a text encoder's text response, this allows multispectral communication between entirely different models for direct downstream consumption with almost no training for the chosen models.

I have a working Procrustes experiment that learns adjacent manifolds within a reasonable spectrum, and the speed is striking: one epoch on COCO with BERT-Large and DINOv2 lets the models align nearly perfectly. For some scales in the experiment, the 3 set epochs aren't quite enough to push R1 to its highest, while many scales align nearly immediately.
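The classical core of such an alignment is the orthogonal Procrustes problem: given paired embeddings X and Y from two models, find the rotation R minimizing ||X R - Y||_F, which has a closed-form SVD solution. A minimal sketch of that core (illustrative, not the repo's trainer):

```python
import numpy as np

def orthogonal_procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    # Closed-form rotation minimizing ||X @ R - Y||_F:
    # if X.T @ Y = U S V^T, the optimal R is U @ V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))                      # model A embeddings
R_true, _ = np.linalg.qr(rng.standard_normal((32, 32)))  # hidden rotation
Y = X @ R_true                                           # model B embeddings
R = orthogonal_procrustes(X, Y)
alignment_error = np.linalg.norm(X @ R - Y)
```

When the two spaces really are a rotation apart, a single SVD recovers the map essentially exactly, which is why alignment can look near-instant for well-matched model pairs.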

These two were an obvious pair to pick: 60% similarity and >90% spectral similarity.

The trainer transfers layers, learns embeddings, and more - all by sticking strictly to geometric boundaries and procrustes informational accumulation within a modulation model's constraints.

I have many experiments to run.