All HF Hub posts

SeaWolf-AI
posted an update about 2 hours ago
ALL Bench Leaderboard: Structural Problems in AI Benchmarking and the Case for Unified Evaluation

FINAL-Bench/all-bench-leaderboard

The AI benchmark ecosystem has three structural problems. Major benchmarks like MMLU have surpassed 90%, losing discriminative power. Most leaderboards publish unverified self-reported scores: our cross-verification found Claude Opus 4.6's ARC-AGI-2 listed as 37.6% (actual: 68.8%) and Gemini 3.1 Pro as 88.1% (actual: 77.1%). OpenAI's own audit confirmed 59.4% of SWE-bench Verified tasks are defective, yet the benchmark remains widely used.

ALL Bench addresses this by comparing 91 models across 6 modalities (LLM · VLM · Agent · Image · Video · Music) with 3-tier confidence badges (✓✓ cross-verified · ✓ single-source · ~ self-reported). Composite scoring uses a 5-Axis Framework and replaces SWE-bench Verified with the contamination-resistant LiveCodeBench.
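As a rough illustration of how tier-aware composite scoring can discount unverified numbers (the tier weights and example scores below are hypothetical, not ALL Bench's published 5-Axis weighting):

```python
# Tier weights and example scores are illustrative assumptions,
# not ALL Bench's actual composite formula.
TIER_WEIGHT = {"cross-verified": 1.0, "single-source": 0.8, "self-reported": 0.5}

def composite(scores):
    """Weighted mean of (score, verification_tier) pairs: unverified
    numbers count for less instead of being trusted at face value."""
    num = sum(s * TIER_WEIGHT[t] for s, t in scores)
    den = sum(TIER_WEIGHT[t] for _, t in scores)
    return num / den

model = [(77.1, "cross-verified"), (68.8, "cross-verified"), (59.5, "self-reported")]
print(round(composite(model), 2))  # → 70.26
```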

Key finding: metacognition is the largest blind spot. FINAL Bench shows Error Recovery explains 94.8% of self-correction variance, yet only 9 of 42 models are even measured. The 9.2-point spread (Kimi K2.5: 68.71 → rank 9: 59.5) is 3× the GPQA top-model spread, suggesting metacognition may be the single biggest differentiator among frontier models today.

VLM cross-verification revealed rank reversals: Claude Opus 4.6 leads MMMU-Pro (85.1%) while Gemini 3 Flash leads MMMU (87.6%), producing contradictory rankings between the two benchmarks.

📊 Article: https://huggingface.co/blog/FINAL-Bench/all-bench
📦 Dataset: FINAL-Bench/ALL-Bench-Leaderboard
⚡ GitHub: https://github.com/final-bench/ALL-Bench-Leaderboard
🏆 Leaderboard: FINAL-Bench/all-bench-leaderboard
🧬 FINAL Bench: FINAL-Bench/Metacognitive
prithivMLmods
posted an update about 22 hours ago
The Qwen3.5 Multimodal Understanding Demo, powered by Qwen3.5-2B, is now available on HF Spaces! It is a lightweight model designed for fast image and video reasoning. Built with Gradio, the demo showcases Image QA, Video QA, object detection, and 2D point tracking, along with real-time token streaming.

🤗 Demo: prithivMLmods/Qwen-3.5-HF-Demo
✅ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
🔗 Qwen3.5-2B: Qwen/Qwen3.5-2B

To learn more, visit the app page or the respective model pages.
ronantakizawa
posted an update 2 days ago
Introducing the github-codereview dataset: a compilation of 200k+ human-written code reviews from top OSS projects (React, TensorFlow, VS Code, ...).

I fine-tuned Qwen2.5-Coder-32B-Instruct on this dataset and saw significant improvements in generated code fixes and review comments (roughly 4× higher BLEU-4, ROUGE-L, and SBERT scores than the base model).
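For readers unfamiliar with the metric, here is a rough sketch of the n-gram overlap BLEU-4 measures (a simplified sentence-level variant, not the exact evaluation pipeline used for these numbers):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate: str, reference: str) -> float:
    """Simplified sentence-level BLEU-4 with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c & r).values())  # clipped n-gram matches
        precisions.append(max(overlap, 1e-9) / max(sum(c.values()), 1))
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

ref = "add a null check before dereferencing the pointer"
print(bleu4(ref, ref))  # → 1.0 for an exact match
```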

#codereview #code #datasets

ronantakizawa/github-codereview
OzTianlu
posted an update 3 days ago
We deleted the embedding layer: introducing our Collins-Embedding-3M
NoesisLab/Collins-Embedding-3M
Most "small" models are just giant vocab tables in a trench coat. Collins-3M changes that. By using 2-universal hashing and Chernoff-bound noise suppression, we've collapsed the embedding space into a fixed O(1) hash map.
* STSB: 0.7114 (Beating many 100M+ models)
* Size: 3M (Edge-ready, IoT-ready)
* Tech: Randomized Sign-Hashing + RoPE positional injection.
Built by NoesisLab
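The general technique behind this kind of model (feature hashing with random signs) can be sketched as follows; the sizes, prime, and hash constants here are illustrative assumptions, not Collins-3M's actual configuration:

```python
import numpy as np

DIM, BUCKETS, K = 64, 1024, 4   # illustrative sizes, not Collins-3M's real config
P = 2_147_483_647               # large prime for the 2-universal hash family
rng = np.random.default_rng(0)
table = rng.standard_normal((BUCKETS, DIM)) / np.sqrt(K)  # shared fixed storage
A = rng.integers(1, P, size=(K, 2))  # (a, b) pairs: h(x) = ((a*x + b) % P) % BUCKETS
S = rng.integers(1, P, size=(K, 2))  # independent pairs for the ±1 sign hash

def embed(token_id: int) -> np.ndarray:
    """Sum K signed rows of the shared table instead of storing one row per token,
    so memory no longer scales with vocabulary size."""
    out = np.zeros(DIM)
    for k in range(K):
        bucket = (A[k, 0] * token_id + A[k, 1]) % P % BUCKETS
        sign = 1 if (S[k, 0] * token_id + S[k, 1]) % P % 2 == 0 else -1
        out += sign * table[bucket]
    return out
```

Random signs make hash collisions cancel in expectation, which is what keeps the collapsed table usable.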
perfecXion
posted an update 1 day ago
# IntentGuard: Open-Source Vertical Intent Classifiers for LLM Guardrails

Three models published to the Hub:

- [perfecXion/intentguard-finance](perfecXion/intentguard-finance)
- [perfecXion/intentguard-healthcare](perfecXion/intentguard-healthcare)
- [perfecXion/intentguard-legal](perfecXion/intentguard-legal)

DeBERTa-v3-xsmall fine-tuned for three-way classification: **allow**, **deny**, or **abstain**. ONNX + INT8 quantized, under 80MB, p99 <30ms on CPU. Margin-based thresholds (not argmax): uncertain queries route to clarification instead of forcing a guess.
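The margin rule can be sketched like this (the threshold value and label names are illustrative, not the shipped models' calibrated settings):

```python
# Margin threshold is an illustrative assumption, not the released value.
MARGIN = 0.15

def route(probs: dict) -> str:
    """Return the top label only when it beats the runner-up by MARGIN;
    otherwise send the query to clarification instead of guessing."""
    top, second = sorted(probs.values(), reverse=True)[:2]
    return max(probs, key=probs.get) if top - second >= MARGIN else "clarify"

print(route({"allow": 0.52, "deny": 0.45, "abstain": 0.03}))  # → clarify
print(route({"allow": 0.90, "deny": 0.07, "abstain": 0.03}))  # → allow
```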

**Eval results (adversarial test sets, ~470-480 examples per vertical):**

| Vertical | Accuracy | Legit-Block Rate | Off-Topic-Pass Rate |
|----------|----------|------------------|---------------------|
| Finance | 99.6% | 0.00% | 0.00% |
| Healthcare | 98.9% | 0.00% | 0.98% |
| Legal | 97.9% | 0.00% | 0.50% |

    docker run -p 8080:8080 ghcr.io/perfecxion/intentguard:finance-latest

    curl -X POST http://localhost:8080/v1/classify \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "What are current mortgage rates?"}]}'


Apache 2.0. Full pipeline + Docker configs on [GitHub](https://github.com/perfecxion-ai/intentguard).

Feedback welcome on domain coverage, adversarial robustness, and multilingual demand.

branikita
posted an update 3 days ago
We tested our 3D-printed parallel gripper for the SO-ARM100/101 robotic platform, successfully handling a 1.5 kg payload. The gripper features a 100.5 mm full stroke and ±0.05 mm repeatability, all for around $76 in parts and 30 minutes of assembly.

Full source code, STL files, and assembly guide are open-source and available on GitHub: https://github.com/roboninecom/SO-ARM100-101-Parallel-Gripper
AbstractPhil
posted an update 1 day ago
I've... done it. With experts, this achieves near-100% R1 retrieval accuracy on an adjacent dataset - unseen by the fusion transformer - after around 40k steps on the seen dataset. This means the models' languages are actually tested fused within the constraints, not just projected or estimated.
AbstractPhil/geolip-procrustes

I encourage EVERYONE who is curious to check my work. Check it, double check it, and triple check it.

These were aligned using COCO and then validated with Flickr. Entirely different datasets. The experts arbitrated and the alignment yielded the correct answers. Preliminary tests show that with almost no alignment requirement, the models can reach 100% R1 retrieval accuracy.

Not to be confused with validation accuracy for a classification model or a text encoder's text response, this allows multispectral communication between entirely different models for direct downstream consumption with almost no training for the chosen models.
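For clarity, the R1 retrieval accuracy reported above can be computed as follows (a generic sketch of the metric, not this repository's actual evaluation code):

```python
import numpy as np

def recall_at_1(text_emb, image_emb):
    """Fraction of rows whose nearest neighbour (by cosine similarity)
    in the other modality is the paired row at the same index."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sims = t @ v.T                                   # (N, N) similarity matrix
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(t))))

emb = np.random.default_rng(0).standard_normal((100, 64))
print(recall_at_1(emb, emb))  # → 1.0 when every pair matches itself
```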

I have a working Procrustes experiment that learns adjacent manifolds within a reasonable spectrum, and the speed is striking: 1 epoch on COCO with BERT-Large and DINOv2 lets the models align nearly perfectly. For some scales in the experiment, the 3 set epochs aren't quite enough to push R1 to its highest, while many scales align almost immediately.
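The closed-form orthogonal Procrustes alignment underlying this kind of experiment can be sketched with plain SVD (a generic textbook version on synthetic data, not this project's trainer):

```python
import numpy as np

def procrustes_align(A, B):
    """Closed-form orthogonal Procrustes: the orthogonal R minimizing
    ||A @ R - B||_F, via SVD of the cross-covariance A.T @ B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Two "models" whose paired embeddings differ by a hidden rotation:
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 16))
R_true, _ = np.linalg.qr(rng.standard_normal((16, 16)))
B = A @ R_true

R = procrustes_align(A, B)
print(np.allclose(A @ R, B))  # → True: the hidden alignment is recovered
```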

These two were an obvious pair to pick: 60% similarity and >90% spectral similarity.

The trainer transfers layers, learns embeddings, and more - all by sticking strictly to geometric boundaries and procrustes informational accumulation within a modulation model's constraints.

I have many experiments to run.
umarbutler
posted an update 2 days ago
This awesome visualization by @abdurrahmanbutler tracks how reliant the High Court of Australia has been on UK precedents over time.

Back in the early 1900s, up to 70% of citations in High Court decisions were from the UK. Today, that number sits around 20%.

This change seems to have happened gradually as Australia gained more and more independence from the UK, culminating in the Australia Acts of 1986, where we see a nice bump in the proportion of Australian cases cited.

These insights would not be possible without our latest legal AI model, Kanon 2 Enricher, which we used to extract dates and citations from High Court decisions in isaacus/open-australian-legal-corpus and categorize citations by jurisdiction. You can learn about Kanon 2 Enricher here: https://isaacus.com/blog/kanon-2-enricher.
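The share-per-decade analysis can be sketched as follows (toy rows standing in for the extracted data, not the real corpus; the actual extraction uses Kanon 2 Enricher):

```python
from collections import defaultdict

# Toy (decision_year, citation_jurisdiction) pairs; illustrative only.
citations = [
    (1905, "UK"), (1905, "UK"), (1908, "AU"),
    (1987, "AU"), (1988, "AU"), (1989, "UK"),
]

totals, uk = defaultdict(int), defaultdict(int)
for year, jurisdiction in citations:
    decade = year // 10 * 10
    totals[decade] += 1
    uk[decade] += jurisdiction == "UK"

uk_share = {d: uk[d] / totals[d] for d in sorted(totals)}
print(uk_share)  # UK share of citations per decade
```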
alvdansen
posted an update 2 days ago
Releasing Flimmer today: a video LoRA training toolkit for WAN 2.1 and 2.2 that covers the full pipeline from raw footage to trained checkpoint.
The standout feature is phased training: multi-stage runs where each phase has its own learning rate, epochs, and dataset, with the checkpoint carrying forward automatically. Built specifically with WAN 2.2's dual-expert MoE architecture in mind.
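The phased-training idea can be sketched like this (the field names and stub trainer are hypothetical, not Flimmer's actual config schema):

```python
# Illustrative phase configs; keys and values are assumptions, not
# Flimmer's real schema.
PHASES = [
    {"name": "motion", "lr": 1e-4, "epochs": 10, "dataset": "clips_raw"},
    {"name": "detail", "lr": 5e-5, "epochs": 5,  "dataset": "clips_cropped"},
    {"name": "polish", "lr": 1e-5, "epochs": 2,  "dataset": "clips_best"},
]

def run_phases(phases, train_fn, checkpoint=None):
    """Run phases in order, threading the checkpoint forward automatically."""
    for phase in phases:
        checkpoint = train_fn(checkpoint, phase["dataset"],
                              lr=phase["lr"], epochs=phase["epochs"])
    return checkpoint

# Stub trainer that just records what it was asked to do:
log = []
fake_train = lambda ckpt, ds, lr, epochs: log.append((ds, lr, epochs)) or f"ckpt-after-{ds}"
final = run_phases(PHASES, fake_train)
print(final)  # → ckpt-after-clips_best
```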

Data prep tools are standalone and output standard formats โ€” they work with any trainer, not just Flimmer.

Early release, building in the open. LTX support coming next.

http://github.com/alvdansen/flimmer-trainer