Space The Synthetic Data Playbook: Generating Trillions of the Finest Tokens 📝 Explore synthetic data experiments with an interactive bookshelf • Running on CPU Upgrade • 140
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI • 19 days ago • 480
jina-embeddings-v5-text Collection Our 5th-gen embeddings: two lightweight multilingual models with SOTA performance in retrieval, matching, clustering, and classification. • 29 items • Updated 12 days ago • 35
PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models Paper • 2602.06053 • Published Jan 14 • 7
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling • 27 days ago • 49
Article Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek • Jan 27 • 45