Now Live: Reubencf/Nano_Banana_Editor now includes 10 free requests/day! 🍌 I'm personally sponsoring these credits to help make open AI tools accessible to all. (Note: limits are subject to change based on funding.)
Enjoy!
Reacted to dhruv3006's post with 👀 about 6 hours ago
The problem: hardcoded URLs, tokens, and IDs make API workflows brittle and painful to maintain.
What devs do today: duplicate values across files or manually swap configs for dev, staging, and prod - easy to break, hard to scale.
Why Voiden:
- Voiden Variables let you define once and reuse everywhere.
- Switch environments easily, keep secrets out of request files, and reuse dynamic values across requests.
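The "define once, reuse everywhere" idea can be sketched generically. This is illustrative Python, not Voiden's actual syntax; all names (ENVIRONMENTS, build_request, the token variable names) are hypothetical:

```python
import os

# Hypothetical sketch of environment-scoped variables:
# one definition per environment, secrets resolved at request time.
ENVIRONMENTS = {
    "dev":     {"base_url": "https://dev.api.example.com",     "token_var": "DEV_API_TOKEN"},
    "staging": {"base_url": "https://staging.api.example.com", "token_var": "STAGING_API_TOKEN"},
    "prod":    {"base_url": "https://api.example.com",         "token_var": "PROD_API_TOKEN"},
}

def build_request(env: str, path: str) -> dict:
    """Resolve the URL and secret for the chosen environment at call time."""
    cfg = ENVIRONMENTS[env]
    return {
        "url": f"{cfg['base_url']}{path}",
        # Secrets live in the process environment, never in request files.
        "headers": {"Authorization": f"Bearer {os.environ.get(cfg['token_var'], '')}"},
    }
```

Switching environments is then a one-word change (`build_request("staging", ...)`) instead of editing every request file.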
I submitted the paper "AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts" by @weizhihao1, Keyu Li, Junhao Shi, @dqwang Dequan Wang, @YangXiao-nlp Yang Xiao, Mohan Jiang, @Sunshine279 Jie Sun, Yunze Wu, Shijie Xia, Xiaojie Cai, Tianze Xu, Weiye Si, Wenjie Li, and Pengfei Liu.
Potentially another direction for benchmarking the frontiers of autonomous agents in 2026.
Some of the observations are:
- Long-horizon tasks remain challenging: even frontier models struggle with sustained reasoning over real-world tasks that require 1M tokens and 90 tool calls, indicating limits in long-context autonomy.
- Proprietary models outperform open-source models: closed-source models achieve a higher average score (48.4%) than open-source counterparts (32.1%), revealing a persistent performance gap on complex agentic tasks.
- Feedback-driven self-correction varies widely: models like GPT 5.2 and Claude show strong gains from iterative feedback, while others (e.g. DeepSeek V3.2) exhibit minimal or no improvement after feedback.
- Efficiency trade-offs are significant: high-performing models often consume far more tokens and time, and some (e.g. Grok 4.1 Fast) are more token-efficient despite lower absolute scores.
- Agentic scaffolds strongly influence performance: models tend to perform best within their native or optimized ecosystems, highlighting that agent performance depends on tight coupling between the model and its scaffold, not the model alone.
DeepMind just released PACEvolve (Progress-Aware Consistent Evolution), a massive overhaul of the AlphaEvolve framework. It solves the critical issues of "Context Pollution" and "Mode Collapse" that have historically crippled evolutionary coding agents.
But there was no public implementation. So I built one.
Introducing OpenPACEvolve: A fully open-source, production-grade implementation of the PACEvolve framework.
🛠 I engineered this framework solo, but I wasn't working alone. I orchestrated custom coding agents powered by Claude Opus 4.5 as the engineer, with Gemini Pro 3 Preview ensuring fidelity and quality.
By leveraging these SOTA models, I was able to translate complex theoretical research into functional, modular Python architecture in record time. This is what the future of AI engineering looks like: Human architectural oversight + AI velocity.
🧠 What OpenPACEvolve Solves: Unlike standard agents that get "stuck" in loops, this framework implements the paper's full recipe for long-horizon stability:
✅ Hierarchical Context Management (HCM): bi-level pruning to keep the agent's memory clean.
✅ Momentum-Based Backtracking (MBB): uses "power-law backtracking" to detect stagnation and force pivots.
✅ Self-Adaptive Crossover: intelligent code-sharing between parallel "islands."
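The backtracking idea above can be sketched in a few lines. This is a hypothetical illustration of stagnation-triggered, power-law-scaled backtracking, not the paper's or the repo's actual algorithm; the function names and the exponent `alpha` are assumptions:

```python
def backtrack_depth(stagnation_steps: int, alpha: float = 1.5) -> int:
    """Power-law backtracking (illustrative): the longer the score
    plateaus, the further back in the lineage we jump."""
    return max(1, int(stagnation_steps ** (1 / alpha)))

def evolve_step(history: list[float], new_score: float, patience: int = 3) -> int:
    """Record a score; if there has been no new best for `patience`
    steps, return how many ancestors to backtrack past (0 = keep going)."""
    history.append(new_score)
    best = max(history)
    # Count trailing steps that failed to reach a new best.
    stagnation = 0
    for score in reversed(history):
        if score >= best:
            break
        stagnation += 1
    return backtrack_depth(stagnation) if stagnation >= patience else 0
```

For example, three consecutive non-improving scores after a best of 1.0 would trigger a jump two ancestors back, while any new best resets the counter to zero.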
👨💻 This project is more than a repo; it's a demonstration of rapid research-to-production cycles using next-gen AI workflows.
Our engineer Alan from https://robonine.com/ (Educational Robotics) integrated Feetech STS3250 and STS3215 servo motors into the prototype and completed the first test run of a 6-DOF semi-SCARA manipulator.
During motion, the structure demonstrates high stiffness with no visible backlash or mechanical play. The kinematic chain remains stable throughout the test trajectory, confirming the rigidity of the mechanical design and joint assembly.
The next stage includes full assembly with all actuators operating in backlash compensation mode, followed by quantitative measurement of positioning accuracy and repeatability.
GLM-4.7-Flash is fast, good and cheap. 3,074 tokens/sec peak at 200k tokens context window on my desktop PC. Works with Claude Code and opencode for hours. No errors, drop-in replacement of the Anthropic cloud AI. MIT licensed, open weights, free for commercial use and modifications. Supports speculative decoding using MTP, which is highly effective in mitigating latency. Great for on device AI coding as AWQ 4bit at 18.5 GB. Hybrid inference on a single consumer GPU + CPU RAM.
Summary: Structured Intelligence systems don’t just *think*—they *change the world* (payments, bookings, city actuators, learning/medical records). In distributed reality, partial failures and retries are normal, so “do it once” is a myth.
This article is a practical cookbook for making effectful operations *retry-safe, reversible (when possible), and auditable*, using *RML levels (1→3)*, *Sagas + compensators*, and “single storyline” effect traces—then measuring quality via *RBL / RIR / SCI*.
> A compensator is *another effect*, not a magical “undo”.
---
Why It Matters:
• Prevents double-apply / half-committed states by defaulting to *idempotency + durable traces*
• Makes rollback *engineering-real*: compensators must be *idempotent*, monotone toward safety, and bounded to a durable terminal/pending state
• Handles "can't undo" honestly: model *partial reversibility* + remaining risk + follow-up tasks
• Turns failure handling into metrics you can operate: *RBL (rollback latency), RIR (rollback integrity), SCI (structural inconsistencies)*
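The idempotent-effect-plus-compensator pairing can be sketched minimally. This is a hedged illustration of the pattern, not the article's actual API; `charge`, `compensate_charge`, and the dict-based trace are all hypothetical stand-ins for a durable effect log:

```python
trace = {}  # effect_id -> state; stands in for a durable effect trace

def charge(effect_id: str, amount: int) -> str:
    """Idempotent forward effect: a retry with the same id applies once."""
    if effect_id in trace:
        return trace[effect_id]          # already applied -- retry is safe
    trace[effect_id] = f"charged:{amount}"
    return trace[effect_id]

def compensate_charge(effect_id: str) -> str:
    """Compensator: another effect, not a magical undo. It is also
    idempotent and monotone toward the safe terminal state 'refunded'."""
    if trace.get(effect_id) == "refunded":
        return "refunded"                # re-running changes nothing
    trace[effect_id] = "refunded"
    return "refunded"

# A retry storm cannot double-apply, and rollback converges:
charge("tx-1", 500); charge("tx-1", 500)
compensate_charge("tx-1"); compensate_charge("tx-1")
```

Note the asymmetry: the forward effect refuses to re-apply, while the compensator only ever moves the state toward its safe terminal value, matching the "monotone toward safety" requirement above.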
📢 Announcement: XenArcAI is now Modotte – A New Chapter Begins! 🚀
Hello everyone,
We are thrilled to announce that XenArcAI is officially rebranding to Modotte!
Since our journey began, we’ve been committed to pushing the boundaries of AI through open-source innovation, research, and high-quality datasets. As we continue to evolve, we wanted a name that better represents our vision for a modern, interconnected future in the tech space.
What is changing?
The Name: Moving forward, all our projects, models, and community interactions will happen under the Modotte banner.
The Look: You’ll see our new logo and a fresh color palette appearing across our platforms.
What is staying the same?
The Core Team: It’s still the same people behind the scenes, including our founder, Parvesh Rawal.
Our Mission: We remain dedicated to releasing state-of-the-art open-source models and datasets.
Our Continuity: All existing models, datasets, and projects will remain exactly as they are—just with a new home.
This isn’t just a change in appearance; it’s a commitment to our next chapter of growth and discovery. We are so grateful for your ongoing support as we step into this new era.
Think you know which AI papers go viral? Test your instincts! I built a little game where you try to guess the popularity of AI research papers from the Hugging Face Daily Papers feed.
How it works: You'll see two papers side by side—read the titles, check the abstracts, and pick which one you think got more upvotes from the HF community.
It's a great way to discover trending AI research while having fun. Tests your intuition about what the ML community finds interesting.
Expanding beyond the modern code series, this release presents a massive historical snapshot from the Google Code Archive. This dataset captures the open-source landscape from 2006 to 2016, offering a unique time capsule of software development patterns during the era before GitHub's dominance.
Key Stats:
- 65,825,565 files from 488,618 repositories
- 47 GB compressed Parquet storage
- 454 programming languages (heavily featuring Java, PHP, and C++)
- Extensive quality filtering (excluding vendor code and build artifacts)
- Rich historical metadata: original repo names, file paths, and era-specific licenses
This is one of those releases that I'm most interested in getting feedback on. Would you like to see more old code datasets?
Finetuned from the fantastic Olmo3.1 32B architecture by AllenAI, Mox-Small-1 was trained using the same datasets and methodology as Mox-Tiny-1, making this model our second addition to the Mox-1 family of models.
Mox-1 is designed to prioritize clarity, honesty, and genuine utility over blind agreement. These models are perfect for when you want to be challenged in a constructive, helpful way.
By utilizing Olmo3.1 32B's architecture, Mox-Small-1 brings greater conversational depth and reasoning quality to the Mox-1 model family. Check it out!
📽️ New NVIDIA paper: Motion Attribution for Video Generation 📽️
We propose MOTIVE, a method for taking query video clips and identifying which training data will improve or degrade performance after finetuning, enabling sophisticated data curation and beyond!
This project was led by Xindi (Cindy) Wu, with Despoina Paschalidou, Jun Gao, Antonio Torralba, Laura Leal-Taixé, Olga Russakovsky, and Sanja Fidler.
Can we measure how AI interaction reshapes human cognition? We built two semantic association instruments that pit humans against Claude Haiku—testing divergent thinking and communicability under constraint. Try the instruments and contribute to the dataset: https://instruments.phronos.org/ins-001/
DeepSeek R1 dropped one year ago 🐳 and a lot has changed.
With @irenesolaiman, we're launching a blog series about how that moment reshaped AI + open source in 2025, starting with strategic shifts and the explosion of new open models in China!