Open Computer Agent v2.0
An enhanced universal computer-use agent built on smolagents, E2B Desktop, and Playwright.
What's New in v2.0
| Feature | Description |
|---|---|
| Hierarchical Planner | Breaks goals into subtasks before execution, using a cheap text model |
| Playwright MCP | Semantic browser control (click by text/role, extract tables/links, evaluate JS) |
| Multi-Model Router | Auto-selects the cheapest capable model (fast vision → powerful vision → fast text → powerful text) |
| Set-of-Marks Vision | Overlays numbered bounding boxes on UI elements for coordinate-free interaction |
| Long-Term Memory | ChromaDB vector store that retrieves similar past tasks and proven strategies |
| Verifier Agent | Checks subtask completion and triggers recovery loops |
| Human-in-the-Loop | Pauses on sensitive actions (payments, emails, deletes) for user approval |
| Voice I/O | Speak tasks and hear responses via Whisper STT + Kokoro TTS |
| Cost Dashboard | Real-time tracking of $/task, token usage, and latency |
| Session Recording | Saves every step as a replayable macro, with potential for GIF/MP4 export |
| Enhanced Eval | Built-in benchmark suite with LLM-as-a-Judge grading and A/B testing |
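The Multi-Model Router row can be illustrated with a minimal sketch. The tier names, the prices, and the `pick_model` function are hypothetical and are not taken from this project's code; the sketch only shows the idea of choosing the cheapest model whose capabilities cover a given step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                 # hypothetical tier label, not a real model ID
    supports_vision: bool
    strength: int             # 1 = fast, 2 = powerful
    usd_per_1k_tokens: float  # made-up price, used only for ordering


# Illustrative tiers; names and prices are assumptions.
TIERS = [
    ModelTier("fast-text", False, 1, 0.0002),
    ModelTier("fast-vision", True, 1, 0.0005),
    ModelTier("powerful-text", False, 2, 0.0020),
    ModelTier("powerful-vision", True, 2, 0.0040),
]


def pick_model(needs_vision: bool, min_strength: int) -> ModelTier:
    """Return the cheapest tier that satisfies the step's requirements."""
    capable = [
        tier for tier in TIERS
        if tier.strength >= min_strength
        and (tier.supports_vision or not needs_vision)
    ]
    if not capable:
        raise ValueError("no tier satisfies the request")
    return min(capable, key=lambda tier: tier.usd_per_1k_tokens)
```

Under these assumptions, a step that needs a screenshot but little reasoning would resolve to `pick_model(True, 1)`, while a text-only planning step would use `pick_model(False, 1)`.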
Architecture
User Input (Text / Voice / File)
        |
        v
[Intelligence Router] ----> Planner (JSON DAG)
        |
        v
[Memory Retrieval] (ChromaDB)
        |
        v
[Plan Executor]
        |
        +---> [Browser Sub-Agent] (Playwright MCP)
        +---> [Desktop Sub-Agent] (E2B + SoM Vision)
        +---> [Coder Sub-Agent] (Code Interpreter)
        +---> [HF Hub Sub-Agent] (Search / Upload)
        |
        v
[Verifier] -> Retry / Alternative / Continue
        |
        v
[Macro Saver] + Cost Report + Session Recording
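The planner, executor, and verifier stages in the diagram can be sketched as follows. The JSON schema (`id`, `agent`, `depends_on`) and the `run_plan`, `execute`, and `verify` names are assumptions made for illustration; the project's actual plan format may differ. The sketch covers only the Retry and Continue outcomes, not the Alternative path.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical plan format: each subtask lists the IDs it depends on.
PLAN_JSON = """
[
  {"id": "open_site", "agent": "browser", "depends_on": []},
  {"id": "extract_table", "agent": "browser", "depends_on": ["open_site"]},
  {"id": "analyze", "agent": "coder", "depends_on": ["extract_table"]}
]
"""


def run_plan(plan, execute, verify, max_attempts=2):
    """Run subtasks in dependency order, retrying when verification fails."""
    by_id = {task["id"]: task for task in plan}
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {task["id"]: set(task["depends_on"]) for task in plan}
    log = []
    for task_id in TopologicalSorter(graph).static_order():
        task = by_id[task_id]
        for attempt in range(1, max_attempts + 1):
            result = execute(task)
            if verify(task, result):
                log.append((task_id, attempt))  # verified: continue
                break
        else:
            raise RuntimeError(
                f"subtask {task_id!r} failed after {max_attempts} attempts"
            )
    return log


plan = json.loads(PLAN_JSON)
```

Here `execute` stands in for dispatching a subtask to the matching sub-agent, and `verify` stands in for the Verifier's completion check.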
Quick Start
- Set your HF_TOKEN and E2B_API_KEY in the Space Secrets.
- Type a task (or speak it) and click the Let's go! button.
- Watch the agent plan, execute, verify, and report costs.
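Hugging Face Spaces expose secrets to the running app as environment variables, so a startup check along these lines can catch a missing key early. `missing_secrets` is a hypothetical helper and is not part of this project.

```python
import os

REQUIRED_SECRETS = ("HF_TOKEN", "E2B_API_KEY")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit(f"Missing secrets: {', '.join(missing)}")
```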
Sensitive Actions
By default, the agent pauses before:
- Payments, purchases, subscriptions
- Sending emails/messages/posts
- Deleting files or uninstalling software
- Password/credit-card fields
To disable human-in-the-loop (HITL) approval, enable "Auto-approve all actions" in Advanced Options.
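One simple way to implement such a gate is keyword matching on the description of the pending action. The patterns, category names, and function names below are assumptions for illustration only; the project may detect sensitive actions differently.

```python
import re

# Hypothetical keyword patterns mirroring the four categories listed above.
SENSITIVE_PATTERNS = {
    "payment": re.compile(r"\b(pay|purchase|buy|subscribe|checkout)\b", re.I),
    "outbound message": re.compile(r"\b(send|post|publish)\b", re.I),
    "deletion": re.compile(r"\b(delete|remove|uninstall)\b", re.I),
    "credentials": re.compile(r"\b(password|credit card|cvv)\b", re.I),
}


def sensitive_category(action_description):
    """Return the first matching category, or None if the action looks routine."""
    for category, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(action_description):
            return category
    return None


def run_action(action_description, perform, ask_user, auto_approve=False):
    """Call perform() unless the action is sensitive and the user declines."""
    category = sensitive_category(action_description)
    if category and not auto_approve and not ask_user(category, action_description):
        return "skipped"
    perform()
    return "done"
```

With `auto_approve=True` the `ask_user` callback is never consulted, which mirrors the Auto-approve option.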
Cost Budget
The default budget is $2.00 USD per session. As the budget is consumed, the router automatically downgrades to cheaper models.
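A budget-driven downgrade could work roughly like this. The tier names, the thresholds, and the `BudgetTracker` class are hypothetical; only the $2.00 default comes from the text above.

```python
# Hypothetical tiers, ordered from most to least expensive, each paired with
# the minimum fraction of the budget that must remain for it to be used.
TIER_THRESHOLDS = [
    ("powerful-vision", 0.50),
    ("fast-vision", 0.20),
    ("fast-text", 0.00),
]


class BudgetTracker:
    def __init__(self, budget_usd=2.00):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, cost_usd):
        """Add the cost of one model call to the session total."""
        self.spent_usd += cost_usd

    @property
    def remaining_fraction(self):
        return max(0.0, 1.0 - self.spent_usd / self.budget_usd)

    def allowed_tier(self):
        """Return the most expensive tier the remaining budget still permits."""
        for name, min_remaining in TIER_THRESHOLDS:
            if self.remaining_fraction >= min_remaining:
                return name
        return TIER_THRESHOLDS[-1][0]
```

In this sketch the cheapest tier stays available even after the budget is exhausted; a stricter design might stop the session instead.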
Benchmarks
Run the built-in eval suite:
from eval_harness import EvaluationHarness
# See eval_harness.py for usage
Credits
- smolagents by Hugging Face
- E2B for secure sandboxed desktops
- Playwright for browser automation
- Qwen2.5-VL for vision reasoning