RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI
Abstract
USER is a unified systems framework that enables scalable, asynchronous online policy learning in physical robots by treating them as first-class hardware resources and supporting diverse learning paradigms including VLA models.
Online policy learning directly in the physical world is a promising yet challenging direction for embodied intelligence. Unlike simulation, real-world systems cannot be arbitrarily accelerated, cheaply reset, or massively replicated, which makes scalable data collection, heterogeneous deployment, and long-horizon effective training difficult. These challenges suggest that real-world policy learning is not only an algorithmic issue but fundamentally a systems problem. We present USER, a Unified and extensible SystEm for Real-world online policy learning. USER treats physical robots as first-class hardware resources alongside GPUs through a unified hardware abstraction layer, enabling automatic discovery, management, and scheduling of heterogeneous robots. To address cloud-edge communication, USER introduces an adaptive communication plane with tunneling-based networking, distributed data channels for traffic localization, and streaming-multiprocessor-aware weight synchronization to regulate GPU-side overhead. On top of this infrastructure, USER organizes learning as a fully asynchronous framework with a persistent, cache-aware buffer, enabling efficient long-horizon experiments with robust crash recovery and reuse of historical data. In addition, USER provides extensible abstractions for rewards, algorithms, and policies, supporting online imitation or reinforcement learning of CNN/MLP, generative policies, and large vision-language-action (VLA) models within a unified pipeline. Results in both simulation and the real world show that USER enables multi-robot coordination, heterogeneous manipulators, edge-cloud collaboration with large models, and long-running asynchronous training, offering a unified and extensible systems foundation for real-world online policy learning.
Community
We present USER, a Unified and extensible SystEm for Real-world online policy learning. USER treats physical robots as first-class hardware resources alongside GPUs through a unified hardware abstraction layer, enabling automatic discovery, management, and scheduling of heterogeneous robots.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure (2025)
- RL-VLA$^3$: Reinforcement Learning VLA Accelerating via Full Asynchronism (2026)
- Nimbus: A Unified Embodied Synthetic Data Generation Framework (2026)
- Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning (2026)
- ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning (2026)
- OpenTinker: Separating Concerns in Agentic Reinforcement Learning (2026)
- When RL Meets Adaptive Speculative Training: A Unified Training-Serving System (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper