In a Training Loop 🔄


Rui Yang PRO

Ray2333

https://yangrui2015.github.io

YangRui2015

AI & ML interests

Deep Reinforcement Learning

Recent Activity

updated a model about 13 hours ago

OpenWebRL/OpenWebRL-4B-SFT

published a model about 13 hours ago

OpenWebRL/OpenWebRL-4B-SFT

updated a dataset about 13 hours ago

Ray2333/Judge_data_plus


Organizations

commented 2 papers 7 months ago

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

Paper • 2510.25897 • Published Oct 29, 2025 • 17

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Paper • 2510.12693 • Published Oct 14, 2025 • 28

commented a paper 12 months ago

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

Paper • 2506.00123 • Published May 30, 2025 • 35

New activity in microsoft/GUI-Actor-Verifier-2B 12 months ago

Update README.md

#1 opened 12 months ago by Ray2333

commented a paper 12 months ago

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Paper • 2505.24846 • Published May 30, 2025 • 15

commented a paper about 1 year ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5, 2025 • 25

New activity in Ray2333/GRM-Llama3.2-3B-rewardmodel-ft about 1 year ago

Bug in readme implementation

#3 opened about 1 year ago by jvelja

New activity in microsoft/Magma-8B about 1 year ago

generation_args in the example


#10 opened about 1 year ago by Ray2333

New activity in EmbodiedBench/EB-Manipulation over 1 year ago

Add dataset card

#1 opened over 1 year ago by nielsr

New activity in Ray2333/Gemma-2B-rewardmodel-baseline over 1 year ago

trained dataset and fine-tuned method

#1 opened over 1 year ago by glgjss960

commented 2 papers over 1 year ago

Rethinking Diverse Human Preference Learning through Principal Component Analysis

Paper • 2502.13131 • Published Feb 18, 2025 • 37

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Paper • 2502.09560 • Published Feb 13, 2025 • 35

New activity in Ray2333/GRM-Llama3.2-3B-rewardmodel-ft over 1 year ago

Update default tokenization behavior to "longest" in README

#2 opened over 1 year ago by MichaelR207

Model Size

#1 opened over 1 year ago by szhang120

commented a paper over 1 year ago

DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models

Paper • 2411.00836 • Published Oct 29, 2024 • 15

New activity in Ray2333/GRM-llama3-8B-sftreg over 1 year ago

Adding `safetensors` variant of this model

#3 opened over 1 year ago by SFconvertbot

New activity in Ray2333/GRM-llama3-8B-sftreg almost 2 years ago

Abnormally Large Memory Footprint?

#2 opened almost 2 years ago by RylanSchaeffer

Some weights of the model checkpoint at Ray2333/GRM-llama3-8B-sftreg were not used when initializing

#1 opened almost 2 years ago by RylanSchaeffer

New activity in Ray2333/gpt2-large-harmless-reward_model almost 2 years ago

Load failed: There is no "pytorch_model.bin", how to load the model?

#3 opened almost 2 years ago by Hanlard

a bug when loading model

#2 opened almost 2 years ago by ssmmzz
