Andrew Zhao
andrewzh
AI & ML interests
Reinforcement Learning, Agents
Recent Activity
upvoted a paper about 2 hours ago
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
upvoted a paper about 2 months ago
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning