arxiv:2602.14234

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Published on Feb 15 · Submitted by Xiao Wang on Feb 17 · #3 Paper of the day
Abstract

REDSearcher presents a unified framework for optimizing search agents through improved task synthesis, tool-augmented queries, mid-training capability enhancement, and simulated environments, addressing the challenges of long-horizon search tasks.

AI-generated summary

Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck lies in the extreme sparsity of high-quality search trajectories and reward signals, arising from the difficulty of scalable long-horizon task construction and the high cost of interaction-heavy rollouts involving external tool calls. To address these challenges, we propose REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Specifically, REDSearcher introduces the following improvements: (1) We frame task synthesis as a dual-constrained optimization, where task difficulty is precisely governed by graph topology and evidence dispersion, allowing scalable generation of complex, high-quality tasks. (2) We introduce tool-augmented queries to encourage proactive tool use rather than passive recall. (3) During mid-training, we strengthen core atomic capabilities (knowledge, planning, and function calling), substantially reducing the cost of collecting high-quality trajectories for downstream training. (4) We build a local simulated environment that enables rapid, low-cost algorithmic iteration for reinforcement learning experiments. Across both text-only and multimodal search-agent benchmarks, our approach achieves state-of-the-art performance. To facilitate future research on long-horizon search agents, we will release 10K high-quality complex text search trajectories, 5K multimodal trajectories, and a 1K text RL query set, together with code and model checkpoints.
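The dual-constrained task synthesis in point (1) can be pictured with a small sketch. The snippet below is a hypothetical illustration, not the paper's released code: it rejection-samples evidence chains over a toy entity graph and accepts a task only when both a minimum hop depth (the graph-topology constraint) and a minimum evidence dispersion (spread of evidence across distinct sources) are satisfied. All names, thresholds, and the toy graph are assumptions made for illustration.

```python
# Hypothetical sketch of dual-constrained task synthesis: difficulty is
# governed jointly by hop depth (graph topology) and evidence dispersion
# (how many distinct sources the evidence chain touches).
import random
from dataclasses import dataclass


@dataclass
class EvidenceNode:
    entity: str   # entity the hop leads to
    fact: str     # fact the agent must find to make the hop
    source: str   # document or web-page id backing the fact


@dataclass
class SearchTask:
    question: str
    chain: list          # ordered evidence hops the agent must resolve
    hop_depth: int       # topology knob
    dispersion: float    # source-spread knob


def evidence_dispersion(chain):
    """Fraction of hops backed by a distinct source (1.0 = fully dispersed)."""
    sources = [node.source for node in chain]
    return len(set(sources)) / len(sources)


def random_chain(graph, start, length, rng):
    """Walk the entity graph to build an ordered evidence chain."""
    chain, current = [], start
    for _ in range(length):
        neighbors = graph.get(current, [])
        if not neighbors:
            break
        node = rng.choice(neighbors)
        chain.append(node)
        current = node.entity
    return chain


def synthesize_task(graph, start, min_hops=3, min_dispersion=0.7,
                    max_tries=200, seed=0):
    """Rejection-sample a chain until both difficulty constraints are met."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        chain = random_chain(graph, start, min_hops, rng)
        if len(chain) >= min_hops and evidence_dispersion(chain) >= min_dispersion:
            # In practice the question would be composed by an LLM; a template
            # stands in here to keep the sketch self-contained.
            question = (f"Starting from {start}, what entity do you reach after "
                        "resolving: " + " -> ".join(n.fact for n in chain) + "?")
            return SearchTask(question, chain, len(chain),
                              evidence_dispersion(chain))
    return None


if __name__ == "__main__":
    # Toy entity graph: entity -> reachable evidence nodes (with their sources).
    toy_graph = {
        "A": [EvidenceNode("B", "A founded B", "doc_1")],
        "B": [EvidenceNode("C", "B acquired C", "doc_2")],
        "C": [EvidenceNode("D", "C is headquartered in D", "doc_3")],
    }
    task = synthesize_task(toy_graph, start="A")
    if task:
        print(task.question)
        print(f"hops={task.hop_depth}, dispersion={task.dispersion:.2f}")
```

Tightening either knob (requiring deeper chains or higher dispersion) would make the synthesized tasks harder, which is the difficulty-control lever the abstract describes; the actual framework presumably operates over a large-scale graph and uses model-based question composition rather than templates.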

