WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation
This repository contains the model weights introduced in the paper WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation (https://arxiv.org/abs/2605.15964).
It includes the weights for the world model backbone and the action decoder.
For more details about the model and its implementation, please refer to the GitHub repository: https://github.com/EmbodiedCity/WorldVLN.code
If this work has contributed to your research, please cite it:
@misc{zhao2026worldvln,
  title={WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation},
  author={Baining Zhao and Jiacheng Xu and Weicheng Feng and Xin Zhang and Zhaolu Wang and Haoyang Wang and Shilong Ji and Ziyou Wang and Jianjie Fang and Zhiheng Zheng and Weichen Zhang and Yu Shang and Wei Wu and Chen Gao and Xinlei Chen and Yong Li},
  year={2026},
  eprint={2605.15964},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2605.15964},
}