Spirit-v1.5 (Base)
Spirit-v1.5 is a Vision-Language-Action (VLA) model.
Introduction
Spirit-v1.5 is built on:
- A VLM backbone: Qwen/Qwen3-VL-4B-Instruct
- A DiT (Diffusion Transformer) action head
- A policy inference API
For implementation details and usage examples, please refer to the GitHub repository:
- GitHub: https://github.com/Spirit-AI-Team/spirit-v1.5
For the model announcement / overview, see the blog post:
- Blog: https://www.spirit-ai.com/en/blog/spirit-v1-5
Requirements
Tested in a GPU environment. Recommended Python: 3.11+.
Pinned dependency versions from this repo:
torch==2.9.1
torchvision==0.24.1
transformers==4.57.3
diffusers==0.36.0
safetensors==0.7.0
numpy==2.4.0
pillow==12.1.0
scipy==1.16.3
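The pins above can be checked against the active environment with a small script. This is a sketch, not a tool shipped with the Spirit-v1.5 repo: PINNED and check_pins are hypothetical names, and versions are compared as exact strings after dropping any local build suffix (such as "+cu128" on CUDA builds of torch).

```python
"""Compare installed package versions against the pins listed in this model card.

Hypothetical helper: PINNED and check_pins are illustrative names,
not part of the Spirit-v1.5 repository.
"""
import sys
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the Requirements section of this model card.
PINNED = {
    "torch": "2.9.1",
    "torchvision": "0.24.1",
    "transformers": "4.57.3",
    "diffusers": "0.36.0",
    "safetensors": "0.7.0",
    "numpy": "2.4.0",
    "pillow": "12.1.0",
    "scipy": "1.16.3",
}


def check_pins(pinned, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    A local build suffix such as "+cu128" is ignored when comparing,
    but the full installed string is kept in the report.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or installed.split("+", 1)[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # The model card recommends Python 3.11 or later.
    if sys.version_info < (3, 11):
        print(f"Warning: Python {sys.version.split()[0]} is older than the recommended 3.11+")
    problems = check_pins(PINNED)
    for name, (expected, installed) in sorted(problems.items()):
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Passing a different lookup callable makes the comparison easy to exercise without touching the real environment.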
Quickstart
This model card is hosted separately on Hugging Face. For the latest runnable commands and end-to-end usage, please follow:
- GitHub repo README: https://github.com/Spirit-AI-Team/spirit-v1.5
Citation
If you use this model / code in your work, please cite and link to:
- GitHub: https://github.com/Spirit-AI-Team/spirit-v1.5
- Blog: https://www.spirit-ai.com/en/blog/spirit-v1-5
BibTeX
@article{spiritai2026spiritv15,
author = {Spirit AI Team},
title = {Spirit-v1.5: Clean Data Is the Enemy of Great Robot Foundation Models},
journal = {Spirit AI Blog},
year = {2026},
note = {https://www.spirit-ai.com/en/blog/spirit-v1-5},
}