Spirit-v1.5 (Base)

Spirit-v1.5 is a Vision-Language-Action (VLA) model.

Introduction

Spirit-v1.5 is built on:

A VLM backbone: Qwen/Qwen3-VL-4B-Instruct
A DiT (Diffusion Transformer) action head
A policy inference API

For implementation details and usage examples, please refer to the GitHub repository:

GitHub: https://github.com/Spirit-AI-Team/spirit-v1.5

For the model announcement / overview, see the blog post:

Blog: https://www.spirit-ai.com/en/blog/spirit-v1-5

Requirements

Tested in a GPU environment. Recommended Python: 3.11+.

Pinned dependency versions from this repo:

torch==2.9.1
torchvision==0.24.1
transformers==4.57.3
diffusers==0.36.0
safetensors==0.7.0
numpy==2.4.0
pillow==12.1.0
scipy==1.16.3
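
To confirm that a local environment matches these pins before running the model, a small check along the following lines may help. This is a minimal sketch, not part of the Spirit-v1.5 repository: the `PINNED` table and the `check_pins` helper are hypothetical names introduced here for illustration, and the choice to ignore local build tags (for example a `+cu128` suffix on a CUDA build of torch) is an assumption about what should count as a match.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above (hypothetical helper table, not from the repo).
PINNED = {
    "torch": "2.9.1",
    "torchvision": "0.24.1",
    "transformers": "4.57.3",
    "diffusers": "0.36.0",
    "safetensors": "0.7.0",
    "numpy": "2.4.0",
    "pillow": "12.1.0",
    "scipy": "1.16.3",
}


def check_pins(pins=PINNED, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Assumption: a local build tag such as "2.9.1+cu128" still matches "2.9.1".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Run as a script, it prints one line per package whose installed version differs from the pin and prints nothing when every pin is satisfied.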

Quickstart

This model card is hosted separately on Hugging Face. For the latest runnable commands and end-to-end usage, please follow:

GitHub repo README: https://github.com/Spirit-AI-Team/spirit-v1.5

Citation

If you use this model / code in your work, please cite and link to:

GitHub: https://github.com/Spirit-AI-Team/spirit-v1.5
Blog: https://www.spirit-ai.com/en/blog/spirit-v1-5

BibTeX

@article{spiritai2026spiritv15,
  author = {Spirit AI Team},
  title = {Spirit-v1.5: Clean Data Is the Enemy of Great Robot Foundation Models},
  journal = {Spirit AI Blog},
  year = {2026},
  note = {https://www.spirit-ai.com/en/blog/spirit-v1-5},
}
