Spirit-v1.5 (Base)

Spirit-v1.5 is a Vision-Language-Action (VLA) model.

Introduction

Spirit-v1.5 is built on:

  • A VLM backbone: Qwen/Qwen3-VL-4B-Instruct
  • A DiT (Diffusion Transformer) action head
  • A policy inference API

For implementation details and usage examples, please refer to the GitHub repository:

  • GitHub: https://github.com/Spirit-AI-Team/spirit-v1.5

For the model announcement and overview, see the blog post:

  • Blog: https://www.spirit-ai.com/en/blog/spirit-v1-5

Requirements

Spirit-v1.5 has been tested in a GPU environment. Python 3.11 or later is recommended.

Pinned dependency versions from this repo:

  • torch==2.9.1
  • torchvision==0.24.1
  • transformers==4.57.3
  • diffusers==0.36.0
  • safetensors==0.7.0
  • numpy==2.4.0
  • pillow==12.1.0
  • scipy==1.16.3
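Before running the model, it can help to confirm that the local environment matches these pins. The script below is a minimal sketch that uses only the standard library (`importlib.metadata`). The pins are copied from the list above, while the names `PINNED`, `installed_version`, `find_mismatches` and `python_ok` are hypothetical helpers for illustration and are not part of the Spirit-v1.5 repository. If the GitHub repository ships its own requirements file, treat that file as authoritative.

```python
"""Compare the local environment with the pinned versions in this model card.

Illustrative sketch only: helper names are hypothetical, not from the repo.
"""
import sys
from importlib import metadata
from typing import Callable, Dict, Optional

# Pinned versions copied from the Requirements section above.
PINNED: Dict[str, str] = {
    "torch": "2.9.1",
    "torchvision": "0.24.1",
    "transformers": "4.57.3",
    "diffusers": "0.36.0",
    "safetensors": "0.7.0",
    "numpy": "2.4.0",
    "pillow": "12.1.0",
    "scipy": "1.16.3",
}

# Recommended minimum interpreter version (3.11+).
MIN_PYTHON = (3, 11)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package that does not satisfy its pin to the version found
    (None means the package is not installed)."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # PEP 440: a pin without a local label (e.g. "2.9.1") also matches
        # local builds such as "2.9.1+cu128", so ignore the "+..." suffix.
        base = found.split("+", 1)[0] if found is not None else None
        if base != wanted:
            mismatches[name] = found
    return mismatches


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """True if the interpreter meets the recommended minimum version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_ok():
        print(f"Python {sys.version.split()[0]} is older than the recommended 3.11+")
    for name, found in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found or 'not installed'}")
```

Passing a custom `lookup` callable (for example a plain dict's `.get`) makes the comparison testable without installing the full GPU stack.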

Quickstart

This model card is hosted separately on Hugging Face. For the latest runnable commands and end-to-end usage, please follow:

  • GitHub repo README: https://github.com/Spirit-AI-Team/spirit-v1.5

Citation

If you use this model or code in your work, please cite and link to:

  • GitHub: https://github.com/Spirit-AI-Team/spirit-v1.5
  • Blog: https://www.spirit-ai.com/en/blog/spirit-v1-5

BibTeX

@article{spiritai2026spiritv15,
  author = {Spirit AI Team},
  title = {Spirit-v1.5: Clean Data Is the Enemy of Great Robot Foundation Models},
  journal = {Spirit AI Blog},
  year = {2026},
  note = {https://www.spirit-ai.com/en/blog/spirit-v1-5},
}