Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
Paper: arXiv:2510.08668 (https://arxiv.org/abs/2510.08668)
The model can be loaded with `AutoModelForCausalLM.from_pretrained`; the weights are downloaded automatically on first use.
For users in regions with limited access to Hugging Face, set the HF mirror environment variable to ensure reliable downloads:
export HF_ENDPOINT=https://hf-mirror.com
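Putting the two steps together, a minimal loading sketch: the mirror must be set before any Hugging Face library is imported, and `ORG/Hulu-Med` below is a placeholder repository id, not the real checkpoint name.

```python
import os

# Route Hugging Face downloads through the mirror (set before importing transformers).
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# "ORG/Hulu-Med" is a placeholder repository id; replace it with the actual checkpoint:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("ORG/Hulu-Med", trust_remote_code=True)

print(os.environ["HF_ENDPOINT"])
```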
Hulu-Med is a transparent medical vision-language model that unifies understanding across diverse modalities including medical text, 2D/3D images, and videos. Built with a focus on transparency and accessibility, Hulu-Med achieves state-of-the-art performance on 30 medical benchmarks while being trained entirely on public data.
Our training corpus encompasses diverse public medical data spanning text, 2D/3D images, and videos.
Note: As MoE-based models, Hulu-30A3 and Hulu-235A22 should be served via vLLM or SGLang for optimal performance and efficiency.
# Serve with the vLLM backend
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PYTHONPATH=./Swift-HuluMed/ swift deploy \
--model Hulu-30A3 \
--infer_backend vllm \
--vllm_tensor_parallel_size 8 \
--vllm_engine_kwargs '{"data_parallel_size": 1, "enable_chunked_prefill": true, "enable_multimodal_encoder_data_parallel": false}' \
--vllm_max_num_seqs 512 \
--vllm_enable_expert_parallel \
--vllm_max_model_len 75538 \
--vllm_gpu_memory_utilization 0.85 \
--model_type qwen3_vl_moe \
--port 8000 \
--served_model_name hulu
# Serve with the SGLang backend
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PYTHONPATH=./Swift-HuluMed/ swift deploy \
--model Hulu-30A3 \
--infer_backend sglang \
--max_new_tokens 128000 \
--sglang_context_length 128000 \
--sglang_tp_size 8 \
--model_type qwen3_moe_vl \
--port 8000 \
--served_model_name hulu
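Once either server is running, a quick sanity check against the OpenAI-compatible endpoint (this assumes the default port 8000 used above and a live server, so it is a usage fragment rather than a standalone script):

```shell
# Lists the served models; the response should include "hulu".
curl -s http://localhost:8000/v1/models
```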
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="hulu",
    messages=[{"role": "user", "content": "Hello, I have a headache, what should I do?"}],
    max_tokens=1024,
    temperature=0,
)
print(response.choices[0].message.content)
import base64
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
with open("./demo/demo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="hulu",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
            {"type": "text", "text": "Generate a medical report for this image."},
        ],
    }],
    max_tokens=1024,
    temperature=0,
)
print(response.choices[0].message.content)
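The image payload above is a standard base64 data URL. A self-contained sketch of its construction and round trip, using placeholder bytes in place of a real JPEG file:

```python
import base64

# Placeholder bytes stand in for the contents of ./demo/demo.jpg.
image_bytes = b"\xff\xd8\xff\xe0placeholder"
image_data = base64.b64encode(image_bytes).decode("utf-8")
data_url = f"data:image/jpeg;base64,{image_data}"

# The server decodes the part after the comma back to the original bytes.
decoded = base64.b64decode(data_url.split(",", 1)[1])
print(decoded == image_bytes)  # → True
```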
If you find Hulu-Med useful in your research, please cite:
@misc{jiang2025hulumedtransparentgeneralistmodel,
title={Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding},
author={Songtao Jiang and Yuan Wang and Sibo Song and Tianxiang Hu and Chenyi Zhou and Bin Pu and Yan Zhang and Zhibo Yang and Yang Feng and Joey Tianyi Zhou and Jin Hao and Zijian Chen and Ruijia Wu and Tao Tang and Junhui Lv and Hongxia Xu and Hongwei Wang and Jun Xiao and Bin Feng and Fudong Zhu and Kenli Li and Weidi Xie and Jimeng Sun and Jian Wu and Zuozhu Liu},
year={2025},
eprint={2510.08668},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.08668},
}
This project is released under the Apache 2.0 License.