
Merlin: A Computed Tomography Vision–Language Foundation Model and Dataset

Merlin is a 3D vision-language model (VLM) for computed tomography that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining. This Hugging Face repository provides the model weights and an example image file. The accompanying paper was published in Nature (2026).

[💻 GitHub] [📄 Nature Paper]

⚡️ Installation

To install Merlin, run:

pip install merlin-vlm
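
To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch that uses only the standard library. The helper name `installed_version` is hypothetical; the only name taken from the command above is the distribution name `merlin-vlm`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "merlin-vlm") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not part of Merlin itself.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("merlin-vlm is not installed in this environment")
    else:
        print(f"merlin-vlm {version} is installed")
```

Because the lookup reads package metadata rather than importing the package, it should also work for the editable installation.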

For an editable installation, clone the GitHub repository and install it in place:

git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .

For usage instructions, please visit the GitHub repository.

📁 Project Structure

.
├── README.md
├── i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt <Merlin weights>
├── image1.nii.gz <Sample Image>
├── resnet_gpt2_best_stanford_report_generation_average.pt <Merlin Radiology Report Generation Weights>
├── resnet_clinical_longformer_five_year_disease_prediction <Five Year Disease Prediction Weights>
└── nnUNetTrainerMerlin__nnUNetPlans__3d_fullres <nnUNet File>
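
After downloading the repository, it can be useful to check that a local copy contains every entry in the listing above. The sketch below does this with the standard library only. The entry names are copied from the listing; the helper `missing_entries` and the idea of passing a local directory path are assumptions made for illustration. Since the listing does not say whether the last two entries are files or directories, the check only tests that each path exists.

```python
import sys
from pathlib import Path
from typing import List

# Entry names copied from the project structure listing.
EXPECTED_ENTRIES = [
    "README.md",
    "i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt",
    "image1.nii.gz",
    "resnet_gpt2_best_stanford_report_generation_average.pt",
    "resnet_clinical_longformer_five_year_disease_prediction",
    "nnUNetTrainerMerlin__nnUNetPlans__3d_fullres",
]


def missing_entries(repo_dir: str) -> List[str]:
    """Return the expected entries that do not exist under repo_dir.

    Hypothetical helper: it checks existence only, not file contents.
    """
    root = Path(repo_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Use the directory given on the command line, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_entries(target)
    if missing:
        print("Missing entries:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected entries are present.")
```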

📎 Citation

If you find this repository useful for your work, please cite the Nature paper:

@article{blankemeier_kumar2026merlin,
  author = {Blankemeier, Louis and Kumar, Ashwin and Cohen, Joseph Paul and Liu, Jiaming and Liu, Longchao and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Yu, Hongkun and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Holland, Robbie and Truyts, Cesar and Bluethgen, Christian and Wu, Yufu and Lian, Long and Jensen, Malte Engmann Kjeldskov and Ostmeier, Sophie and Varma, Maya and Valanarasu, Jeya Maria Jose and Fang, Zhongnan and Huo, Zepeng and Nabulsi, Zaid and Ardila, Diego and Weng, Wei-Hung and Amaro Junior, Edson and Ahuja, Neera and Fries, Jason and Shah, Nigam H. and Zaharchuk, Greg and Willis, Marc and Yala, Adam and Johnston, Andrew and Boutin, Robert D. and Wentland, Andrew and Langlotz, Curtis P. and Hom, Jason and Gatidis, Sergios and Chaudhari, Akshay S.},
  title   = {Merlin: a computed tomography vision-language foundation model and dataset},
  journal = {Nature},
  year    = {2026},
  doi     = {10.1038/s41586-026-10181-8},
  url     = {https://doi.org/10.1038/s41586-026-10181-8}
}
