ObjEmbed
ObjEmbed: Towards Universal Multimodal Object Embeddings
ObjEmbed is a multimodal embedding model designed to align specific image regions (objects) with textual descriptions. Unlike global embedding models, ObjEmbed decomposes an image into multiple regional embeddings along with global embeddings, supporting tasks such as visual grounding, local image retrieval, and global image retrieval.
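To make the distinction between regional and global embeddings concrete, here is a minimal sketch of how the three tasks could be scored once embeddings are available. It does not use the ObjEmbed API: the vectors are hand-written toy values, the function names (`ground`, `local_retrieval_score`, `global_retrieval_score`) are hypothetical, and the choice of cosine similarity with a max over regions for local retrieval is an assumption for illustration rather than the scoring described in the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def ground(text_emb, region_embs):
    """Visual grounding (assumed scoring): pick the region closest to the text."""
    scores = [cosine(text_emb, r) for r in region_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

def local_retrieval_score(text_emb, region_embs):
    """Local retrieval (assumed scoring): an image is as relevant as its best region."""
    return max(cosine(text_emb, r) for r in region_embs)

def global_retrieval_score(text_emb, global_emb):
    """Global retrieval: compare the text to the whole-image embedding."""
    return cosine(text_emb, global_emb)

# Toy embeddings standing in for real model outputs (hypothetical values).
images = {
    "image_a": {"global": [0.6, 0.6, 0.5],
                "regions": [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]]},
    "image_b": {"global": [0.1, 0.2, 0.9],
                "regions": [[0.0, 0.1, 1.0]]},
}
query = [0.9, 0.2, 0.0]  # stand-in for the embedding of a text description

# Grounding: which region of image_a matches the description best?
best_region, region_scores = ground(query, images["image_a"]["regions"])

# Local retrieval: rank images by their best-matching region.
ranked_local = sorted(
    images,
    key=lambda name: local_retrieval_score(query, images[name]["regions"]),
    reverse=True,
)

# Global retrieval: rank images by their whole-image embedding.
ranked_global = sorted(
    images,
    key=lambda name: global_retrieval_score(query, images[name]["global"]),
    reverse=True,
)

print(best_region, ranked_local, ranked_global)
```

In this sketch the same text embedding is reused for all three tasks; only the image-side embedding it is compared against changes, which is the practical benefit of producing regional and global embeddings together.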
If you find ObjEmbed helpful for your research, please consider citing:
@article{fu2026objembed,
title={ObjEmbed: Towards Universal Multimodal Object Embeddings},
author={Fu, Shenghao and Su, Yukun and Rao, Fengyun and LYU, Jing and Xie, Xiaohua and Zheng, Wei-Shi},
journal={arXiv preprint arXiv:2602.01753},
year={2026}
}