de-Rodrigo
posted an update 3 days ago
We are happy to share the VERSE Methodology paper via arXiv! 📃💫

VERSE: Visual Embedding Reduction and Space Exploration. Clustering-Guided Insights for Training Data Enhancement in Visually-Rich Document Understanding (2601.05125)

We usually train VLMs on synthetic visual data that we, as humans, judge to be photorealistic. We argue that this is an anthropocentric perspective imposed on a model that might not synthesize visual information as we do. VERSE helps visualize the latent space and overlay visual features on it to detect poor-performance regions, so that better-suited training sets can be added to boost model performance.
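To make the idea concrete, here is a minimal sketch of the general workflow (reduce embeddings, cluster them, compare performance per cluster). It is not the VERSE implementation: the function names, the choice of PCA and k-means, and the `scores` array (one hypothetical per-sample metric, such as F1) are all assumptions made for illustration. See the paper and the repository for the actual method.

```python
import numpy as np

def pca_2d(embeddings):
    """Project embeddings onto their top two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def weakest_cluster(embeddings, scores, k=3):
    """Hypothetical helper: find the cluster with the lowest mean score."""
    coords = pca_2d(embeddings)   # 2D view of the latent space
    labels = kmeans(coords, k)    # group nearby samples
    means = {int(j): float(scores[labels == j].mean()) for j in np.unique(labels)}
    worst = min(means, key=means.get)
    return coords, labels, means, worst
```

Given an `(n_samples, dim)` array of visual embeddings and a matching array of per-sample scores, `weakest_cluster` returns the 2D coordinates for plotting, the cluster labels, the mean score per cluster, and the label of the weakest cluster. Samples in that cluster are the ones to inspect when deciding which training data to add.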


Resources:

- Code: https://github.com/nachoDRT/VrDU-Doctor
- Hugging Face Space: de-Rodrigo/Embeddings

Want to collaborate? Do you have any feedback? 🧐

P.S.: As always, we are grateful to Hugging Face 🤗 for providing the fantastic tools and resources we find on the platform!