Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment Paper • 2605.08064 • Published 9 days ago