arxiv:2602.18996

Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

Published on Feb 22 · Submitted by Shannan Yan on Feb 24

Abstract

A conditional binary segmentation framework with cycle-consistency training enables robust object correspondence across egocentric and exocentric viewpoints without ground-truth annotations.

AI-generated summary

We study the task of establishing object-level visual correspondence across different viewpoints in videos, focusing on the challenging egocentric-to-exocentric and exocentric-to-egocentric scenarios. We propose a simple yet effective framework based on conditional binary segmentation, where an object query mask is encoded into a latent representation to guide the localization of the corresponding object in a target video. To encourage robust, view-invariant representations, we introduce a cycle-consistency training objective: the predicted mask in the target view is projected back to the source view to reconstruct the original query mask. This bidirectional constraint provides a strong self-supervisory signal without requiring ground-truth annotations and enables test-time training (TTT) at inference. Experiments on the Ego-Exo4D and HANDAL-X benchmarks demonstrate the effectiveness of our optimization objective and TTT strategy, achieving state-of-the-art performance. The code is available at https://github.com/shannany0606/CCMP.
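To make the cycle-consistency objective concrete, here is a minimal PyTorch sketch, assuming the general shape described in the abstract: a query mask is encoded into a latent condition, a conditional head predicts the corresponding mask in the target view, and the predicted mask is projected back to the source view to reconstruct the query mask. This is an illustrative toy, not the authors' implementation (see the linked CCMP repository for that); all module names, feature dimensions, and the pooling-based mask encoder are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CycleConsistentMaskPredictor(nn.Module):
    """Toy conditional binary segmentation head: predicts a target-view
    mask given a source-view query mask and target-view features."""

    def __init__(self, feat_dim: int = 16, latent_dim: int = 8):
        super().__init__()
        # Encode the binary query mask into a latent conditioning vector
        # (4x4 average pooling -> 16 values -> latent_dim; illustrative only).
        self.mask_encoder = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(16, latent_dim),
        )
        # Per-pixel mask logits from target features + broadcast condition.
        self.head = nn.Conv2d(feat_dim + latent_dim, 1, kernel_size=1)

    def forward(self, query_mask: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
        z = self.mask_encoder(query_mask)                 # (B, latent_dim)
        b, _, h, w = target_feats.shape
        z_map = z[:, :, None, None].expand(b, -1, h, w)   # broadcast condition
        return self.head(torch.cat([target_feats, z_map], dim=1))  # logits


def cycle_consistency_loss(
    model: CycleConsistentMaskPredictor,
    mask_src: torch.Tensor,
    feats_src: torch.Tensor,
    feats_tgt: torch.Tensor,
) -> torch.Tensor:
    """Forward pass: source query mask -> predicted target-view mask.
    Backward pass: predicted target mask -> reconstructed source mask.
    The reconstruction is supervised by the original query mask, so no
    ground-truth correspondence annotation is needed."""
    pred_tgt = torch.sigmoid(model(mask_src, feats_tgt))
    recon_logits = model(pred_tgt, feats_src)
    return F.binary_cross_entropy_with_logits(recon_logits, mask_src)
```

Because this objective is fully self-supervised, the same loss can in principle be minimized for a few gradient steps on each test clip, which is the test-time training (TTT) strategy the abstract refers to.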

Community

Paper author and submitter:

The paper has been accepted to CVPR 2026 with high review scores (5/5/4). Our approach is intentionally simple: a straightforward conditional-segmentation pipeline that nevertheless achieves strong performance and generalization across benchmarks. We believe a simple, effective solution to a difficult problem is valuable and useful for the community.


