caohaoyu

rechy

AI & ML interests

None yet

Recent Activity

authored a paper about 7 hours ago

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

authored a paper about 7 hours ago

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

authored a paper about 7 hours ago

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Organizations

None yet

authored 12 papers about 7 hours ago

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Paper • 2309.01131 • Published Sep 3, 2023 • 1

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

Paper • 2402.19014 • Published Feb 29, 2024

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Paper • 2406.12707 • Published Jun 18, 2024

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Paper • 2502.05177 • Published Feb 7, 2025 • 2

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Paper • 2505.03739 • Published May 6, 2025 • 9

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

Paper • 2510.09607 • Published Oct 10, 2025 • 2

VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

Paper • 2510.21817 • Published Oct 21, 2025 • 42

Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts

Paper • 2510.16448 • Published Oct 18, 2025

TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs

Paper • 2505.20777 • Published May 27, 2025

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Paper • 2601.19798 • Published 3 days ago • 38

BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models

Paper • 2508.06895 • Published Aug 9, 2025

Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding

Paper • 2601.20430 • Published 2 days ago • 14

upvoted 2 papers about 10 hours ago

Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding

Paper • 2601.20430 • Published 2 days ago • 14

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Paper • 2601.19798 • Published 3 days ago • 38

liked 5 models 3 days ago

tencent/Youtu-Embedding

Sentence Similarity • 2B • Updated Dec 20, 2025 • 963 • 67

tencent/Youtu-LLM-2B

Text Generation • 2B • Updated 1 day ago • 7.29k • 221

tencent/Youtu-LLM-2B-GGUF

Text Generation • 2B • Updated 25 days ago • 2.49k • 23

tencent/Youtu-VL-4B-Instruct-GGUF

Image-Text-to-Text • 5B • Updated 1 day ago • 587 • 36

tencent/Youtu-VL-4B-Instruct

Image-Text-to-Text • 5B • Updated 1 day ago • 155 • 85

upvoted a collection 3 days ago

Youtu

12 items • Updated 1 day ago • 21