arxiv:2602.06053

PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models

Published on Jan 14
Abstract

AI-generated summary: PersonaPlex enables role-conditioned, voice-cloned duplex speech interactions through hybrid prompting and large-scale synthetic training data.
Recent advances in duplex speech models have enabled natural, low-latency speech-to-speech interactions. However, existing models are restricted to a fixed role and voice, limiting their ability to support structured, role-driven real-world applications and personalized interactions. In this work, we introduce PersonaPlex, a duplex conversational speech model that incorporates hybrid system prompts, combining role conditioning via text prompts with voice cloning via speech samples. PersonaPlex is trained on a large-scale synthetic dataset of paired prompts and user-agent conversations, generated with open-source large language models (LLMs) and text-to-speech (TTS) models. To evaluate role conditioning in real-world settings, we extend the Full-Duplex-Bench benchmark beyond a single assistant role to multi-role customer service scenarios. Experiments show that PersonaPlex achieves strong role-conditioned behavior, voice-conditioned speech, and natural conversational responsiveness, surpassing state-of-the-art duplex speech models and hybrid LLM-based speech systems in role adherence, speaker similarity, latency, and naturalness.
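The abstract describes hybrid system prompts that pair a text role description with a reference speech sample for voice cloning. As a rough illustration only, the Python sketch below shows what such a conditioning interface might look like; the HybridSystemPrompt class, its field names, and the commented model.stream call are assumptions made for exposition, not the authors' actual API.

    from dataclasses import dataclass

    @dataclass
    class HybridSystemPrompt:
        """Illustrative container for PersonaPlex-style hybrid conditioning."""
        role_text: str          # natural-language role description (text prompt)
        voice_sample_path: str  # reference speech clip for voice cloning

    # Hypothetical multi-role customer-service prompt, in the spirit of the
    # extended Full-Duplex-Bench scenarios mentioned in the abstract.
    prompt = HybridSystemPrompt(
        role_text=(
            "You are a customer-service agent for an airline. "
            "Handle rebooking requests politely and concisely."
        ),
        voice_sample_path="reference_speaker.wav",
    )

    # A duplex model would consume both conditioning signals alongside the
    # streaming user audio; `model.stream` is a placeholder, not a real API:
    # agent_audio = model.stream(user_audio, system_prompt=prompt)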

