Track Hyper | Soul launches a full-duplex communication large model

Wallstreetcn
2025.08.01 08:15

At the 2025 World Artificial Intelligence Conference, the social platform Soul App showcased its self-developed full-duplex calling large model, which is slated for internal testing on the Soul platform. The model aims to improve real-time calls with virtual humans and AI-matched interactions by breaking the traditional turn-based dialogue model, letting the AI participate actively in conversations and making human-computer interaction more natural. Through a multi-dimensional perception system, the AI can better understand user intentions and emotional states, enabling smoother communication.

Author: Zhou Yuan / Wall Street News

At the 2025 World Artificial Intelligence Conference and High-Level Meeting on Global Governance of Artificial Intelligence (WAIC 2025) exhibition, the social platform Soul App showcased its self-developed full-duplex calling large model.

This model is set to begin internal testing on the Soul platform, with plans to apply it to real-time conversations with virtual humans, AI matching, and other 1V1 and multi-party interactive scenarios, marking a new exploration in the social field.

Soul's "virtual humans" already convey a strong sense of realism in user interactions, but that interaction has so far been limited to text.

If the newly launched self-developed full-duplex calling large model can deliver genuine real-time conversation, it will significantly enhance their perceived intelligence.

Adjusting Traditional Interaction Models

Traditional voice interaction has long relied on VAD (Voice Activity Detection) mechanisms and delay control logic, forming a turn-based dialogue model.

In this model, human-computer dialogue presents a rigid rhythm of question and answer: the AI only begins to respond after the user has finished speaking, resulting in a noticeable delay that affects the naturalness of the interaction.

Users often pause briefly to think mid-speech; the system can misread such a pause as the end of an utterance, so the AI cuts in prematurely, interrupting the user's train of thought and making the exchange feel stiff.
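The endpointing flaw described above can be illustrated with a minimal sketch of a turn-based loop. The threshold value and function names here are hypothetical, not Soul's or any vendor's actual pipeline: a fixed silence cutoff is what turns a thinking pause into a false "end of turn."

```python
# Minimal sketch of VAD-style endpointing (hypothetical values):
# the user's turn is declared over once trailing silence exceeds a
# fixed threshold, so a pause for thought reads the same as "done".

SILENCE_THRESHOLD_MS = 700  # illustrative endpointing cutoff

def is_turn_over(frames, frame_ms=30):
    """frames: list of booleans, True = speech detected in that frame.
    Return True once trailing silence exceeds the threshold."""
    trailing_silence = 0
    for is_speech in reversed(frames):
        if is_speech:
            break
        trailing_silence += frame_ms
    return trailing_silence >= SILENCE_THRESHOLD_MS

# A 600 ms thinking pause (20 silent frames) after some speech:
frames = [True] * 10 + [False] * 20
print(is_turn_over(frames))  # False: 600 ms < 700 ms, AI still waits
frames += [False] * 5        # 150 ms more silence, 750 ms total
print(is_turn_over(frames))  # True: threshold crossed, AI responds
```

Whether the AI interrupts a thoughtful pause or responds promptly hinges entirely on one tuned constant, which is exactly the rigidity the full-duplex approach tries to remove.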

Soul's self-developed end-to-end full-duplex voice calling large model has adjusted this traditional model: it no longer uses the VAD mechanism and attempts to allow the AI to autonomously control the rhythm of the conversation through algorithms.

In actual interactions, the AI can monitor the dynamics of the dialogue in real-time, capable of proactively breaking the silence, appropriately interrupting the user, and engaging in simultaneous listening and speaking.

For example, when a user pauses to think while recounting something, the AI can detect that the speech has not ended and provide supplementary guiding phrases to advance the topic; in multi-person communication scenarios, the AI can judge the timing to join the discussion, intertwining with the user's speech to make the dialogue smoother, approaching the state of face-to-face communication.

Theoretically, this interaction model transforms the AI from a passive responder to an active participant, which can enhance the naturalness of human-computer dialogue to a certain extent.
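The behaviors described above can be sketched as a per-frame decision loop: instead of waiting for an end-of-turn signal, the agent continuously scores "should I speak now?" while listening. All names, states, and thresholds below are illustrative assumptions, not Soul's implementation.

```python
# Hypothetical full-duplex control loop: on every frame the agent
# chooses among back-channeling, waiting out a thinking pause,
# proactively breaking a long silence, or responding normally.

from dataclasses import dataclass

@dataclass
class DialogueState:
    user_speaking: bool     # is the user talking right now?
    silence_ms: int         # silence elapsed since the last utterance
    user_mid_thought: bool  # model predicts the user hasn't finished

def next_action(state: DialogueState) -> str:
    if state.user_speaking:
        # Simultaneous listening and speaking: short acknowledgements
        # ("mm-hm", "right") without taking the turn.
        return "backchannel"
    if state.user_mid_thought:
        return "wait"            # a pause is not an end of turn
    if state.silence_ms > 2000:
        return "break_silence"   # proactively raise a topic
    return "respond"

print(next_action(DialogueState(False, 500, True)))    # wait
print(next_action(DialogueState(False, 2500, False)))  # break_silence
```

The key difference from the VAD model is that "wait" and "break_silence" are first-class outcomes: the agent may deliberately stay quiet during a pause, or speak when nothing prompted it.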

To make the AI's interactions closer to that of a "real person," Soul's full-duplex calling large model has constructed a multi-dimensional perception system, including time perception, environmental perception, and event perception. By analyzing information from these dimensions, the AI attempts to better understand user intentions and emotional states, providing contextually relevant responses.

From the perspective of time perception, the AI adjusts its language style and topics based on the timing of the conversation. In the morning, it might start with "Good morning, what are your plans for the new day?"; late at night, when a user shares their worries, the response will be gentler, offering emotional support.

In terms of environmental perception, the model can recognize the user's surroundings, appropriately increasing volume in noisy environments to ensure clarity, while speaking more softly in quiet settings.

Regarding event perception, the AI can offer targeted viewpoints on the events being discussed. When a user shares that they have completed an important project at work, the AI will congratulate them and ask for details, enhancing the authenticity of the conversation.

In addition, the model has been optimized for colloquial expression and tone reproduction: it can simulate features of everyday speech such as filler words, stuttering, and emotional fluctuations, and can replicate specific tones of voice on request.

Emotional expression adapts as well: the AI's vocal emotion shifts as the conversation progresses. When users share joy, the tone rises; when users are feeling down, the voice becomes low and concerned.
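The perception dimensions described above can be pictured as signals jointly selecting a response style. The rules, names, and thresholds in this sketch are assumptions for demonstration only, not the model's actual logic.

```python
# Illustrative sketch: time, environment, and emotion signals jointly
# pick a response style. All rules here are hypothetical examples
# mirroring the article's descriptions, not Soul's real behavior.

def response_style(hour: int, noise_db: float, mood: str) -> dict:
    style = {"greeting": None, "volume": "normal", "tone": "neutral"}
    # Time perception: openers and register follow the time of day.
    if 5 <= hour < 12:
        style["greeting"] = "Good morning, what are your plans today?"
    elif hour >= 23 or hour < 5:
        style["tone"] = "gentle"   # late-night worries get softer replies
    # Environmental perception: louder in noise, softer when quiet.
    if noise_db > 70:
        style["volume"] = "loud"
    elif noise_db < 40:
        style["volume"] = "soft"
    # Emotional expression: vocal tone tracks the user's mood.
    if mood == "joyful":
        style["tone"] = "upbeat"
    elif mood == "down":
        style["tone"] = "low_and_concerned"
    return style

print(response_style(hour=23, noise_db=35, mood="down"))
# {'greeting': None, 'volume': 'soft', 'tone': 'low_and_concerned'}
```

In a real system these would be learned mappings rather than hand-written rules, but the sketch shows how independent perception channels can compose into one contextual response.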

These processes enhance the realism of AI interactions to some extent, but there is still a gap from fully simulating a real person, leaving significant room for improvement.

In fact, even before this model's launch, Soul's virtual humans already demonstrated a high degree of naturalness and intelligence in 1V1 interactions and in replies to user comments. Had Soul not labeled them as "virtual humans," their responses would already feel remarkably like a real person's.

AI Enhancing the Authenticity of Electronic Socializing

The full-duplex call large model has been applied in various scenarios on the Soul platform, impacting users' social experiences in 1V1 and many-to-many interaction scenarios.

In real-time call scenarios with virtual people, this model is expected to make communication between virtual people and users more natural.

Previously, virtual-human dialogue was relatively rigid; with this model, virtual humans can capture users' emotions and changes in speech in real time, adjust their responses and tone, and provide more personalized companionship, giving users more authentic emotional feedback.

In AI-matched 1V1 interaction scenarios, the model can, for example, help users filter compatible chat partners through algorithms, improving social matching efficiency.

During the communication process, the model analyzes the dialogue content and emotions of both parties, providing timely topic suggestions or guidance: when the initial conversation between matched parties hits a lull, the AI might introduce a topic related to both parties' interests, such as "I heard you both like photography; have you taken any satisfying works recently?" to break the ice and make the conversation smoother.

In multi-person voice interaction scenarios like group chat parties, the AI host has corresponding functions: after users enter the group chat party, the AI host can manage the order of the chat, control speaking sequences, remind users to communicate civilly, and interact with users via voice.

When the atmosphere in the group is dull, it can initiate topics like "Has anyone seen any good movies lately? Share with us!" to attract user participation; when new members join, it warmly greets them and guides mutual introductions, helping new members integrate, which may enhance participation in the group chat party.

The emergence of the Soul full-duplex call large model brings a new direction for its platform development and provides a reference case for the AI social industry.

This model demonstrates a possible application of AI technology in the social field: breaking traditional interaction limitations through technological innovation to achieve a more natural social experience.

With the promotion and application of this technology, other social platforms may increase their investment in AI technology research and development, exploring the integration of AI technology into social scenarios to promote technological development in the industry. For example, enhancing dialogue fluency or conducting in-depth research on multi-dimensional perception to enhance the immersive experience of social interactions.

Soul's practices will draw more developers' attention to the AI social field, prompting new social applications and services. The development of AI socializing will influence how people socialize and how they think about socializing, breaking geographical and temporal limitations so people can more conveniently meet friends from different regions and expand their social circles. As AI's role in social interaction grows, the definition of "social" may change, with greater emphasis on emotional resonance and information exchange with both AI and other users.

The Soul full-duplex call large model is about to enter internal testing and application, representing a new attempt in the field of AI social interaction: leveraging new technological architecture and application scenarios to bring users a new social experience and provide ideas for industry development.

Risk Warning and Disclaimer

The market carries risks, and investment should be cautious. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investment based on this is at one's own risk.