Introducing ChatGPT, the conversational and image-savvy AI! Taking "super assistant" to the next level.

ChatGPT can now speak, with five different voices to choose from, putting it in direct competition with consumer personal assistants like Siri. It can also answer questions or give suggestions based on pictures.

OpenAI announced on its official website on Monday that it will launch voice and image capabilities for ChatGPT in the next two weeks, targeting Plus and enterprise users. These new features will allow users to engage in voice conversations or show images to ChatGPT.

In terms of voice capabilities, ChatGPT can now answer questions and follow commands using voice, directly competing with consumer personal assistants like Apple's Siri. Additionally, ChatGPT will offer five different voices for users to choose from, and it will support functions such as generating text from voice audio and translating podcast voices into other languages.

As for image capabilities, users can submit images and ask related questions, and ChatGPT will provide answers or suggestions based on the images. It is reported that the voice capabilities will be launched on iOS and Android platforms, while the image capabilities will be available on all platforms.

OpenAI has upgraded the way users interact with ChatGPT, allowing them to prompt the chatbot not only by typing in the text box but also by speaking out loud. The functionality is familiar, much like conversing with Google Assistant, but OpenAI aims to provide better answers thanks to improvements in the underlying technology. Most virtual assistants are currently being rebuilt on top of large language models, and OpenAI is simply ahead of the pack.

OpenAI released the ChatGPT mobile application in May this year, and it already supported converting speech to text. Adding voice responses makes conversations feel more human-like. The company hopes that this new feature will encourage users to use its mobile application anytime and anywhere, directly competing with personal assistant products such as Google Assistant, Apple's Siri, or Amazon's Alexa.

OpenAI is also launching a new text-to-speech model, claiming that it can "generate human-like audio from text and a few seconds of speech sample." Users can choose from five options for ChatGPT's voice, but OpenAI seems to believe that the potential of this model goes far beyond that. For example, OpenAI is collaborating with Spotify to translate podcasts into other languages while preserving the podcaster's own voice. Synthetic speech has many interesting applications, and OpenAI may become an important part of this industry.

The company also stated that paid and enterprise users will have access to image capabilities. Image search is somewhat similar to Google Lens, where users can simply take a picture of something they are interested in, and ChatGPT will identify the subject and provide a corresponding response.

For example, users can upload a picture of pink sunglasses and ask the chatbot to recommend matching outfits, or submit a picture of a math problem and request assistance in solving it. Analysis suggests that since the launch of ChatGPT in late 2022, OpenAI has been working hard to add more features and capabilities to its chatbot without introducing new problems. With this update, the company is attempting to strike that balance by deliberately limiting what its new model can do.

However, this approach is not a long-term solution. As more people use voice control and image search, and as ChatGPT gradually becomes a truly multimodal and practical virtual assistant, maintaining safe and reasonable boundaries will become increasingly difficult.

ChatGPT aims to become a "super assistant"

This upgrade undoubtedly brings ChatGPT one step closer to becoming a "super intelligent personal work assistant" and intensifies competition with downstream software.

As previously mentioned, OpenAI CEO Sam Altman privately told developers that the company hopes to transform ChatGPT into a "super intelligent personal work assistant" capable of performing various tasks based on personal and work needs, such as drafting emails or documents in the user's style and providing up-to-date information on relevant business matters.

Analysis suggests that both Microsoft and OpenAI can provide technical services to B2B customers who need to build AI capabilities, which puts the two companies in direct business conflict. In the long run, if OpenAI accelerates its efforts to develop software for individuals and businesses, ChatGPT may reshape the consumer application ecosystem, and a "breakup" between Microsoft and OpenAI may eventually become inevitable.