
Google Gemini adds new paid feature as photo-to-video opens to subscribers

Google's parent company Alphabet announced that its Gemini AI assistant will open the "photo to video" feature to paid users. Through the Gemini web version, users can generate an 8-second, 720p video with sound from a single photo and a text description. The feature is powered by Veo 3, Google's latest video generation model, and is aimed at keeping pace with competitors such as OpenAI. Google says it has taken steps to ensure that video generation complies with regulations, but tests show that the technology still has flaws.
According to Zhitong Finance APP, Google's parent company Alphabet (GOOGL.US) announced that it will open the "photo to video" feature to paid users. The artificial intelligence tool, which was limited to a small-scale test at the beginning of the year, is now officially available in the Gemini AI assistant.
The company stated that starting Thursday, users in specific regions subscribed to the Google AI Ultra and Pro plans can use this feature through the Gemini web version, with mobile app updates being rolled out throughout the week.
The new feature lets users generate an 8-second video with sound from a single photo and a text description. The output is an MP4 file at 720p resolution in a 16:9 landscape aspect ratio.
The update integrates the feature directly into the Gemini chat interface, bringing Google level with American competitors OpenAI and Runway AI Inc. in AI video. Competition in the global market is equally fierce: China's Alibaba Group, AI startup Manus, and Kuaishou Technology have all released upgraded video tools in recent months.
The feature is powered by Veo 3, Google's latest video generation model, which was released at the company's developer conference in May and was previously available only through the standalone paid video tool Flow.
Google emphasized that it has taken "important backend measures to ensure video generation complies with regulations," such as prohibiting the use of images of public figures (including celebrities, politicians, and well-known entrepreneurs) to generate videos. Its policy also prohibits content that incites dangerous behavior, violence, or group attacks.
However, tests have shown that the technology still has flaws. In practical tests on the Gemini web version, media outlets found that when personal photos were uploaded to generate videos of people speaking, the output often changed facial features and even ethnicity. The tool successfully executed simple commands such as "plants swaying in the wind" or "a static cat talking," but for more complex requests such as "a photo of a person breakdancing," it only generated footage of people waving.
A Google spokesperson responded to the test results by stating that the AI model has no instructions to modify a person's appearance, and that photo to video and facial animation are still new technologies that, working from a single image, may produce results inconsistent with the original.
The model is better at other kinds of animation, such as animating everyday objects, bringing paintings to life, and adding motion effects to nature photos. The company will continue to improve various features, including facial animation, in subsequent updates.