Drawing while typing, drawing while talking: Hunyuan Image 2.0 is here!

Wallstreetcn
2025.05.16 12:02

On May 16, Tencent launched its next-generation image generation model, Hunyuan Image 2.0, which it says brings image generation speed down to the "millisecond level."

What does "millisecond level" mean? The answer may surprise you: users can watch the image change in real time as they type their prompt, a true "what you see is what you get" experience.

Tencent said that, thanks to an image codec with an ultra-high compression ratio and a new diffusion architecture, the model achieves millisecond-level response even though its parameter count has grown by an order of magnitude. This replaces the traditional "generate, wait, generate again" trial-and-error loop with a new kind of interactive experience.
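Why a higher codec compression ratio helps speed comes down to sequence length, which a little arithmetic makes concrete. The sketch below assumes a transformer-style diffusion backbone that attends over one token per latent patch; the helper `latent_tokens`, the downsampling factors (8 and 32) and the patch size (2) are hypothetical illustration values, not figures Tencent has disclosed for Hunyuan Image 2.0.

```python
def latent_tokens(height: int, width: int, downsample: int, patch: int = 1) -> int:
    """Hypothetical helper: tokens a diffusion transformer sees for one image.

    Assumes the codec shrinks each spatial side by `downsample`, and the
    transformer then groups `patch` x `patch` latent pixels into one token.
    """
    step = downsample * patch
    if height % step or width % step:
        raise ValueError("image size must be divisible by downsample * patch")
    return (height // step) * (width // step)


# Illustrative settings only; these are NOT published Hunyuan Image 2.0 values.
baseline = latent_tokens(1024, 1024, downsample=8, patch=2)   # 64 x 64 tokens
compact = latent_tokens(1024, 1024, downsample=32, patch=1)   # 32 x 32 tokens

# Self-attention cost grows roughly with the square of the sequence length.
attention_speedup = (baseline / compact) ** 2
print(baseline, compact, attention_speedup)  # 4096 1024 16.0
```

Under these assumed settings, a 4x shorter sequence cuts the quadratic attention term by roughly 16x, which is one way a much larger model could still respond quickly. Real latency also depends on the number of denoising steps and on the hardware.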

Hunyuan Image 2.0 not only delivers real-time "drawing while talking" interaction but also makes a broad leap in model architecture and generation quality. On the GenEval benchmark it scored above 95% accuracy, which Tencent says is far ahead of comparable models, demonstrating a strong ability to understand complex text instructions and generate images that follow them.

Interactive Innovation: A New Paradigm of "Generating Images While Typing"

In hands-on tests, Hunyuan Image 2.0 delivered fully real-time "generating images while typing" feedback: the image adjusts as the user enters the prompt.

For example, when the prompt "portrait photography, Einstein, background is the Oriental Pearl Tower, selfie angle" is entered, the system generates a matching image in real time and updates it immediately as each new element is added.

The character's expression can also be changed instantly, for example by making Einstein stick out his tongue.

Multiple details can also be added or modified one after another, for example: a girl, Asian face, big eyes, bright smile, long hair, wearing traditional Chinese clothing, with a hat, in a hand-drawn style.

Other styles, such as anime and woven styles, also produce good results.

This real-time feedback mechanism does away with the traditional, cumbersome loop of "input prompt → wait a few seconds → view results → adjust and retry," significantly lowering the barrier to creation and making creative expression more fluid and coherent.

Ultra-Realistic Image Quality: The Perfect Combination of Realism and Detail

Beyond speed, Hunyuan Image 2.0 also significantly improves image quality. By applying reinforcement learning and aligning the model with a large body of human aesthetic knowledge, the generated images avoid the typical "AI flavor" of AIGC images and show more realistic textures and richer detail.

On the GenEval benchmark, Tencent's Hunyuan Image 2.0 scores above 95% accuracy, well ahead of comparable models. Such high-fidelity image generation could be highly attractive to industries that need high-quality visual material, such as advertising and design.

Secondary Image Editing: A Powerful Image-to-Image Function

Besides text-to-image generation, Hunyuan Image 2.0 offers a powerful "image-to-image" function. It can extract the main subject or outline features of a reference image and use them to re-edit existing images.

This capability greatly expands the model's use cases, letting users easily create personalized pet photos or take on professional design work. For example, after uploading a photo of a cat and setting the image reference strength to 92, users can enlarge the cat's eyes, place it on grass, and give it a crown.

Users can also upload a photo of a cake and, with a simple instruction, change it from chocolate to strawberry flavor while keeping its shape and arrangement consistent with the reference image.

Style changes can also be made to images in real time, adding small elements and comparing the result with the original. For example, a single picture of a kitten can be turned into "house cat," "princess cat," and "gangster cat" versions.

Additionally, it supports one-click coloring for line drawings and a "scene optimization" function that automatically improves composition, depth of field, and lighting effects.

Real-time Drawing Board: A Productivity Tool for Professional Designers

In addition to real-time text-to-image generation, Hunyuan Image 2.0 also offers a real-time drawing board feature.

Built on the model's real-time generation capability, the drawing board lets users preview the colored result in sync while they draw line art or adjust parameters. This replaces the traditional linear "draw, wait, modify" workflow and can support professional designers in their work.

The real-time drawing board also supports multi-image fusion. After uploading several images, users can freely overlay multiple sketches on the same canvas; the AI automatically reconciles perspective and lighting and generates a fused image based on the prompt, further enriching the interactive experience.

This feature is particularly suitable for users who have preliminary design ideas but lack professional drawing skills.

Technological Advances: Five Key Breakthroughs

According to the tech outlet Quantum Bit (QbitAI), Hunyuan Image 2.0 is built on five key technical breakthroughs:

  1. Larger model size: Compared with previous models, the parameter count has grown by an order of magnitude, substantially raising the performance ceiling.

  2. Ultra-high compression ratio image codec: The Tencent Hunyuan team developed a codec that greatly shortens the image encoding sequence, while optimizing the information bottleneck layer and strengthening adversarial training to preserve the ability to generate fine detail.

  3. Multimodal large language model as the text encoder: Traditional text encoders such as CLIP and T5 parse semantics only at a shallow level. Adapting a multimodal large language model for this role substantially improves semantic matching, helping the model outperform comparable products on objective metrics such as GenEval.

  4. Full-scale, multi-dimensional reinforcement learning post-training: Using a "slow-thinking" reward model, general post-training and aesthetic post-training effectively improve the realism of generated images.

  5. In-house adversarial distillation scheme: Building on latent consistency models, the scheme maps any point on the denoising trajectory directly to the generated sample at the end of that trajectory, enabling high-quality generation in only a few steps.
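Point 5 describes the defining property of a consistency model: one function that sends any noisy point on a trajectory straight to its clean endpoint. The sketch below illustrates that property, together with the generic multistep consistency sampling loop from the consistency-models literature, on a toy 1-D Gaussian where the exact consistency function has a closed form. It is not Tencent's adversarial distillation scheme, which the article does not describe. The data distribution, the noise schedule x_t = x_0 + t·z, and the constants `MU`, `S`, `EPS`, `T` are all assumptions; in a real latent consistency model, a trained network operating on codec latents would take the place of `consistency_fn`.

```python
import math
import random
import statistics

# Toy assumptions (not from the article): 1-D data drawn from N(MU, S^2),
# noised as x_t = x_0 + t * z ("variance exploding" schedule). For Gaussian
# data the probability-flow ODE is solvable, so the exact consistency
# function is known in closed form and no network needs to be trained.
MU, S = 2.0, 0.5
EPS, T = 0.002, 80.0  # smallest and largest noise levels


def consistency_fn(x_t: float, t: float) -> float:
    """Map a point at noise level t straight to its trajectory's end (level EPS)."""
    return MU + (x_t - MU) * math.sqrt(S**2 + EPS**2) / math.sqrt(S**2 + t**2)


def multistep_sample(timesteps: list[float], rng: random.Random) -> float:
    """Multistep consistency sampling: jump to the end, re-noise, jump again."""
    # Start from the exact noisy marginal at level T.
    x = MU + math.sqrt(S**2 + T**2) * rng.gauss(0.0, 1.0)
    x = consistency_fn(x, T)  # a single call already yields a sample
    for t in timesteps:  # optional refinement at decreasing noise levels
        x_t = x + math.sqrt(t**2 - EPS**2) * rng.gauss(0.0, 1.0)
        x = consistency_fn(x_t, t)
    return x


rng = random.Random(0)
one_step = [multistep_sample([], rng) for _ in range(20_000)]
two_step = [multistep_sample([1.0], rng) for _ in range(20_000)]
print(statistics.mean(one_step), statistics.stdev(one_step))  # near 2.0 and 0.5
print(statistics.mean(two_step), statistics.stdev(two_step))  # near 2.0 and 0.5
```

With an empty `timesteps` list, one function call produces a sample; each extra entry re-noises the sample to a lower level and maps it back, which in real models trades a little speed for refinement. In this toy the one-step result already has the target distribution, so both runs should report a mean near 2.0 and a standard deviation near 0.5.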

Users Try Their Hand as "Soul Painters"

Many online creators have shared their experiences.

Image source: creator A Little Nana

Users on the social platform X wrote:

"An impressive innovation! Redefining creativity through real-time AI image generation."

Others commented:

"Illusion. Absolute illusion. Really want to explore this."

"Real-time image generation/modification has the potential to open up some crazy new opportunities and ideas."

"This sounds magical! Speed and quality have changed the game. Can't wait to see what everyone creates with it!"
