Google's strongest AI chip targets NVIDIA B200, performance skyrockets 3,600 times! Google's version of MCP unifies the AI agent universe

Wallstreetcn
2025.04.10 13:31

Google launched the seventh-generation TPU, Ironwood, at its Cloud conference. Its performance is 3,600 times that of the first Cloud TPU from 2018, and it is Google's first AI accelerator designed specifically for inference, competing directly with NVIDIA's B200. The new TPU also doubles power efficiency, marking a significant breakthrough for Google in AI hardware. In addition, Google introduced the Agent2Agent open protocol and several AI platform upgrades, pushing AI infrastructure toward a smarter, more proactive future.

The first TPU of the inference era has been born!

Last night, at the annual Google Cloud Conference, Google's seventh-generation TPU—Ironwood—made its debut, directly challenging NVIDIA's Blackwell B200.

It is Google's most powerful and scalable custom AI accelerator to date and the first accelerator designed specifically for inference.

Compared to the first Cloud TPU from 2018 (TPU v2), Ironwood's inference performance has increased 3,600-fold, and its power efficiency has improved 29-fold.

In fact, a full pod of the seventh-generation TPU delivers more than 24 times the compute of the world's largest supercomputer. Google will make TPU v7 generally available later this year.

Following MCP, the conference also saw the debut of the Agent2Agent (A2A) open protocol, which gives agents a universal language for communicating and collaborating across different ecosystems.

Additionally, ADK and Agentspace provide developers with comprehensive capabilities to build, operate, and manage AI agents.

The Google Cloud conference also showcased exciting progress across generative media: Veo 2, Imagen 3, and Chirp 3 all received upgrades, the text-to-music model Lyria debuted, and Vertex AI became the only generative AI platform covering video, images, voice, and music.

Next, the cost-effective Gemini 2.5 Flash will also be available on Vertex AI.

The first TPU of the inference era is here, rivaling the B200

The birth of Ironwood not only marks another significant breakthrough for Google in AI hardware but also represents a major shift in AI infrastructure.

In Google's view, today's passive, "reactive" models are giving way to proactive, "generative" agents.

The core of this transformation lies in the fact that AI is no longer limited to providing raw data but is capable of actively retrieving information and generating insights.

This is precisely how Google defines AI infrastructure for the "inference era": smarter, more proactive, and more collaborative.

Key Features

  • Significant performance improvement while focusing on power efficiency, enabling AI workloads to run more cost-effectively.

Compared to the sixth-generation TPU, Trillium, Ironwood delivers twice the power efficiency (perf/watt), and it is nearly 30 times more power-efficient than the first Cloud TPU launched in 2018.

At the same time, Google's advanced liquid-cooling solution and optimized chip design can reliably sustain up to twice the performance of standard air cooling, even under continuous, heavy AI workloads.

Figure 3. Power efficiency improved by 29.3 times compared to TPU v2

  • Significant increase in high-bandwidth memory (HBM) capacity

The Ironwood chip is equipped with up to 192 GB of HBM, six times Trillium's capacity.

This allows for the processing of larger models and datasets while reducing the need for frequent data transfers, thereby improving performance.

  • Significant increase in HBM bandwidth

The Ironwood chip reaches a striking 7.2 TB/s of HBM bandwidth, 4.5 times that of Trillium.

The extremely high bandwidth ensures fast data access, which is crucial for memory-intensive workloads common in modern AI.

  • Enhanced inter-chip interconnect (ICI) bandwidth

Ironwood's bidirectional ICI bandwidth rises to 1.2 TB/s, 1.5 times Trillium's. This faster chip-to-chip communication supports efficient large-scale distributed training and inference.
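Taken together, the per-chip claims above are internally consistent. Here is a quick sanity check in Python, back-calculating the implied Trillium baselines from Ironwood's quoted figures and the stated multipliers (the derived baselines are inferences from this article, not official Trillium specs):

```python
# Ironwood per-chip figures quoted above, with the stated gains over Trillium.
ironwood_specs = {
    # metric:                (Ironwood value, stated multiplier vs. Trillium)
    "HBM capacity (GB)":     (192, 6.0),
    "HBM bandwidth (TB/s)":  (7.2, 4.5),
    "ICI bandwidth (TB/s)":  (1.2, 1.5),
}

for name, (value, gain) in ironwood_specs.items():
    implied_trillium = value / gain  # back-calculated, not an official spec
    print(f"{name}: Ironwood {value:g} -> implied Trillium {implied_trillium:g}")

# HBM capacity (GB): Ironwood 192 -> implied Trillium 32
# HBM bandwidth (TB/s): Ironwood 7.2 -> implied Trillium 1.6
# ICI bandwidth (TB/s): Ironwood 1.2 -> implied Trillium 0.8
```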

Driving the Inference Era with Ironwood

Ironwood provides the massive parallel processing required by the most demanding AI workloads, such as training and serving ultra-large dense LLMs and Mixture-of-Experts (MoE) models with reasoning capabilities.

For Google Cloud customers, Ironwood comes in two configurations sized to AI workload demands: 256 chips or 9,216 chips.

Figure 1. Peak FP8 floating-point performance improved 3,600-fold compared to TPU v2.

Each individual chip delivers a peak of 4,614 TFLOPs.

When scaled to a full 9,216-chip pod totaling 42.5 Exaflops, Ironwood delivers more than 24 times the compute of the world's largest supercomputer, El Capitan, which offers 1.7 Exaflops.
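As a quick sanity check, the pod-level math works out from the per-chip figure (note that these are the FP8 peak numbers quoted in the article's figure captions):

```python
chips_per_pod = 9_216
peak_per_chip_tflops = 4_614  # peak FP8 TFLOPs per Ironwood chip

pod_exaflops = chips_per_pod * peak_per_chip_tflops / 1_000_000  # TFLOPs -> EFLOPs
print(f"Pod peak: {pod_exaflops:.1f} EFLOPs")  # ~42.5 EFLOPs

el_capitan_exaflops = 1.7
print(f"vs. El Capitan: {pod_exaflops / el_capitan_exaflops:.1f}x")  # ~25.0x
```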

Moreover, Ironwood is equipped with an enhanced SparseCore, a dedicated accelerator for the ultra-large embeddings common in advanced ranking and recommendation workloads. This extends acceleration to a broader range of workloads, reaching beyond traditional AI into finance and science.

Pathways is an ML runtime developed by Google DeepMind that enables efficient distributed computing across multiple TPU chips.

Pathways on Google Cloud makes it straightforward to scale beyond a single Ironwood Pod, allowing the combination of hundreds of thousands of Ironwood chips to rapidly advance the frontier of generative AI computing.
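Pathways itself is Google-internal infrastructure, but the single-controller programming model it enables is visible to customers through JAX on Cloud TPU. Below is a minimal, hedged sketch of the idea: one logical array is sharded across every visible chip, and a global reduction is routed over the inter-chip links automatically. This illustrates the programming model, not Pathways' internal API:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# All chips the runtime can see: TPU cores on a slice, or CPU devices locally.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Shard the leading (batch) axis of one logical array across every chip.
x = jax.device_put(
    jnp.ones((devices.size * 8, 4096)),
    NamedSharding(mesh, P("data")),
)

@jax.jit
def center(x):
    # The global mean requires a cross-chip reduction; the compiler inserts
    # the collective over the interconnect, with no explicit communication code.
    return x - x.mean(axis=0)

y = center(x)
print(y.sharding)  # the result stays sharded across the mesh
```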

Figure 2. Ironwood natively supports FP8, while the peak TFlops for v4 and v5p are simulated values.

OpenAI researchers have compared Ironwood with NVIDIA's GB200, concluding that TPU v7 performs comparably to the GB200, and even slightly better.

The All-Modal AI Platform is Here, Veo 2 Upgraded Again

With the addition of music functionality, Vertex AI is now the only platform with a generative media model covering all modalities—video, image, voice, and music.

This major update includes four key features:

  • The text-to-music model Lyria, allowing customers to generate complete, production-ready material from text prompts.
  • New editing and camera control features in Veo 2, helping enterprise customers precisely optimize and reuse video content.
  • Chirp 3 now includes Instant Custom Voice, a new method that creates custom voices with just 10 seconds of audio input.
  • Imagen 3 has improved image generation and image repair capabilities for reconstructing missing or damaged parts of images and enhancing the quality of object removal edits.

Lyria: Text-to-Music Model

Lyria can generate high-fidelity audio, capturing intricate details and providing rich, detailed compositions across various music genres.

  • Businesses can enhance brand experience

Quickly customize soundtracks for marketing campaigns, product launches, or immersive in-store experiences based on the brand's unique tone.

With Lyria, businesses can create sounds that resonate deeply with their target audience, fostering emotional connections and enhancing brand recall.

  • Creators can simplify the content creation process

For video production, podcasts, and digital content creation, finding the perfect royalty-free music can be a time-consuming and costly process.

Lyria can generate custom music tracks in minutes, directly aligning with the emotion, rhythm, and narrative of your content, helping to accelerate production workflows and reduce licensing costs. For example:

Create a high-energy Bebop tune. Prioritize dazzling saxophone and trumpet solos that exchange complex phrases at lightning speed. The piano should provide a percussive chord accompaniment, with a walking bass and fast-paced drumming driving the frenzied energy. The tone should be exhilarating and intense. Capture the feel of a smoky jazz club at midnight, showcasing superb craftsmanship and improvisation. Make it impossible for the audience to sit still.

Veo 2: Expanded Editing Features

Veo 2 adds a powerful set of features for creating, editing, and visual effects for videos, transforming it from a generation tool into a comprehensive video creation and editing platform:

  • Video Inpainting: Achieve clean, professional editing effects without manual touch-ups.

You can remove unwanted background images, logos, or distractions from the video, making them smoothly and perfectly disappear in every frame, as if they never existed.

  • Outpainting: Extend the visuals of existing video material, converting traditional videos into formats optimized for web and mobile platforms.

You can easily adjust content to fit different screen sizes and aspect ratios— for example, converting horizontal videos into vertical videos for social media shorts.

  • Apply complex filmmaking techniques: New features include guidance on shot composition, camera angles, and pacing.

Teams can easily employ complex filmmaking techniques without complicated prompts or expertise.

For example, use camera presets to move the camera in different directions, create time-lapse effects, or generate drone-style shots.

  • Interpolation: Create coherent videos by connecting two existing clips.

With the interpolation feature, you define the start and end of a video sequence, and Veo seamlessly generates the connecting frames.

This ensures smooth transitions and maintains visual continuity, creating a beautiful and professional final product.

Chirp 3: Instant Custom Voice and Transcription Feature Update

Chirp 3's HD voices feature offers natural and realistic voices in over 35 languages, with 8 speaker options.

In addition, Google has introduced two new features:

  • Instant Custom Voice

With just 10 seconds of audio input, realistic custom voices can be generated. This allows businesses to personalize call centers, develop accessible content, and establish a unique brand voice—while maintaining a consistent brand image.

  • Transcription with Diarization

This powerful feature can accurately separate and identify individual speakers in multi-person recordings, significantly improving the clarity and usability of transcribed content, suitable for applications such as meeting minutes, podcast analysis, and multi-party call recordings.
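As a rough illustration of what diarized transcription looks like in practice, here is a sketch using the diarization options of the long-standing Google Cloud Speech-to-Text v1 client. How Chirp 3 itself exposes this feature may differ, so treat the details below as assumptions about the pattern, not Chirp 3's API:

```python
from google.cloud import speech  # pip install google-cloud-speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16_000,
    language_code="en-US",
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=6,
    ),
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/meeting.wav")  # hypothetical path

response = client.recognize(config=config, audio=audio)

# The final result carries per-word speaker tags for the whole recording.
for word in response.results[-1].alternatives[0].words:
    print(f"speaker {word.speaker_tag}: {word.word}")
```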

Imagen 3: Improved Quality and Editing Features

As Google's highest quality text-to-image model, Imagen 3 can generate images with better detail, richer lighting, and fewer distracting artifacts than before.

This time, Google has significantly improved Imagen 3's inpainting capabilities for reconstructing missing or damaged parts of images.

Especially in object removal, not only is the quality higher, but the results are also more natural.

After MCP, Google Builds the A2A Protocol

Agents can help people with many tasks, from ordering new computers to assisting customer service representatives to supporting supply chain planning.

The key to making agents increasingly practical lies in enabling them to collaborate within a dynamic multi-agent ecosystem, bridging isolated data systems and applications.

To this end, Google has launched a new open protocol, Agent2Agent (A2A), with support and contributions from more than 50 partners.

The A2A protocol will enable AI agents to communicate with each other, securely exchange information, and coordinate actions across various enterprise platforms or applications.

It is an open protocol that complements Anthropic's Model Context Protocol (MCP).

A2A Design Principles

A2A follows five core principles:

  • Embrace agentic capabilities: A2A aims to let agents collaborate in their natural, unstructured ways, even when they do not share memory, tools, or context.
  • Build on existing standards: A2A is built on existing, widely used standards such as HTTP, SSE, and JSON-RPC (see the sketch after this list).
  • Secure by default: A2A supports enterprise-grade authentication and authorization from the outset, aligned with OpenAPI's authentication schemes.
  • Support for long-running tasks: A2A is designed to be flexible, handling everything from quick tasks to deep research that may take hours or even days, especially when humans are in the loop.
  • Modality agnostic: The world of agents is not limited to text, so A2A supports multiple modalities, including audio and video streams.
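Because A2A builds on JSON-RPC 2.0 over HTTP (with SSE for streaming), a task request on the wire is an ordinary JSON envelope. The sketch below reflects one reading of the draft spec; the method name and field layout should be treated as assumptions rather than normative examples:

```python
import json

# Hypothetical A2A task request; "tasks/send" and the field names follow
# one reading of the draft spec and are assumptions, not normative.
task_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
        "id": "task-123",  # client-chosen task identifier
        "message": {
            "role": "user",
            "parts": [
                {"type": "text", "text": "Find ML engineer candidates in NYC"},
            ],
        },
    },
}

print(json.dumps(task_request, indent=2))
```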

How A2A Works

A2A facilitates smoother communication between "client" agents and "remote" agents.

Client agents are responsible for formulating and conveying tasks, while remote agents are responsible for executing these tasks, striving to provide accurate information or take appropriate actions. This interaction involves several key functions:

  • Capability discovery: Agents can showcase their capabilities through a JSON-formatted "Agent Card." Client agents can find the most suitable agent for a task based on this "business card" and communicate with remote agents via A2A.
  • Task management: Communication between client agents and remote agents is centered around completing tasks, aimed at meeting user needs.
  • Collaboration: Agents can send messages to each other, sharing context, responses, outputs, or user instructions.
  • User experience negotiation: Each message contains "parts", complete units of content such as a generated image, each with a specified content type so client and remote agents can negotiate the format the user's interface can render (a client-side sketch follows below).
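Putting these functions together, a client agent's interaction with a remote agent might look like the following sketch. The well-known Agent Card path follows the published draft spec as I understand it; the endpoint URL is hypothetical and the response shape is an assumption:

```python
import requests

REMOTE = "https://agent.example.com"  # hypothetical remote agent endpoint

# 1. Capability discovery: fetch the remote agent's JSON "Agent Card".
card = requests.get(f"{REMOTE}/.well-known/agent.json", timeout=10).json()
print("Remote agent:", card.get("name"), "- skills:", card.get("skills"))

# 2. Task management: send a task as a JSON-RPC request (envelope shape
#    as in the earlier sketch; "tasks/send" is an assumption from the draft).
task_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
        "id": "task-123",
        "message": {"role": "user",
                    "parts": [{"type": "text", "text": "Screen these resumes"}]},
    },
}
resp = requests.post(REMOTE, json=task_request, timeout=30).json()

# 3. Collaboration: the remote agent answers with task status and message
#    parts; long-running tasks are polled or streamed over SSE until done.
print(resp.get("result", {}).get("status"))
```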

Example: Finding Candidates

Users (such as hiring managers) can instruct their agents to find candidates that match the job description.

This agent will interact with other specialized agents to help uncover potential candidates. Once the user receives a list of recommendations, they can instruct the agent to arrange subsequent interviews, making the hiring process smoother. After the interviews, another agent can assist with background checks.

A2A is expected to usher in a new era of agent interoperability, driving innovation and creating more powerful, flexible AI agent systems. Google believes the protocol will pave the way for agents to collaborate seamlessly, solve complex problems, and enhance our lives. The company says it is committed to building the protocol openly with partners and the community, will open-source it, and will establish clear participation paths for contributors.

Google's AI Coding Assistant Becomes a Super Agent

Another update from this conference: Google's AI coding assistant, Gemini Code Assist, has gained "agent" capabilities, now in preview!

At the Cloud Next conference, Google announced that Code Assist can now deploy new AI agents that can perform multiple steps to complete complex programming tasks.

For example, these agents can create applications from product specifications in Google Docs or convert code from one language to another.

Additionally, Code Assist is now available for use in Android Studio, among other coding environments.

This upgrade comes as Google faces pressure from competitors such as GitHub Copilot, Cursor, and Devin.

It is evident that there is a huge gold rush market in AI programming, and competition among various players is becoming increasingly fierce.

However, it is still unclear to what extent Code Assist can perform. Research shows that even the best code-generating AIs today often introduce security vulnerabilities and errors due to weaknesses in understanding programming logic.

For instance, one assessment of Devin found that it completed only 3 out of 20 tasks.

Next, let us look forward to Gemini Code Assist's performance in real programming environments.

Source: New Intelligence (Xinzhiyuan), original title: "Google's Strongest AI Chip Targets NVIDIA B200, Performance Soars 3,600 Times! Google's Version of MCP Unifies the AI Agent Universe"

Risk Warning and Disclaimer

The market has risks, and investment requires caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial conditions, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investment on this basis is at one's own risk.