NVIDIA and Alibaba re-evaluate AI, throwing FLOPS "into the trash"

Wallstreetcn
2026.03.18 11:59

What is truly worth remembering at GTC 2026 is not the chips themselves, but the clear new metric of the AI era: Tokens/W. As the ratio of AI system output (currently tokens) to energy consumed keeps rising, intelligence will become smarter, and AGI may be born from this process.

On March 17th, Jensen Huang spoke for more than two hours on stage at NVIDIA GTC 2026, wearing his signature leather jacket. After the event, almost everyone online was saying, "NVIDIA is going to be the king of Tokens."

However, if you listen carefully to the speech, you'll find that what Jensen Huang repeatedly emphasized was not the token itself, but Tokens per Watt (Tokens/W). He invoked the concept explicitly while showing inference performance charts: every data center and every AI factory is fundamentally limited by power; a 1GW factory will never become a 2GW factory, because that is determined by the laws of physics. At fixed power, whoever produces the most tokens per watt has the lowest production cost and the steepest revenue curve.

This statement is the real crux of the entire GTC 2026.
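To make the arithmetic behind that statement concrete, here is a minimal sketch in Python. All numbers, and the reading of Tokens/W as tokens per second per watt of facility power, are illustrative assumptions on my part, not figures from the keynote:

```python
# Minimal sketch: at a fixed power cap, Tokens/W alone determines
# throughput, energy cost per token, and the revenue ceiling.
# All inputs below are hypothetical.

def factory_economics(power_w, tokens_per_watt, usd_per_kwh, usd_per_mtok):
    """tokens_per_watt is read as tokens per second per watt of facility power."""
    tokens_per_s = power_w * tokens_per_watt   # the power cap fixes throughput
    kwh_per_s = power_w / 1000 / 3600          # energy drawn each second, in kWh
    energy_usd_per_mtok = usd_per_kwh * kwh_per_s / tokens_per_s * 1e6
    revenue_usd_per_s = tokens_per_s / 1e6 * usd_per_mtok
    return tokens_per_s, energy_usd_per_mtok, revenue_usd_per_s

# A 1 GW factory at two hypothetical efficiency levels:
for tpw in (10, 20):  # tokens per second per watt (assumed)
    t, c, r = factory_economics(1e9, tpw, 0.08, 2.0)
    print(f"{tpw} tok/s/W: {t:.2e} tok/s, ${c:.4f} energy/Mtok, ${r:,.0f}/s revenue")
```

Doubling Tokens/W at the same gigawatt doubles token throughput and revenue while halving the energy cost per token, which is precisely the "steepest revenue curve" logic.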

Public discourse is keen on discussing how many times better Vera Rubin is than Blackwell, how LPX can increase inference speed by 35 times, and how NVIDIA plans to move data centers into space. These are certainly important, but they are essentially different expressions of the same logic: under energy constraints, maximize intelligent output per watt.

When Jensen Huang uses Tokens/W as the core metric for measuring AI factory output, there is a larger industrial implication behind it. The measurement system of the computing power competition is shifting from chips to systems, from peak parameters to end-to-end energy efficiency, from "whose chip is faster" to "who converts energy into intelligence more efficiently."

Under its current product and technology matrix, NVIDIA itself is still constrained by Tokens/W, and there are many steps to take before it can truly become the king of tokens.

This is a migration of "intelligent measurement language," and the industrial perspective opened up by this migration is far more worthy of in-depth discussion than any new chip.

Coincidentally, just a day before GTC officially opened, Alibaba announced the establishment of Alibaba Token Hub, personally led by Eric Wu. Alibaba named its AI nerve center not after AI but after the token, elevating the token to strategic height within Alibaba's AI effort.

This also reflects that viewing AI from a systemic perspective is gradually becoming a new consensus in the industry, and it is precisely the idea this article hopes to emphasize.

01 The most noteworthy change at GTC 2026 is not the chips themselves

At GTC 2026, everyone's attention remains focused on new products and terms: Vera Rubin, Rubin POD, LPX, and DSX AI Factory. However, if you look at these releases together, you'll find that they push the narrative boundary of the computing power competition from individual chips to computing infrastructure: a complete set of computing, networking, storage, power, cooling, control systems, and software that together form an AI factory.

Rubin is described as a POD-scale platform, with multiple racks forming a large, coherent system; DSX is defined as a reference design for AI factories, aimed at maximizing Tokens per Watt. This indicates that the real competition in the industry is shifting from the computing power of a single chip to the strength of the entire computing system; more specifically, to whether the whole system can organize limited power, cooling, and network resources into stable AI output.

In terms of measurement units, this translates to tokens produced per watt: Tokens/W.

This article uses the Tokens/W metric as a lens to read what this conference conveys, and the opportunities it opens for the AI infrastructure industry.

02 Once the object of competition becomes the system, measurement cannot stay at the chip level

The measurement system of the chip era is well understood. Peak compute in FLOPS, memory bandwidth, FLOPS/W, TOPS/W, bit/J: these metrics all matter because they describe the capability boundaries of a component.

But none of these carries over cleanly to a whole facility, which has led to an awkward situation in practice: intelligent computing centers have no objective, unified, universal unit of measurement.

Generally speaking, data centers are measured in MW of power, while in China's intelligent computing center build-outs, compute is measured in PFLOPS (based on FP16). However, two clusters with the same power or compute rating can differ enormously in efficiency if their internal chips, networks, and cooling differ.

The reason is not complicated: each legacy unit measures only one dimension. Peak compute describes how much computation a chip can theoretically perform, bit/J describes the energy efficiency of local data transport, and bandwidth describes the information pathway of a single subsystem. All of these are single-dimension measurements of components.

However, the ultimate question a complete AI system must answer is: how much effective AI output can be produced under a fixed power budget, fixed cooling conditions, and fixed data center constraints? Chip-level metrics alone cannot answer it.

NVIDIA's discourse this time is full of phrases like token cost, throughput per watt, token performance per watt, and tokens per watt.

The measurement language system is shifting from component language to system language.

Therefore, if the common chip-level measurements are peak compute, bandwidth, and bit/J, then the more reasonable system-level measurement is Tokens/W. The former measure component capability; the latter measures overall output. The former correspond to local optimization; the latter corresponds to system optimization.
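A small sketch makes the gap between the two measurement levels visible. The numbers are invented for illustration; the point is only that identical chip-level specs can yield very different system-level Tokens/W once overhead power and utilization enter:

```python
# Two clusters built from identical chips (same peak compute, same chip-level
# efficiency) can still diverge widely in system-level Tokens/W.
# All figures are hypothetical.

def system_tokens_per_watt(tok_per_s_per_chip, n_chips,
                           chip_power_w, overhead_power_w, utilization):
    """Tokens per second per watt of total facility power."""
    tokens = tok_per_s_per_chip * n_chips * utilization
    power = chip_power_w * n_chips + overhead_power_w  # network, cooling, losses
    return tokens / power

# Same 10,000 chips; different interconnect, cooling, and scheduling quality.
a = system_tokens_per_watt(5000, 10_000, 1000, 3e6, 0.55)
b = system_tokens_per_watt(5000, 10_000, 1000, 8e6, 0.35)
print(f"cluster A: {a:.2f} tok/s/W, cluster B: {b:.2f} tok/s/W")
```

Chip-level metrics cannot distinguish these two clusters; Tokens/W can.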

03 Tokens/W connects the chain from energy to intelligent output

In the transcript of NVIDIA's GTC 2026, tokens are called the basic unit of modern AI. The characterization is quite accurate: for large language models, inference services, and agent systems, what users ultimately pay for is essentially the system's ability to generate and process tokens.

From a business operation perspective, tokens have three advantages: 1) they are directly coupled to the model inference process; 2) they are directly coupled to the revenue model; 3) they are well suited to covering the new workloads of the inference era. Agents, multi-turn dialogue, long context, retrieval augmentation, tool invocation, reasoning chains: these new workloads are hard to describe with a single FLOPS figure, yet they all leave traces in dimensions such as tokens, latency, and goodput.

More importantly, the underlying constraint on today's AI infrastructure increasingly manifests as an energy constraint. The IEA report "Energy and AI" projects that by 2030, global data center electricity consumption will rise to approximately 945 TWh, a significant increase from current levels, with AI among the most important drivers and the United States accounting for a large share of the growth. In other words, many upcoming problems in the AI industry may look like chip problems on the surface, but they are essentially problems of electricity, heat dissipation, and infrastructure organization.

The concept of Tokens/W is valuable because it connects the most critical chain in the AI industry: power input, through computation, networking, storage, scheduling, and cooling, ultimately ending in token output.

In this sense, Tokens/W is not simply a replacement for FLOPS/W or bit/J. It adds a layer of perspective that has been missing:

How efficiently does the AI system convert energy into intelligent output?

I believe that the most noteworthy discussion point at this GTC is precisely here: we can no longer view chips in isolation; we must place chips within the system and the system within the constraints of the industry.

This is also the perspective the author has consistently advocated. When looking at AI chips, we cannot consider only peak compute, memory bandwidth and capacity, and interface parameters; we must also examine how they cooperate in the network, how they are deployed in racks, how they draw power at the campus level, how they shape the customer's cost structure, and ultimately how they translate into real output on the business side.

GTC 2026, to some extent, publicly validated this systemic perspective. When NVIDIA itself begins to center its narrative on the AI factory, the industry is already moving from chip-centric AI computing to system-centric AI computing.

This point is actually very crucial. Many industries become obsessed with component parameters in the early stages because they are the easiest to measure and promote. However, once the industry enters the large-scale deployment phase, what truly determines success or failure is often the ability to organize systems. Today's AI infrastructure has reached this stage.

04 Pushing down from Tokens/W, the importance of optical interconnects rises sharply

Once the measurement system shifts to the system level, many elements previously viewed as supporting players rise in status.

Optical interconnects are one of the most typical examples.

In the past, when discussing optical interconnects, the industry mostly spoke the language of optical modules, communications, and devices: higher bandwidth, longer reach, lower pJ/bit, better bandwidth density, lower insertion loss. All of this is relevant, but the language stays at the subsystem level of components and chips. Within the Tokens/W framework, the value of optical interconnects becomes more intuitive: they reduce the energy cost of data transport and expand the ability of large-scale AI computing systems to convert electricity into tokens. In presenting NVIDIA's optical networking products, photonics-based CPO was said to achieve up to 5 times the energy efficiency of optical modules, while also reducing latency and supporting the build-out of larger AI factories.

The key point of this statement is not just that the link is more advanced, but that the system is larger and has higher energy efficiency.

From an industrial logic perspective, this is easy to understand. As models grow larger, contexts longer, and clusters bigger, a large share of the system's energy is consumed not in the arithmetic units but in data movement: communication across chips, boards, racks, and PODs.
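A rough calculation shows how link energy feeds into the system power budget. The traffic level and pJ/bit figures below are assumptions for illustration, not vendor data:

```python
# How interconnect energy per bit turns into megawatts at AI-factory scale.
# Traffic and pJ/bit values are illustrative assumptions.

def transport_power_mw(traffic_bit_per_s, pj_per_bit):
    """Power drawn by the fabric alone for a given aggregate traffic."""
    return traffic_bit_per_s * pj_per_bit * 1e-12 / 1e6

traffic = 1e17  # e.g. ~125,000 links at 800 Gb/s, hypothetical
print(transport_power_mw(traffic, 15))  # ~15 pJ/bit, assumed pluggable modules
print(transport_power_mw(traffic, 3))   # ~5x better, assumed co-packaged optics
# Under a fixed facility power cap, every megawatt the fabric gives back
# is a megawatt available for generating tokens.
```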

At this stage, improving Tokens/W can no longer rely solely on stronger GPUs; it also requires more efficient interconnects.

Therefore, from the Tokens/W perspective, optical interconnects are being developed not because they are cutting-edge, but because they are becoming a necessary energy-saving measure for large-scale AI systems.

05 Optical computing is further from maturity than optical interconnects, but its logic is also starting to work

Optical computing is indeed at an earlier stage than optical interconnects; this must be acknowledged.

Issues such as generality, precision, compilers, manufacturing consistency, and system integration are still being worked out. However, if we widen the observation boundary to the system level, its industrial significance is now easier to articulate than in the past.

The reason is that Tokens/W concerns end-to-end energy efficiency. Whoever can significantly cut energy consumption on some class of high-frequency, high-density computational path that maps repeatably onto optics has a chance to improve token output efficiency at the system level. This logic does not require optical computing to replace the entire GPU, nor to become a universal computing substrate all at once.

It requires only one thing: for certain key workloads, reduce the whole system's J/token, thereby increasing token output under a fixed power budget.
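The effect can be captured with a toy, Amdahl-style energy model; the offload fraction and energy gain below are assumptions of mine, not measurements:

```python
# Toy model: offload a fraction of per-token energy to a cheaper (e.g. optical)
# path and see what happens to system J/token and throughput at fixed power.
# All parameters are assumed for illustration.

def throughput_tok_per_s(power_w, j_per_token):
    # At a fixed power cap, throughput = power / energy-per-token.
    return power_w / j_per_token

base = 2.0         # hypothetical system-wide J/token today
fraction = 0.3     # share of per-token energy the optical path can take over
gain = 5.0         # assumed energy advantage on that share

improved = base * (1 - fraction) + base * fraction / gain   # Amdahl-style
for j in (base, improved):
    print(f"{j:.2f} J/token -> {throughput_tok_per_s(1e9, j):.3e} tok/s at 1 GW")
```

Even a 5x advantage on 30% of the energy budget lifts token output by roughly a third at the same power, which is the sense in which optical computing does not need to replace the GPU to matter.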

This is also why the optical computing narrative needs to shift from the efficiency of individual devices to energy-saving contributions at the system level. If the industry looks only at TOPS/W and MAC/J, optical computing reads as a laboratory story; once the industry starts looking at Tokens/W, it has a chance to enter the infrastructure conversation.

This change matters especially for optical computing, because it finally has a higher-level language in which to talk to customers, campuses, power providers, and capital expenditure.

06 As the measurement of computing power shifts from chips to systems, optical interconnects and optical computing are pushed to the industrial mainstream

When the competition for computing power mainly stayed at the chip level, optical interconnects resembled I/O technology, while optical computing resembled cutting-edge device exploration.

When the competition shifts to large-scale, system-level AI infrastructure, things change. System efficiency increasingly depends on how dense compute, data movement, context management, cross-node coordination, power delivery, and thermal management are organized, and these are precisely the areas where optics has the greatest opportunity to contribute.

Viewed through Tokens/W, optical interconnects attack the transport electricity cost behind each generated token; optical computing attempts to rewrite part of the compute electricity cost behind each token. Together they shape the token output efficiency of the whole system. That is the fundamental reason they are entering the industry's main line.

To be more realistic: besides chip capacity and supply, the constraints on future data centers and AI factories will include grid access, cooling, campus energy quotas, rack power density, and build-out speed. The IEA's earlier judgment on AI's energy consumption and NVIDIA's framing of the AI factory this time point in the same direction: AI infrastructure is becoming a systems engineering discipline measured in energy.

Seen from this new direction, what optical interconnects and optical computing solve is exactly the part that is getting ever more expensive and ever harder to keep optimizing along traditional electrical paths: the energy cost of data movement and the unit energy of high-density compute.

What this reflects is a more complete systems thinking, and it is why GTC 2026 once again emphasized photonics and silicon photonics products:

When the measurement of computing power shifts from chips to systems, optics will gradually move from an advanced technology option to worthwhile industrial infrastructure.

From this perspective, CPO and optical computing systems are very promising for the future!

Final Thoughts: The Main Axis of AGI Advancement

In my daily work, I have long advocated establishing objective, measurable standards for computing power, and I have been using Tokens/W to benchmark different compute chips.

Looking back at the history of technology: as the power-to-weight ratio of internal combustion engines rose higher and higher, the car was born, the airplane could take off, and the rocket could ascend.

In the AI era, as the ratio of AI system output (currently tokens) to energy consumed rises higher and higher, intelligence will become smarter and smarter, and AGI may be born in the process.

What is truly worth remembering from GTC 2026 is not NVIDIA's corporate fortunes, nor whether Jensen Huang becomes the "Token King," but the crystallization of a new measurement standard for the AI era.

Furthermore, NVIDIA, Alibaba, and perhaps many industry giants have begun to realize that AI industry development should be viewed from a system thinking perspective.

This is actually consistent with the main axis of human civilization's development: collecting, transmitting, and processing more information with less energy.

AGI will not be an exception!

Source: Tencent Technology