a16z talks to SemiAnalysis founder: NVIDIA's strategy and future

Wallstreetcn
2025.09.23 10:50

Dylan Patel, founder of SemiAnalysis, says NVIDIA's strategic core lies in its willingness to make aggressive, "bet-the-company" decisions and its excellent execution. Facing the cloud giants, it maintains cooperation with major clients through uniform pricing and control over allocations, while also investing in and supporting emerging cloud service providers like CoreWeave, creating a complex ecosystem of simultaneous competition and cooperation that solidifies its dominant position in the AI market.

Recently, NVIDIA invested $5 billion in Intel and announced a collaboration on customized data center and PC products. This is considered one of the biggest surprises in the semiconductor industry in recent years, with industry insiders saying that an "impossible alliance" is rapidly taking shape.

In this regard, top Silicon Valley venture capital firm a16z and SemiAnalysis pointed out in an in-depth dialogue that NVIDIA's move reflects its CEO Jensen Huang's consistent style—daring to bet on the future at critical moments. This "betting the entire company" execution capability is a key factor in its dominance in the AI era.

On September 22, Erik Torenberg, a general partner at Silicon Valley venture capital giant a16z, held an interview with Dylan Patel, founder, CEO, and chief analyst of SemiAnalysis, a16z general partner Sarah Wang, and a16z partner and former chief technology officer of Intel's data center and AI business unit Guido Appenzeller. During the interview, they discussed NVIDIA's core strategy, the construction of its moat, Jensen Huang's leadership, and competition and cooperation with other major cloud players.

(Image source: A16Z interview screenshot. From left to right: Erik Torenberg, Sarah Wang, Guido Appenzeller, Dylan Patel)

SemiAnalysis chief analyst Dylan Patel pointed out in the interview that the "impossible ally" relationship between NVIDIA and Intel reflects Jensen Huang's consistent "all-in" strategy. a16z partner Sarah Wang compared Jensen Huang to Warren Buffett in the semiconductor industry, believing that his investment decisions have a strong market signaling effect.

Regarding the core strategic characteristics of NVIDIA, Dylan Patel identified the following three points:

Intuition-driven aggressive decision-making: Jensen Huang makes significant investments based on strong intuition at critical moments, such as pre-purchasing before Xbox order confirmation and persuading the supply chain to expand production during the cryptocurrency bubble.

Rapid execution capability: NVIDIA almost always successfully delivers chips on the first attempt, avoiding the common multi-version iteration issues faced by competitors.

Ecosystem investment: By investing in new cloud service providers like CoreWeave, NVIDIA cultivates a diversified customer base and avoids over-reliance on traditional super cloud vendors.

When discussing the competitive landscape with cloud giants, Dylan Patel noted that NVIDIA is reshaping the cloud computing competition map.

"Beyond traditional players like Microsoft, Amazon, and Google, Oracle has emerged as an important AI cloud computing player with its $300 billion contract with OpenAI. Although Amazon lagged in early AI infrastructure deployment, it is accelerating again with the largest data center capacity." It is expected that capital expenditures for super cloud companies will reach USD 450-500 billion by 2026, far exceeding Wall Street's expectation of USD 360 billion, with most of these expenditures still flowing to NVIDIA.

Additionally, NVIDIA has built a complex ecosystem of simultaneous competition and cooperation: it maintains relationships with major clients through an equal pricing strategy (offering every customer the same price, citing antitrust considerations) and through control of allocations, while also backing emerging cloud service providers. This strategy lets NVIDIA keep its dominant position in the "gigawatt era" of AI infrastructure while giving it diverse options for deploying its massive cash flows.

Summary of interview highlights:

Dylan Patel, founder of SemiAnalysis

On NVIDIA's strategy:

  • The goal of playing games is to win, and the reason you win is so you can keep playing. For him, everything is about the 'next generation'... His focus is always on 'now + next generation'; nothing else matters.
  • Jensen Huang is the kind of person crazy enough to bet the entire company. For example, they would place large orders before the chips had even been successfully tested, putting all the money on the company's books into it.
  • They have a mentality of 'who cares, just ship it', 'do it quickly', 'get it out fast', 'don't let unnecessary features delay it', ensuring no version changes are needed so they can respond to the market as quickly as possible.
  • Regarding Volta, that was NVIDIA's first chip with tensor cores... Just months before sending it to the fab, they added tensor cores and decided "who cares" and changed the architecture. If they hadn't done that at the time, perhaps someone else could have seized market leadership in AI chips.

NVIDIA's future: A trillion-dollar "bull case"

  • NVIDIA's claim is even bolder: they say future annual spending on AI infrastructure will reach 'trillions of dollars', and they want to capture a large portion of it.
  • Whether it's having AI agents help you write code or chatting with your AI girlfriend Annie, all of this is powered by NVIDIA.
  • Look at what Musk recently said; he said 'Tesla is worth a trillion dollars because of humanoid robots'. If all of this relies on NVIDIA for training... then NVIDIA is also worth a trillion dollars, right?

Competitive and cooperative relationship with major cloud companies

  • The so-called super cloud companies include Microsoft, CoreWeave, Amazon, Google, Oracle, and Meta... The consensus expectation among investment banks is that these companies' total capital expenditures next year will be around $360 billion. My own estimate is closer to $450-500 billion, with most of these expenditures still flowing to NVIDIA.
  • The way you buy GPUs is like buying cocaine. You call a few people, text a few people, asking, 'Hey, how much do you have? How much is it?'
  • "In the past, he offered bulk discounts to super cloud service providers. But now, because he can use antitrust as a reason, he says 'everyone's price is the same.'

The following is the full transcript of the interview video (AI-assisted translation):

To better understand NVIDIA's $5 billion investment in Intel, Erik Torenberg, a general partner at Silicon Valley venture capital giant a16z, recently held an in-depth interview with Dylan Patel, chief analyst at SemiAnalysis, a16z general partner Sarah Wang, and a16z partner and former chief technology officer of Intel's data center and artificial intelligence business, Guido Appenzeller. In the interview, they discussed what this deal means for NVIDIA, Intel, AMD, and ARM; NVIDIA's moat and Jensen Huang's leadership; as well as the future of GPUs, large data centers, and artificial intelligence infrastructure.

In the text below, D refers to Dylan Patel; S: Sarah Wang; G: Guido Appenzeller; E: Erik Torenberg.

Beginning of the interview video

D: The way you buy GPUs is like buying cocaine. You call a few people, text a few people, asking, "Hey, how much do you have? How much?"

G: If your two most hated enemies suddenly teamed up, that would be the worst news you could possibly encounter. I never expected this to happen. I think it's an astonishing development.

S: It's like Warren Buffett buying a stock. Jensen Huang is the "Buffett effect" in the semiconductor world.

D: Everything seems to have turned poetic—after all the twists and turns, Intel now seems to be crawling to find NVIDIA.

NVIDIA and Intel: Unlikely Allies

E: Dylan, welcome back to our podcast.

D: Thanks for having me, I'm glad to be here again.

E: Well, just as we are recording this, a major news event has occurred—NVIDIA announced a $5 billion investment in Intel, and the two sides will collaborate to develop customized data center and PC products. What do you think of this collaboration?

D: I think this is really interesting, even a bit funny. As soon as the news of NVIDIA's investment was announced, Intel's stock jumped roughly 30%, so on a $5 billion investment they have already made about $1 billion on paper, right? What's interesting is that they (NVIDIA) really need their customers to be deeply involved. So when potential customers get involved and commit to purchasing certain products, it makes a lot of sense.

And this is quite dramatic, because Intel was previously sued for anti-competitive behavior: Intel integrated graphics capabilities into the motherboard chipset instead of letting discrete graphics cards handle them, and NVIDIA received a settlement from Intel over it, right? At that time, graphics had not yet fully moved off the motherboard chipset; a lot of it still lived in the chipset, which also handled USB, IO, and so on.

So now Intel wants to make chiplets and package them together with Nvidia's chiplets to create a PC product, which has a certain wheel-of-fate feel to it. The roles have reversed; Intel now seems to be "crawling" to seek cooperation with Nvidia, but the result might actually be the best device on the market. I don't want an ARM laptop because it can't do many things. A laptop with an x86 CPU paired with a fully integrated Nvidia graphics chip, however, could be the best product on the market.

E: So are you optimistic about this cooperation? Do you think it will proceed smoothly?

D: Well, of course, I really hope it succeeds. To be honest, I am someone who always remains optimistic about Intel; I have to be optimistic (laughs). My previous expectation was that Intel and the government hoped to operate in a structured way, such as having customers and supply chain giants directly invest capital into Intel.

But this time it’s actually the opposite; Nvidia is investing by purchasing stocks and holding partial ownership, but it hasn't really diluted the shares of other shareholders. Other shareholders might be diluted when Intel goes to the capital market for financing afterward, but the good news is that since these cooperation projects have already been announced, it helps boost market confidence.

The investment amounts are actually relatively small. Nvidia is $5 billion, SoftBank is $2 billion, and the U.S. government is $10 billion... these amounts are not too large in the entire semiconductor industry, right? Last time I said Intel needs at least over $50 billion in funding, and now it seems that it will be a bit better when they go to the market for financing. Maybe they will announce a few more similar big deals; for example, there are many rumors that Trump is involved in this round of attracting corporate investments. Now that Nvidia has invested, and the government has also invested, will Apple be next? Will they also cooperate with Intel? Whoever comes in will further enhance investor confidence. Then Intel can go to the market to issue more shares or bond financing.

S: Just like the "Buffett effect," right? Jensen (Huang Renxun) is like the Buffett of the semiconductor industry (laughs).

S: Guido, you were the CTO of Intel's Data Center and AI Business Unit; what do you think about this?

G: I think this is great news for consumers, especially in the short term. Especially for the laptop market, having Intel and Nvidia cooperate is simply fantastic.

However, I am also wondering how Intel's internal graphics and AI products will develop. They might pause or simply abandon their existing solutions. Right now, they really don't have any competitive products left. Gaudi, for example, is basically a "dead project," and their in-house graphics chips have never truly managed to compete at the high end.

From this perspective, such cooperation is beneficial for both parties. To be honest, Intel really needs a breath of "fresh air" right now. They are quite desperate.

Impact on AMD and ARM

G: I think this is devastating for AMD. Just imagine—if your two arch-rivals suddenly join forces, that's the worst news you could hear. AMD has been struggling enough; their graphics card hardware is not bad, but their software stack is completely inadequate, and their market penetration is very low. Now there's an even bigger problem on the horizon.

As for ARM, I think they are also left a bit out in the cold. ARM's biggest selling point has always been: we can work with any company that "doesn't want to work with Intel." But Nvidia might just become one of the most dangerous CPU competitors in the future, and Nvidia suddenly has access to Intel's technology, potentially even making strides in CPUs. This disrupts the entire game.

I never anticipated this turn of events; it's quite unexpected. But I think this development is really exciting.

Jensen Huang's Next Step: Nvidia's Strategy

S: Well, let's step back from the current news a bit, although there are indeed many things worth discussing right now. The last time you were on the show, we talked about Nvidia. You mentioned some possible future paths for Nvidia.

Can you tell us again about your bullish and bearish scenarios?

D: Their current stock price actually reflects a lot of expectations. But interestingly—Wall Street's major investment banks' capital expenditure expectations for "super cloud" companies are actually far lower than my estimates.

The so-called super cloud companies include Microsoft, CoreWeave, Amazon, Google, Oracle, and Meta; I count these six as "super clouds." Strictly speaking, Oracle and CoreWeave weren't considered super clouds before, but now they are OpenAI's "super cloud providers."

The consensus expectation among investment banks is that these companies' total capital expenditure next year (2026) will be around $360 billion.

My own estimate is closer to $450 to $500 billion. This figure is based on our research on data centers, supply chains, and tracking each data center project individually.

G: Is that figure referring to Nvidia's expenditure?

D: No, it's the total capital expenditure of these super cloud companies. This expenditure will flow to many companies, but the vast majority will still go to Nvidia.

Nvidia's current situation is not about "capturing market share," but rather expanding alongside the entire market while maintaining its existing share.

So the key question is: how quickly will the capital expenditures of these super cloud vendors and other users grow?

D (continued): The reason I consider Oracle and CoreWeave super clouds is that they have become infrastructure providers for OpenAI. You can see this from Oracle's announcements.

I actually don't understand why many people don't realize how crazy this is: they did the most outrageous thing in the entire history of the stock market. Oracle issued four years of forward guidance! That is unprecedented.

This directly made Larry Ellison the richest person in the world.

Also, OpenAI and Oracle signed a long-term contract worth up to $300 billion. The question is: can OpenAI really afford that amount, including through financing and its own revenue growth?

This contract will eventually reach an annual payment scale of $80 billion to $90 billion; it just depends on whether you believe that growth rate can be achieved.

There are many predictions in the market now. For example:

Some believe that OpenAI's annual recurring revenue (ARR) will reach $35 billion by the end of next year (2026); some say $40 billion; others say $45 billion.

This year they are expected to achieve about $20 billion ARR. So if this growth trend continues, all these revenues and expenditures from financing will ultimately flow to computing power, attributed to Nvidia.

We saw in OpenAI's last round of financing that they showed financial forecasts to investors, indicating they would "burn" $15 billion next year — in reality, it might be close to $20 billion.

Adding this up, OpenAI will likely spend about $15 billion to $25 billion each year, and it may not achieve profitability until 2029.
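To make the affordability question concrete, here is a back-of-envelope sketch in Python that reuses only the figures quoted above (about $20 billion of ARR this year, $35-45 billion forecast for the end of 2026, $80-90 billion a year of eventual Oracle payments, and $15-25 billion of annual cash burn); the constant-growth extrapolation and the variable names are illustrative assumptions, not SemiAnalysis's model.

```python
# Illustrative back-of-envelope check (not SemiAnalysis's model): if OpenAI's
# ARR keeps growing at the rate implied by the forecasts quoted above, when
# would revenue alone cover the eventual Oracle payments plus cash burn?
ARR_2025 = 20e9                 # ~$20B ARR expected this year (quoted above)
ARR_2026_FORECASTS = {"low": 35e9, "mid": 40e9, "high": 45e9}
ORACLE_PAYMENT = 85e9           # midpoint of the $80-90B/year figure
ANNUAL_BURN = 20e9              # midpoint of the $15-25B/year burn

for label, arr_2026 in ARR_2026_FORECASTS.items():
    growth = arr_2026 / ARR_2025 - 1          # implied year-over-year growth
    arr, year = arr_2026, 2026
    while arr < ORACLE_PAYMENT + ANNUAL_BURN:  # naively extend the same growth
        arr *= 1 + growth
        year += 1
    print(f"{label}: {growth:.0%} growth -> revenue covers spend around {year}")
```

Under any of the quoted forecasts the crossover only arrives around 2028, which is why the bet ultimately hinges on whether that growth rate, plus continued financing, is believable.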

And it's not just OpenAI; Anthropic, xAI, and all other AI labs are "burning money" like this.

So it is very likely that the total spending in the AI market could really exceed $500 billion — next year it won't be $360 billion, but $500 billion.

Nvidia's statement is even more aggressive: they say that future annual spending on AI infrastructure will reach "trillions of dollars per year," and they want to capture a large portion of it.

This is the bull case logic: AI is truly transformative, the world will be covered by data centers, and everything you interact with in your life is driven by AI — whether it's having an AI agent help you write code or chatting with your AI girlfriend Annie, all of this is powered by Nvidia.

G: I understand your bull case logic, but I think the core issue is whether the "value creation" truly exists. Personally, I believe it does. AI has the potential to create trillions of dollars in value.

So the question becomes: Where will Nvidia's ultimate "ceiling" be?

D: That depends on whether you believe in the "takeoff" scenario.

If you believe in the so-called "AI explosion": powerful AI will give rise to even more powerful AI, which in turn creates even stronger AI... each generation of agents drives the economy further.

Imagine this: if you can hire a group of monkeys to work, and hire a group of people to work, the output will certainly be different. But what if you hire AI? The value creation could far exceed that of humans.

So from this perspective, value creation could be at the level of "hundreds of trillions of dollars."

Take a simple scenario: if we can double the "efficiency" of every white-collar worker with the help of AI, then that incremental value would already be "hundreds of billions of dollars," wouldn't it?

But the question is, what does "doubling efficiency" mean?

If you talk to people in the lab now, they will say that this is no longer just "efficiency improvement," but direct replacement!

It's AI that is "ten times better than humans."

If white-collar workers completely rely on the continuous generation of large language models (LLMs) to maintain productivity, then you could almost "tax" every knowledge worker in the world, because their work is running on AI.

And knowledge workers make up the majority of jobs in the world.

S: So go ahead and guess a number: How much further can Nvidia rise?

D: I don't know, so why don't we build a "Matrioshka brain"? (Note: a hypothetical massive supercomputing system built around a star)

Maybe the machines will first say: "Humans no longer need to exist, I want more computing power."

G: Wait, that at least has to wait until humans colonize Mars, right?

D: Right, TBD (to be determined). Honestly, I think the speed of change now makes it very difficult for us to predict anything beyond five years.

Linear time is already too far away. Let's leave it to economists to predict.

I focus on some more "grounded" things: like the supply chain, which I can still see three or four years ahead. But after the fifth year... then we really can only rely on "YOLO" (you only live once).

So I try to keep myself focused on the supply chain, and look at the adoption speed of AI, value creation, usage, and other quantifiable things.

Beyond that... for example, are we all going to connect to BCI (brain-computer interface)? I don't know.

Are humanoid robots coming? Look at what Musk said recently, he said "Tesla is worth ten trillion dollars because of humanoid robots." So if all of this relies on Nvidia for training...

Then Nvidia is worth ten trillion dollars too, right?

I don't know, it's too sci-fi. I'm not interested in that kind of discussion

S: You're absolutely right... Everyone should read more science fiction novels.

Nvidia's Moat: How They Built It

S: So, let's continue along the line you just mentioned. You casually said, "Nvidia's market share is already so high that it's basically impossible to grow further," right? We talked about Nvidia's moat last time, and this is clearly closely related to maintaining a high market share.

I really liked the development story of Huawei that you just mentioned. Can you also take us through how Nvidia gradually built its moat?

D: This process is actually particularly exciting because, as you know, they failed many times at the beginning and risked the entire company multiple times. Jensen Huang is the kind of person who is crazy enough to bet the whole company.

For example, they would place large orders before the chips had even been successfully tested, putting all the money on the company's books into it, or they would prepare goods before the project had even been awarded. I've heard a "rumor"—though it's not really a rumor, it's something said by a very experienced senior in the industry who should know the inside story. He said that Nvidia had already placed purchase orders before they even got the Xbox order from Microsoft. In other words, Jensen Huang is the kind of person who says, "Whatever, let's just go for it, YOLO (You Only Live Once)."

Of course, I believe there are some details here, like Microsoft might have given them a verbal intention, but the timing of their order was indeed before the confirmation of the order.

Another example is during the cryptocurrency bubble period—there were several rounds of this bubble, right? Nvidia desperately convinced the entire supply chain that this was not demand supported by cryptocurrency, but rather "real demand" from gaming, data centers, and professional graphics workstations, so you all need to expand production capacity.

So everyone really went for it, pouring a lot of capital expenditure into expanding production and opening new production lines. They were priced per unit, Nvidia bought the chips, sold them, and made a huge profit.

Then when the bubble burst, Nvidia only needed to write off one quarter's worth of inventory. But what about the others? Those production lines were all idle.

What did AMD do at that time? Actually, their chips were more efficient in cryptocurrency mining, with better computational efficiency per unit silicon area, but they chose to act rationally and did not expand production capacity. Their logic was, "We don't want to bet on this bubble."

So it's like the question of whether to strike while the iron is hot. Nvidia did it, and they succeeded.

Recently, a similar situation occurred. Most people didn't believe the capacity orders they placed, for example, their forecast for Microsoft was even higher than Microsoft's own internal expectations.

Microsoft thought, "We don't need this much, do we?" But Nvidia insisted, "No, you will need this much." They even placed non-cancelable, non-returnable orders (NCNR), which is a serious matter in the supply chain: you can't cancel or return them.

I remember I once asked this question in Taiwan, when CFO Colette and CEO Jensen were both present. The audience was mostly finance professionals, and the questions were quite boring, plus there were only three days left until the earnings report, so they couldn't answer much, as the SEC wouldn't allow it.

Then I asked this question: Jensen, you are the kind of person who relies heavily on "intuition" and has great foresight, while Colette, as CFO, must pay special attention to numbers. How do you two collaborate with such different styles?

Jensen replied, "I hate spreadsheets (Excel). I don't look at them at all. I just 'know.'"

So, the most powerful innovators often have a strong intuition. For example, he can instinctively judge when to place orders in advance, even if those orders might ultimately need to be written off. Historically, they have indeed had many instances of write-offs, accumulating losses of several billion dollars.

Yes, some might say that amount of money isn't much. But it depends on how you measure it. When the Bitcoin bubble burst, their inventory write-offs amounted to several billion dollars, and at that time, Nvidia's market value was less than $100 billion, so it was indeed a significant issue back then.

However, compared to the money they made later, those losses seem like small bets for big gains.

I think everything they did at that time was right, while AMD's choices were wrong.

Think about it, the semiconductor industry is cyclical, and many companies go bankrupt before they can survive a cycle, which is why the industry constantly sees consolidation.

From a risk-reward perspective, Nvidia's bets were definitely worth it.

G: Yes, but from another perspective, for example, if you are a CEO and you want to provide Wall Street with a stable and predictable quarterly report, then these actions seem too aggressive.

This might be the reason for the tension that could be occurring internally now.

D: Right, we also made a video, edited in the style of Lee Kuan Yew's speeches, with music, and the final scene features Jensen Huang. He said, "The goal of playing a game is to win, and the reason you win is so you can keep playing."

He compares life to a pinball machine: if you win, you can continue to the next round.

For him, everything is for the "next generation." Not the world 15 years from now, because everything will change in 5 years. So his focus is always on "now + the next generation," and everything else is unimportant.

S: Yes, from a risk-reward perspective, he really did bet right.

D: Yeah, and almost no one dares to bet like that. They are the only "young" semiconductor company with a market value exceeding $10 billion.

You see, MediaTek was founded in the early 1990s, and so was Nvidia. Other giants were mostly established in the 1970s.

G: Right, most large companies are indeed older.

D: So, this makes him even more admirable.

How Jensen (Jensen Huang) Has Changed Over the Years

S: I think one point you just mentioned is particularly good: he is the kind of person who "bets the entire company." And, as you said, he has actually made a few wrong bets. For example, in the mobile business, right? What exactly happened with mobile? It failed. But he still continues to bet like that.

I remember Mark once had a conversation with Eric about "founder-led" companies—you will always remember the experience of taking huge risks to build the company from scratch. But if you are a CEO who came in later, often your task is just to "follow the rules and maintain the status quo."

But in Jensen's case, he remembers all those moments when they almost went bankrupt. So he would say, "I still have to keep betting, just like back in the day."

So what do you think, how has he changed over the years? After all, he is now one of the longest-serving CEOs—over thirty years, almost catching up to Larry Ellison. What changes do you think he has gone through in these thirty years?

D: Uh, I mean, obviously, I'm only 29 years old, I have no idea what he was like back then (laughs).

I can only say—I’ve watched a lot of his old interviews, hahaha.

S: Right, you weren't even born when he became CEO (laughs).

D: Exactly, exactly. I wasn't born when Nvidia was founded. I was born in 1996 (laughs).

S: So can you talk about your observations of him in recent years?

D: Yeah, I think even from the old videos I've seen, his changes are quite obvious.

His overall "aura" and "style" have become more charismatic and stylish.

(he's just like sauced up and dripped up)

His personal charm has clearly increased. Although he was already quite charming, now he completely looks like a "rock star."

And he was already a "rockstar" ten years ago, it’s just that people might not have seen it back then.

I remember the first time I watched his live speech in full was at CES (Consumer Electronics Show)—it should have been in 2014 or 2015.

At that time, he was on stage talking about AI—things like AlexNet, self-driving cars, all AI-related content.

But you know, that was a consumer electronics show, and I was still a teenager, hanging out in Reddit gaming hardware forums, I just thought—

"Can you figure out who your audience is first? We came here to hear about gaming GPUs!"

I was half thinking, "Wow, this is so impressive." The other half was thinking, "Can you hurry up and release the new graphics card!"

Then you look at the forum, and everyone's reaction is——

"What is this? I want news about graphics cards!"

"Nvidia is going to be a price assassin again."

They have always adhered to the strategy of "we price based on value, and we add a little more because we are smart enough."

I guess Jensen Huang really relies on intuition to set prices, especially in the gaming card segment.

He might change the price right up until the moment of the press conference.

So this is basically an operation based on "feel."

But do you think he lacks that intuition? Definitely not.

It's just that many people at that time would think:

"Ah, Jensen is wrong, what does he know?"

But now, as soon as he speaks——everyone is:

"The master has spoken, silence."

So maybe it's just because he has "bet right" too many times over the years that people have started to believe he really knows what he's talking about.


Jensen Huang's Leadership and Company Culture

S: Yes. Recently there was a post on X saying that he (Jensen) has ascended to the "god mode" among that small group of CEOs, but this post is asking: "Who exactly is the god? Who are the other gods?"

Uh, it's Zuck. Uh, who are the other gods? Elon. Elon, Elon, Zuck, and Jensen. Not bad. A good combination.

Well, a team of gods together. So we pray to Silicon Valley. Right, doesn't it feel a bit like that now?

Absolutely. Uh, another question: people mentioned his CFO, Colette. You know, Nvidia has a very loyal team, even though all the OGs could retire by now. Is there anyone at Nvidia today similar to Gwynne Shotwell at SpaceX, or a core figure like Tim Cook was to Steve Jobs at Apple?

D: I mean, he (Jensen) has two co-founders, right? That's not to be overlooked.

Uh, one of them, you know, hasn't been involved for a long time, but the other was involved until a few years ago. Right?

So it's not all Jensen steering the ship, is it?

Not at all. Uh, although he does lead everything.

There are several people in hardware. There's one person who is, to me, almost legendary within Nvidia. You talk to the engineering teams; he has led many of them.

He is a very low-key person, so I actually don't want to mention his name, to be fair. But you know, his role is essentially that of a chief engineering officer. People in his organization would know who he is.

I do think there are indeed these types of people.

But you know he is very loyal to Nvidia. And there are quite a few of this type of person. There's another person who would say, "We have to push this silicon out immediately, we can cut features if needed."

That person became famous for this. All the technical staff inside Nvidia hate him.

This is another very loyal Nvidia person who has been with the company for a long time.

But you know, when you have such a visionary company and are moving forward, one issue is: you can get bogged down by these details, right? You might say, "Oh, I need to finish this, it has to be perfect, absolutely amazing."

It's people like that, but these people are obviously very close to Jensen because Jensen himself believes in these things, right? He believes in having vision, looking to the future, but at the same time would say, "Forget it, let's cut features and ship the next generation tomorrow," right? Like, "Ship now," "Ship a bit faster."

In the field of silicon, this is really hard to do.

And Nvidia has always been impressive from the earliest days. He has talked before about their first successful chip, when they were running out of money and he had to seek funding from others to complete the development. Even then, they barely had enough money. Because they had already had failed chips before. That chip had to come back working, or else it was game over.

At that time, due to financial constraints, they could only afford the so-called mask set costs. Basically, it was about putting these stencils into the photolithography machine, telling it where the patterns are, and then doing etching, depositing materials, etching again, depositing again, and etching again, stacking layer by layer at those positions to make a chip.

These stencils are customized for each chip, right? The cost now is on the scale of millions, even tens of millions of dollars. Even back then it was a lot of money, and they didn't have much.

They could only make one set of mask sets at that time. But the norm in semiconductor manufacturing is to do as much simulation and verification as possible, but you will send the design out, and it will always need to be modified. There will always be issues. You have to bring the design back for modifications. You have to change some things. Because simulation can never be completely accurate.

And Nvidia's characteristic is that they often get it right the first time.

Yes. Even companies like AMD or Broadcom or other highly efficient companies often have to release an "A" version and then an A followed by a number, or a B followed by a number. The letter refers to a revision of the base transistor layers, and the number to a revision of the metal layers that connect all the transistors; so going from A0 to A1 is a metal-only respin, while going from A to B means redoing the base layers.

Nvidia will first complete and mass-produce the "A" layer, which is the transistor layer, and then pause a bit before transitioning to the metal layer, in case they really need to modify the metal layer.

So once they confirm it works, they can ramp production very quickly, while other companies often go through "oh, the chip came back, this version doesn't work, we need to tweak it and take it back..." Each of those revisions is called a stepping, right?

We were all very envious of Nvidia at that time. They almost always deliver on the first try. We just couldn't do that. The data center CPU team had a product where, I remember, we went through round after round of changes to the transistor layer, all the way to an E2 stepping; there were already a lot of versions.

When AMD's market share was rapidly increasing and catching up to Intel, Intel was at the E2 stage, something like 15 rounds of steppings, which made it far slower to market than others. Each stepping delays you a quarter, right? That was catastrophic.

So this is another thing about Nvidia: they have a mentality of "who cares, just ship it," "get it out as quickly as possible," "make it happen quickly," "don't let unnecessary features delay it," ensuring they don't need to change versions so they can respond to the market quickly.

Regarding Volta, that was Nvidia's first chip with tensor cores. You know, in the P100 Pascal generation, they saw all the AI developments. They decided to fully invest in AI, and just months before sending it to fab, they added tensor cores and decided "who cares"—they changed the architecture.

If they hadn't done that at the time, maybe someone else would have seized the market leadership in AI chips. Right? So there are many moments like this. They made big changes, but often there are also many small tweaks, like digital formats or certain architectural details. Nvidia is very fast in these aspects.

G: Another crazy thing is that their software team can keep up with all of this. Right? I mean, if you just released a chip and can go to market without any need for stepping, then you must have the drivers ready, as well as all the infrastructure above that—this is very impressive.

Nvidia's Future: Cash, Data Centers, and AI Infrastructure

S: Yes, I like the point you just made, because think about it: Nvidia has repeatedly been propelled by tailwinds, but I think both of you are saying you have to run fast enough, execute well enough, and actually capture those tailwinds. And if you think about... by the way, I love the CES story you told; I can only imagine him talking about self-driving cars there more than a decade ago. But if you're thinking about catching the tailwinds of video games, VR, Bitcoin mining, and obviously AI now... one of the things Jensen talks about today is robots and AI factories. Maybe my last question about Nvidia is: what do you think the next ten to fifteen years look like? I know it's hard to say what things will be like even five years out, but what will Nvidia's business grow into?

D: That's a real question, and I ask this question every time I talk to some executives at Nvidia because I really want to know. You know, they obviously won't answer completely, but the question is: what are you going to do with your balance sheet? You're one of the companies with the highest cash flow, you have so much cash flow. Uh, now the hyperscalers are compressing their cash flow because they're investing heavily in GPUs. Uh, what are you going to do with that cash flow? Right? Even before this "takeoff," he (Jensen) wasn't allowed to acquire ARM, right? Uh, so what are you going to do with all that capital and all that cash? Right?

Even though Nvidia recently announced a $5 billion investment in Intel, there is regulatory scrutiny, right? The announcement says "this is subject to review," right? I think that will ultimately go through, but he can't buy anything very large. He is going to have hundreds of billions of dollars of cash on the balance sheet. What do you do? Start building AI infrastructure and data centers yourself? Maybe. But if others can do that, why do it yourself?

He is indeed investing in these, right? But they are all small investments. You know, like the capacity backstop he recently gave CoreWeave, because it's hard to find a lot of GPUs for burst capacity, right? Like saying, "Hey, I want to train a model for three months, I don't have enough base capacity, I don't know how the experiment will turn out, but I want to train a large model for three months." We know from our portfolio that this is real, um, yes.

So Nvidia sees this problem, and they think it's a real issue faced by startups. That's also why the big labs have such an advantage. But you know, most Silicon Valley AI companies spend 75% of a funding round on GPUs, right? Or at least... yes. If you can burn 75% of it on a single model run over three months, really scale up, and end up with a competitive product, then you have this model, and then you go raise again, or start deploying, right? But what do you do after you get it?

Is it about buying a bunch of humanoid robots and deploying them? But they haven't really built good software for those robots; they do well below the model layer. So where they deploy capital is a problem. They have been investing a bit up and down the supply chain, right? Investing in the new cloud companies (neoclouds), investing in some model-training companies. Yes, but again, this is small money. If he wanted, he could certainly take an entire AI company's fundraising round, but he hasn't done that, right? And then really let them use the GPUs. Or he could participate in rounds like OpenAI's; he could fully engage in those. Do you think these are things he should do? I mean... good question. Um, I don't know. Right?

I think he might reference your question in the next fundraising round; we'll talk about it again. Anyway, he might really end up killing the venture capital industry, taking the best fundraising rounds and doing big deals. You know, you could do the seed round and then let Jensen mark up the value. That's why...

No, I don't think... I don't like him going in the direction of "picking winners," because he has all sorts of clients across the ecosystem. If he starts picking winners, his clients get more anxious, because they think, "Hey, if you start favoring a certain company, I have to consider AMD, or some startup, or doing it in-house." Right? Buying TPUs or whatever. People will worry. So he can't go too far with these investments. He can do a bit, yes; a couple of hundred million for rounds like OpenAI or xAI is fine. Right, CoreWeave is like that, right? Yes. Everyone argues about this one. But he invested a couple of hundred million, plus he rents a cluster from them for internal development purposes instead of renting from the super cloud service providers, which is more cost-effective for Nvidia, right? Cheaper than renting from the cloud service providers. That's it.

And is he really propping up CoreWeave that much? Or the other clients and neoclouds? There are some investments, but it's more like "this cloud company is good, let's take 5% or 10% of this round." Right? It's not like he took more than 50% of a fundraising round.

S: Is he also reshaping his own market? I mean, a few years ago there were only four or five big buyers for these GPUs, and you just listed six. Is that deliberate strategy?

D: Yes, yes, I absolutely think so. And he doesn't need to invest a lot of capital to do it. For example, did he ship chips to one company before another? I don't know. Yes, that's not the point. But if you look at the total capital he has spent on neoclouds, it's several billion, and if he wants, he has many other levers to pull. Yes, allocations, just as you mentioned.

D: One good thing is, you know, previously he offered large volume discounts to super cloud service providers. But now, because he can use antitrust as a reason, he says, "Everyone has the same price." Well, that's very fair, very fair, right?

D: So what should he do with this capital? Or what can guide his capital utilization? I mean, I think, you know, some people would say he should invest in data centers, just invest in the data center layer, rather than the cloud layer or the services running inside the data center. This way, more people build data centers, and if market demand continues to grow, data centers and power won't become bottlenecks, right? Invest in data centers and power.

I told them they should invest in data centers and power rather than the cloud layer, because the cloud layer is still relatively exposed to competition; it's a complement and somewhat commoditized. I wouldn't say the cloud is fully commoditized, but there are a lot of competitors doing well now. And you can also educate commercial real estate and other infrastructure investors to enter the AI infrastructure space. So I don't think you should invest in the cloud layer, right? So is it data centers and power? Yes.

D: Is the investment because that is the real bottleneck limiting your growth? Well, yes, first is how much people want to spend and how much they can spend, and second is whether they can put them in data centers. Oh, and then in areas like robotics, I think there are places he can invest, but nothing requires $300 billion in capital to do. So what do you do with this capital? I really don't know. I really think Jensen must have some kind of idea, some kind of visionary plan, because that shapes the company, right?

They can continue... you know, I mentioned free cash flow of $20 billion to $25 billion a year. What do they do with this money? Just keep buying back stock? Go down the path like Apple? The reason Apple hasn't done anything interesting in the past decade is that their leaders no longer have vision. Tim Cook is great at supply chain. But they just spend money on stock buybacks. They really haven't succeeded in autonomous vehicles. Well, let's see how AR/VR develops. Well, let's see how wearables go, right? But companies like Meta and OpenAI might be better than them, we'll see...

D: So what he invests in, I have no idea. But what requires so much capital is a puzzle, and it's harder to find things that can really generate returns. Because the easy thing is the cost of equity—I'll just keep buying back stock, not changing the company culture. I think that's a problem too, right? Suddenly you invest in a lot of things, and the company starts doing two or three completely different things, which is very hard to sustain. But they are indeed doing a lot of different things. Right? I mean, on one hand, you say you're building AI infrastructure, and on the other hand, you say humanoid robots are everywhere, which can also be considered AI infrastructure, or that data centers and power are AI infrastructure, right? You know, humanoid robots are completely feasible, but if you suddenly need to handle concrete and build power plants, that's a completely different culture, a completely different crowd, and it's much harder.

G: And look, remember when we were at Intel, one of our biggest problems was that our customer base was terrible. Right? Yes. I mean, we sold most of our chips to the super cloud service providers, they were too concentrated, they built their own chips, and then you got squeezed on price. So honestly, you should have used the funds to diversify the customer base back in 2014, set prices a bit higher, and let margins reach 80%. What would the world have done at that time? Nothing. The margins were pretty good then; that wasn't the issue. The main issue is that margins were 60, 65, and are now up to 80. Right. Oh my gosh.

Super Cloud Service Providers: Amazon, Oracle, and the Cloud Wars

S: Okay, wait a minute. I think Guido's comment is actually a very good segue into another topic we want to discuss with you, which is the super cloud service providers (hyperscalers). One of the reasons I like reading SemiAnalysis is that you make calls that go beyond consensus, and you are often right. One of the recent ones... it's often like that, and your hit rate is very high. The one that interests me most is Amazon's AI resurgence. So I want to talk with you a bit about this, because we find it quite interesting; we help our portfolio companies on the front lines pick their partners, so we have some micro-level data. But first, can you talk about why you think they fell behind?

D: Yes. So in the first quarter of 2023, I wrote an article called "Amazon's Cloud Crisis." Uh, it was about these neoclouds commoditizing Amazon. Uh, it said that Amazon's entire infrastructure was very good in the previous generation of computing, right? The elastic fabric, ENA, and EFA they built, right? Their NIC (network interface card), protocol, and their custom CPUs, etc., right? These were very suitable in the previous generation of "scale-out computing," rather than this "scale-up" AI infrastructure era.

And the neoclouds want to commoditize them; their silicon teams were more focused on cost optimization, while today's game is maximum performance per unit cost. That usually means that even if cost doubles, you need performance to rise even more, say three times, so that cost per unit of performance still falls. That is the rule of the game for Nvidia hardware today.
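As a toy illustration of that rule (the numbers below are made up for the example, not from the interview): if a new part costs twice as much but delivers three times the performance, cost per unit of performance falls by a third, or equivalently, performance per dollar improves 1.5x.

```python
# Toy illustration of the "max performance per cost" rule; the perf/cost
# figures below are made-up example numbers, not figures from the interview.
def perf_per_dollar(perf: float, cost: float) -> float:
    """Performance delivered per dollar spent."""
    return perf / cost

baseline = perf_per_dollar(perf=1.0, cost=1.0)   # previous-generation part
next_gen = perf_per_dollar(perf=3.0, cost=2.0)   # 2x the cost, 3x the performance

print(f"perf per dollar improves {next_gen / baseline:.1f}x")  # -> 1.5x
```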

It turns out this call was very good. Many people said we were wrong at the time, because Amazon was seen as one of the best stocks, Microsoft hadn't really taken off yet, and Oracle and the others weren't doing much either. Since then, Amazon has been one of the worst-performing super cloud service providers. The nuance is that Amazon still has structural issues, right? They are still using their elastic fabric, although that is improving. They are still behind Nvidia's networking, behind Broadcom-style networking hardware, Arista-type networks, and the NICs. Their internal AI chips are okay. But mainly, they are now starting to wake up and actually beginning to capture business.

The main content of that call was that, since the report, AWS's year-over-year revenue growth has kept decelerating. Our big judgment now is that it will re-accelerate, right? That's partly because of Anthropic, and because we have done a lot of work on data centers, tracking when each data center comes online and what's inside it. When that capacity comes online, through the cost flow-through, if you know the chip costs, network costs, and power costs, you generally know the margins on these things, and then you can start estimating revenue.

So when we put all of this together, it is very clear to us that AWS's revenue growth will hit a trough this quarter; on a year-over-year basis this will be the lowest point for at least the next year, right? Then it will re-accelerate to a year-over-year growth rate exceeding 20%, because these large data centers are coming online with Trainium and GPUs, depending on the customer.

The experience isn't as good as what some others, like CoreWeave, offer. But the game now is capacity. CoreWeave can only deploy so much; they have limited data center capacity, even though they build very fast. The company with the most data center capacity in the world today, even if others may catch up over the next two years, is still Amazon based on what we see. In fact, Amazon still has the most spare data center capacity, and that capacity will ramp into AI revenue over the next year.
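For a sense of what that flow-through looks like, here is a rough, purely illustrative sketch: the quarterly megawatt ramp, watts per accelerator, rental rate, and utilization below are placeholder assumptions for the example, not SemiAnalysis's tracked data.

```python
# Illustrative flow-through model (placeholder numbers, not SemiAnalysis data):
# translate data center capacity coming online into incremental annualized
# AI rental revenue for a cloud provider.
HOURS_PER_YEAR = 8760
WATTS_PER_ACCELERATOR = 2000      # assumed all-in system power per GPU/accelerator
RENTAL_PER_GPU_HOUR = 2.65        # assumed long-term bulk rate, $/GPU-hour
UTILIZATION = 0.90                # assumed share of capacity actually rented

new_capacity_mw = {"Q1": 150, "Q2": 250, "Q3": 400, "Q4": 600}  # assumed ramp

cumulative_mw = 0.0
for quarter, mw in new_capacity_mw.items():
    cumulative_mw += mw
    gpus = cumulative_mw * 1e6 / WATTS_PER_ACCELERATOR
    annualized_rev = gpus * RENTAL_PER_GPU_HOUR * HOURS_PER_YEAR * UTILIZATION
    print(f"{quarter}: {cumulative_mw:,.0f} MW online -> "
          f"~${annualized_rev / 1e9:.1f}B annualized AI rental revenue")
```

The point is only the mechanism: once you know when capacity comes online and roughly what it rents for, the incremental revenue falls out of the multiplication.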

I have a question: are these capacities the right type of capacity? Because for today's high-density AI build-outs, you need a very strong cooling system. You need to have sufficient water sources nearby, and you need enough power supply, right? Are these data center capacities in the right locations, or is the type wrong?

So in this sense, from power assurance to substations, to transformers, to being able to provide power to the racks through power whips, clearly all these aspects need to be secured. Data center capacities will vary, right? Uh, you know, historically, Amazon actually has the highest density data centers in the world, right? Uh, they were already achieving 40 kW racks when others were still at 12 kW. If you ever walk into most data centers, you would feel quite cool and dry. If you walk into Amazon's data center, you would feel like a swamp. It feels like where I grew up, right? It's humid and hot. Because they are optimizing every percentage. So what you mean is that Amazon's data centers are not equipped with the infrastructure for this new type of setup, but compared to the cost of GPUs, like complex cooling arrangements, this is actually a minor issue, right?

Uh, you know, we made a call on Astera Labs a few months ago when the stock was around 90, and then the next month it rose to 250 because Amazon placed an order with them. There are some aspects of Amazon's infrastructure I won't go into detail about, but the rack infrastructure really requires a lot of Astera Labs' connectivity products. Cooling is the same; you need more of this stuff. But again, these things are not significant costs compared to the GPUs. You can build it, right?

My question is more like this: look, I might need a big river nearby for cooling in many places right now, right? In many areas, I can't get enough water at all. You know, maybe the power situation is the same in those areas.

They have two sites at the "2 gigawatt" level, where all the power is set up and secured. Wet chillers and dry chillers are all ready. Everything is fine. It's just that the efficiency isn't that high, but you know, that's okay, right? They will ramp up revenue, they will increase income. It's not that I necessarily think Amazon's internal models are excellent, or that their in-house chips are more competitive than Nvidia's GPUs or the TPU, or that their hardware architecture is the best. I don't necessarily think that.

Uh, but they can build a lot of data centers and fill them up, rent them out, right? This is actually a fairly simple assumption (thesis).

The Era of Super Data Centers

S: "How important is Enthropic (or are you referring to Trēnium / TranAIum?) in co-designing with Trēnium? Because I remember we had a portfolio company that was invited to AWS in the summer of 2023. They spent a week, about eight hours in total, trying to figure out Trēnium, and it was almost impossible to understand at that time. Well, this company obviously didn't go back to try again afterward. But so far, how different is that experience from what you've heard? Or is it still terrible? Just like that? Well, got it."

D: "You know, it is indeed quite difficult to use. Uh, so there's an argument that every company doing inference, including AI hardware startups, will say: 'I can run at most three or four models, I can completely optimize everything manually, write kernels for everything, even down to the assembly level, how can it be that hard?' It is quite hard. It really is. Uh, but often you do this for production inference. Like, you wouldn't use those ease-of-use libraries, like Nvidia's cuDNN, which makes it very easy to generate kernels, right? You wouldn't... or you wouldn't use these ease-of-use libraries. When you're running inference, either... you're using cutlass, or writing PTX yourself, or in some cases, someone even drops down to SAS (possibly referring to lower-level assembly or hardware near-level). Uh, for example, when you look at OpenAI or Anthropic, when they're running inference on GPUs, that's how they do it. Uh, once you get down to that level, the ecosystem isn't that great. It's not that using Nvidia GPUs is easy now. I mean, because you have an intuitive understanding of the hardware architecture, and many people work on this, you can communicate with others. But ultimately, it's not easy. Right? And you know, when Anthropic uses TPUs or TranAIum, some architectures are simpler than GPUs. Uh, the cores are larger and simpler, rather than having all these features. Um, you know, less general, so writing code is a bit simpler. Um, there are people in Enthropic (or Trēnium) on Twitter saying that when they do that low-level stuff, because it's simpler, they actually prefer to work on Trēnium or TPUs

"What needs to be clarified now is that, especially with Trēnium, it is indeed very difficult to use — it is not for the faint of heart. Well, it is very difficult, but if you are just running a model, if I were at Anthropic and I had to run Claude 4.1, Opus, or Sonnet, then forget it, I wouldn't even run HighQ, I would just run it on a GPU or whatever device, right? I would just run two models. In fact, forget it, I would also run Opus on the GPU, and also use TPUs. Sonnet is most of my traffic. This way I can spend time working on it."

"Does the architecture change every four to six months? Right? To be honest, sometimes it doesn't change that much. I think there was definitely a change from 3 to 4. Well, I mean defining architectural change, you know, the basic primitives at a high level haven't changed that much between the past few generations. Honestly, I don't have a deep understanding of the model architecture over at Anthropic. But I feel like from what I've seen elsewhere, there is enough variation that you need to spend time on this step of the program. Well, and the real key is, if I were at Anthropic, I now have AR (Annual Recurring Revenue or similar metrics) of 7 billion or more than 10, and by the end of next year it might be 20 or 30, right? My profit margin is 50% to 70%. Well, then I need the investment in Trēnium (or similar hardware), right? That I can run on Sonnet, and most use cases are using Sonnet-3, 5 or Sonnet-4, 5 models. Right? Then I can spend time, that hardware is runnable. Yes, it is completely possible."

Hardware Cycle: GB200, Blackwell, and Next Generation

S: Maybe talk about some of the non-consensus calls you've made, and then I'll move on to another cloud question. Back in June, you said Oracle would win the AI computing market. And on this podcast we've already mentioned that Oracle made a big jump, obviously the largest single-day gain ever for a company with a market value of over 500 billion dollars. I think, in the first quarter of 2023, didn't Nvidia have a bigger jump? Maybe that one was a bit bigger. Well, maybe it was a little smaller than that one. I think it's about the same; we'll fact-check it. It's really remarkable. But, you know, this was obviously a big call. Can you walk us through why you made that judgment and why you think Oracle can do so well in such a competitive field?

D: Yes, so Oracle is one of the companies in the industry with the largest balance sheets, and they are not dogmatic about any particular type of hardware, right? They are not fixated on any specific networking technology. They will deploy Ethernet with Arista. They will deploy Ethernet with their own white-box networking gear. They will deploy NVIDIA's networking, whether InfiniBand or Spectrum-X. They have very good network engineers. Their software overall is also very strong. For example, in ClusterMAX they are rated Gold because their software is strong. There are a few things they need to add to get stronger, and they are adding those features, right? Upgrading to the Platinum tier, which is where CoreWeave sits.

So when you combine these two things, right? OpenAI's computing needs are insane. Microsoft has concerns there; they are reluctant to invest because they don't believe OpenAI can really pay that amount of money. Right? I mentioned earlier that $300 billion contract: OpenAI, you do not have $300 billion, but Oracle is willing to take that bet. Of course, the bet has a bit of a safety net, because what Oracle really needs is to secure the data center capacity. Right? So that's how we made the call initially. We have been informing our institutional clients, the hyperscalers, AI labs, semiconductor companies, and investors, through our data center model, because we are tracking every data center in the world. Oracle itself does not build all of its data centers, by the way; they co-engineer with other companies, but they do not physically build everything themselves. They are very flexible in evaluating and designing new data centers.

So we have seen many different data centers. Oracle is aggressively negotiating for space, signing contracts, etc. So we have, "Uh, here a gigawatt, there a gigawatt, another gigawatt," right? Uh, Abilene, uh, two gigawatts, right? You know, they have so many sites signing contracts and in discussions, we are paying attention to these. Then we have timelines because we are tracking the entire supply chain. We are tracking all permits, regulatory filings, you know, using language models, satellite photos continuously, and also chillers, transformer equipment, generators, and other supply chains

We can make fairly strong estimates, quarter by quarter, of how much power will come online at each of these sites, right? Some of these known sites won't ramp until 2027, but we know Oracle has already signed the contracts. We know the signed contracts, and we know the ramp path. So the question becomes: say you have a megawatt, for simplicity. That used to be a lot of power; in the gigawatt era it isn't much, but say a megawatt, right? How much does it cost to fill that megawatt with GPUs? It's actually simpler to calculate if I use the GB200 as the example. The GPU itself is about a kilowatt, but when you count the whole system, the CPU and everything else around it, it's about 2,000 watts per GPU. And the capital expenditure per GPU is roughly $50,000, because the cost isn't just the GPU itself; there's all the peripheral equipment. So $50,000 of capex corresponds to 2,000 watts, which works out to roughly 500 GPUs and about $25 million of capex per megawatt. Then what's the rental price for a GPU? If you sign a very long-term, bulk deal, it's around $2.70 per GPU-hour, somewhere in the $2.60s. Then you ultimately get: "Oh, one megawatt rents for about $12 million a year." Yeah. And every chip is different, so we track each chip, what its capex is, what the networking gear is. Knowing what each chip looks like, you can predict which chips will be installed in which data centers, when those data centers come online, and how many megawatts are added each quarter. Then you get: "Ah, this Stargate site will come online during this window. They will start renting at a certain time. How many chips does each Stargate site use?" Right? So this is how much OpenAI will spend to rent it. You break all of that down, and we can predict Oracle's revenue with a high degree of certainty. We were very close to their revenue guidance for fiscal years 2025, 2026, and 2027, and very close for 2028 as well. What surprised us is that they announced some capacity for 2028 and 2029 for which we haven't yet found the data centers, but we will find them, of course.

So with this methodology you can see: "Hey, which data centers have you signed, how much power do you have, how many contracts have you signed, and how much incremental revenue will it bring once it goes live?" That was the basis for our bet on Oracle. Obviously, the detail we disclose in the newsletter is much less, but that's the judgment, right: "Hey, they have this capacity, they want to sign these contracts."
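
To make the megawatt arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. The inputs (2,000 watts and $50,000 of capex per GB200 GPU all-in, $2.70 per GPU-hour) are the rough figures quoted in the conversation, not official specifications or pricing.

```python
# Back-of-the-envelope: how much GPU capex fits in a megawatt,
# and what that megawatt rents for per year.
# Figures are the rough numbers quoted in the interview, not official pricing.

WATTS_PER_GPU_ALL_IN = 2_000     # whole system: GPU + CPU + networking + overhead
CAPEX_PER_GPU_USD = 50_000       # GPU plus peripheral equipment
RENTAL_PER_GPU_HOUR_USD = 2.70   # long-term bulk rate
HOURS_PER_YEAR = 8_760

def megawatt_economics(megawatts: float = 1.0) -> dict:
    gpus = megawatts * 1_000_000 / WATTS_PER_GPU_ALL_IN
    capex = gpus * CAPEX_PER_GPU_USD
    annual_rent = gpus * RENTAL_PER_GPU_HOUR_USD * HOURS_PER_YEAR
    return {"gpus": gpus, "capex_usd": capex, "annual_rental_usd": annual_rent}

econ = megawatt_economics(1.0)
# Roughly 500 GPUs, ~$25M of capex, ~$12M of rental revenue per megawatt-year
print(f"GPUs per MW:        {econ['gpus']:,.0f}")
print(f"Capex per MW:       ${econ['capex_usd']:,.0f}")
print(f"Annual rent per MW: ${econ['annual_rental_usd']:,.0f}")
```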

In our newsletter, we talked about two main things. We discussed OpenAI's business and ByteDance's business. Uh, and we expect an announcement about TikTok tomorrow, uh, on Friday, and so on. But on the ByteDance side, Oracle will also lease a large amount of data center capacity to ByteDance, right? So we use the same methodology there. Uh, you know, for ByteDance, they are a profitable company, so they are very likely to pay. For OpenAI, it's not so certain. Therefore, when you look at the forecasts for the later years, for example, 2028, 2029, 2030, whether OpenAI will exist and whether they can afford the more than $80 billion/year contract fees they signed with Oracle, there are error bars here.

If that happens, Oracle's downside is still partially protected, because up front they only sign the data center contracts, and those are just a small part of the cost, right? The GPUs are the bulk of it, and the GPUs are bought only one or two quarters before they start leasing them out. Right? So it's not as if, should the contracts fall through, they're stuck with a pile of purchased assets that have become useless. Yes, yes.

There's another angle: Microsoft used to be OpenAI's exclusive compute provider, right? Then they submitted certain filings saying they wanted to diversify, which pushed them toward other providers. Yes, so Microsoft used to be the exclusive compute provider, and then that was reworked into a right of first refusal. You know, then, Microsoft, are you the provider of last resort or something? No, no, still a right of first refusal, but the two are not mutually exclusive. If OpenAI says, "We want to sign a contract for $80 billion a year, or $300 billion over the next five years, do you want it?" they might say, "No thanks, but go ahead, no problem." Right? Just like that, and then OpenAI goes to Oracle. OpenAI is basically saying, "We need someone with a balance sheet that can actually pay the bills." Right? Because then that party can make a lot of money from OpenAI, whether in compute, infrastructure, or all these other things. But someone has to have the balance sheet. OpenAI doesn't. Oracle does. And despite the scale they've signed, we also have another source of information that they are negotiating with the debt markets, because Oracle will actually need to finance these GPUs over time through debt. They might be able to pay for everything with their own cash this year or next, but by 2027, 2028, and 2029 they will start relying on debt to pay for these GPUs. This is something CoreWeave has done, and many neoclouds are debt-financed.

Even Meta financed their large data center in Louisiana partly with debt. It's not just because debt is cheap; financially it really is better than spending cash you could otherwise use to buy back stock, because debt is cheaper than equity. It's a kind of financial engineering. But you know, who could even be in this game? Maybe Amazon, maybe Google, maybe Microsoft, a very short list, or Oracle or Meta, right? Meta obviously can't do it. Microsoft has given up. Amazon, Google, and Oracle, right? That's all that's left. Google would be in an awkward position. Yes, Google would be an awkward partner. Amazon would be a good partner, but you know, that's it.

xAI and Colossus 2

S: Well, I think, maybe since we are talking about the construction of these super-large data centers, you just released an article about xAI and Colossus 2. Are you no longer as shocked by these behemoths that can be built in six months? Or do you still think it's quite impressive?

D: You know, I often say that AI researchers are the first to think about problems in terms of "orders of magnitude." Before this, humans usually thought in terms of "percentage growth"—that's been the case since industrialization. And going further back, humans basically only considered "absolute values."

In other words, human thinking is evolving because the speed of change has accelerated, and everything has turned into "exponential" change.

For example, when GPT-2 was trained on however many chips it took, that was genuinely shocking; then GPT-3 was trained on a much larger system. Then GPT-4 used even more, on the order of 20,000 A100s. At the time, we would say, "Wow!"

Then we entered the era of 100,000 GPUs. We even wrote some analyses about 100,000-GPU clusters. But now there are already ten such 100,000-GPU clusters in the world.

So we started to feel, "Okay, this is no longer surprising." — Now, 100,000 GPUs actually represent over 100 megawatts of power demand. And now in our internal Slack group and other channels, it's already, "Oh, we've found another 200 megawatt data center." — Then someone will send a "yawning" emoji.

I would say, "Bro, are you serious?"

Now we only feel "oh, this is somewhat interesting" when we reach "gigawatt scale." We are now in the "gigawatt era."

Of course, maybe not long from now, we will find gigawatts boring too. But this exponential development is really crazy. Capital expenditure is the same — for example, the billion-dollar training projects that OpenAI once did were already insane. But now we are talking about "hundred billion-dollar training."

You see, our thinking is now logarithmic. But, yes, we only feel truly shocked by something like what Musk is doing.

Elon’s project in Memphis, Tennessee is an example. The first time it really shocked people: 100,000 GPUs completed in six months.

He bought the factory in February 2024 and trained the model within six months. He implemented liquid cooling — this is the first large-scale liquid-cooled data center for AI; used various crazy new methods, such as placing diesel generators outdoors, using CAT turbines, and making various emergency deployments to pull power; even used the natural gas pipeline next to the factory.

So he just went for it, aiming for 100,000 GPUs, which is a scale of 200-300 megawatts. And now he is pushing forward with gigawatt-level projects at the same speed.

You would think the second time would be more impressive than the first, right? But we are a bit numb... it’s like a kid eating too much candy, eventually not even liking apples anymore (laughs).

So yes, Musk's gigawatt data center is indeed impressive.

But his factory in Memphis has caused quite a bit of protest. People say, "You are polluting the air."

But have you looked at the area around Memphis? There is a gigawatt gas power plant that basically powers the entire area; there’s also the sewage treatment plant for the whole city of Memphis; and open-pit mining areas — all kinds of dirty and chaotic infrastructure are necessary to keep the country running.

But when it comes to Musk wanting to set up a few hundred megawatts of power projects, everyone protests.

So he not only has to deal with technical issues but also politics, public relations, and various municipal disputes. Even the NAACP (National Association for the Advancement of Colored People) has come to protest him.

Some local governments also have a lot of objections to him, so he can't expand at the original site.

But he still wants to build the data centers as close together as possible, because he wants them to have super-fast, high-bandwidth connections between them. Moreover, the infrastructure there is quite mature, so he bought another distribution center, still in Memphis. One significant advantage of Memphis is that it is very close to Mississippi.

The new land he purchased is only about a mile from the Mississippi border, and he has also bought a power plant in Mississippi.

Because the regulatory policies in Mississippi are completely different from those in Tennessee — it's much easier to get things done there.

So if you ask, "Who is best at mobilizing resources and quickly advancing construction?" — perhaps, Musk really is the strongest.

His model may not necessarily be the best, at least not right now. You could say Grok-4 was once ahead for a short time. But from the perspective of execution, he can build these behemoths at an astonishing speed, which is truly admirable.

Moreover, he approaches this entirely from first principles. Most companies would give up and look for new land as soon as they encounter, "Oh, there's no power here, we can't build."

But he doesn't. He would say, "Then I'll just get a piece of land in the neighboring state."

What I like most is: he is currently in Mississippi, and Arkansas is right next door. Just in case Mississippi doesn't cooperate one day...

Will all future data centers need to be built at the junction of several states? The "four-state junction" might be the best regulatory solution.

"Is there anywhere in the U.S. where five states meet?" "I only know of places where four states meet." "Maybe one day Reddit will say: I want to go there to buy land and build a data center."

Some Advice for Startups

S: Well, I think, on the topic of new hardware, you wrote an analysis of the TCO (total cost of ownership) of the GB200. I have a question on behalf of some of our portfolio companies, and it seems you have already been helping them. I found one conclusion very interesting, which is that the TCO of the GB200 is about 1.6 times that of the H100. So obviously that becomes the benchmark for switching to the new hardware: you need at least that much of a performance-to-cost advantage. Maybe you could talk about what you see from a performance perspective, and how you would advise portfolio companies, the ones much smaller than xAI, on what to consider when looking at new hardware, keeping capacity constraints in mind, of course.

D: Yes, I mean, this is indeed a challenge, right? Each generation of GPUs is so much faster that you feel like you want the new generation. On certain metrics you could say the GB200 is three times faster than the previous generation, or twice as fast; on other metrics you could say it's far faster than the previous generation, right? It depends on whether you're doing pre-training or inference. If you're doing pre-training, or you're not really leveraging the massive NVLink domain of the NVL72, then in some ways you could say the GB200 is only about twice as fast as the H100. In that case, a TCO of 1.6 times is still worth it, right? It's worth switching to the next generation, but the marginal gain is relatively small.

It's more marginal; not a huge leap. Then there's the other situation: if you're running DeepSeek-style inference, the performance difference per GPU is six times, seven times or more, and it will keep improving as people optimize for that kind of inference. So then the question becomes: "I'm only paying 60% more, but I get 6 times the performance," right? That's a 3 to 4 times improvement in performance per dollar, absolutely. And if you're doing DeepSeek-style inference, that might also include RL (reinforcement learning), right?
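
As a quick sanity check on that performance-per-dollar framing, here is a minimal sketch. The 1.6x TCO ratio and the 2x and 6x speedups are the rough figures from the conversation, not measured benchmarks.

```python
# Performance per dollar across GPU generations.
# Inputs are the rough ratios quoted in the interview, not measured benchmarks.

def perf_per_dollar_gain(speedup: float, tco_ratio: float) -> float:
    """How much more work per dollar the new generation delivers."""
    return speedup / tco_ratio

TCO_RATIO_GB200_VS_H100 = 1.6  # new generation costs ~60% more to own and operate

for workload, speedup in [("pre-training", 2.0), ("DeepSeek-style inference", 6.0)]:
    gain = perf_per_dollar_gain(speedup, TCO_RATIO_GB200_VS_H100)
    print(f"{workload}: {speedup:.0f}x faster at {TCO_RATIO_GB200_VS_H100}x TCO "
          f"-> {gain:.2f}x performance per dollar")
# pre-training: 1.25x per dollar (marginal); inference: 3.75x per dollar (compelling)
```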

Then there's another question: the GPU is new. You know, there's the B200 and there's the GB200 NVL72. From a hardware perspective, the B200 is relatively simple; it's just a box with 8 GPUs. The performance improvement for inference isn't as large, but you get stability, right? It's an 8-GPU box; it won't be that unreliable. The GB200 NVL72 still has some reliability challenges. Those issues are being addressed and it's getting better day by day, but it's still a challenge.

You know, when you have a B200, H100, or H200 8-GPU box, if one of the GPUs fails, you have to take that whole server offline and go fix it, right? And if your cloud provider is good enough, they will replace it, right?

But with the GB200 you have 72 GPUs in one domain, and if one of them fails, how do you handle that? Do you take all 72 down to replace it? It's a blast-radius issue, right? The per-GPU failure rate is at best about the same as before, and possibly worse, because each generation runs hotter and faster. So even if you assume the failure rate is exactly the same, going from "one failure takes down 8" to "one failure takes down 72" becomes a very big problem.
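
To put a number on the blast-radius point, here is a minimal sketch. The per-GPU failure probability below is an illustrative assumption, not a measured figure; the point is how the chance of a degraded domain scales with domain size.

```python
# Blast radius: probability that a failure domain contains at least one failed GPU.
# The per-GPU failure probability is an illustrative assumption.

P_GPU_FAILED = 0.01  # assume a 1% chance any given GPU is down at a given moment

def p_domain_degraded(domain_size: int, p_fail: float = P_GPU_FAILED) -> float:
    """Probability that at least one GPU in the domain is down."""
    return 1.0 - (1.0 - p_fail) ** domain_size

for size in (8, 72):
    print(f"{size}-GPU domain: {p_domain_degraded(size):.1%} chance of being degraded")
# Same per-GPU failure rate, but ~52% for the 72-GPU domain versus ~8% for the 8-GPU box.
```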

So what many people are doing now is putting high-priority workloads on 64 of the GPUs and using the remaining 8 to run low-priority workloads. That means there's an infrastructure challenge: I have to have both high-priority and low-priority tasks. When a GPU serving a high-priority task fails, instead of taking the whole rack offline, you move a GPU from the low-priority pool over to the high-priority task and let the faulty GPU sit there, waiting to be replaced the next time the rack is serviced. That kind of model.

And just like that, the claim of "two or three times the performance improvement in pre-training" becomes less credible, because of high downtime, or because not all GPUs are always in use, or because you don't have the infrastructure to handle priority-based task switching. It's not impossible; the labs can do it, right? But if you're a cloud service provider, it's very difficult, because you might also have to keep spare GPUs idle, or use spot instances or other methods for redundancy.
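
Below is a minimal sketch of the hot-spare pattern described here: 64 GPUs on high-priority work, 8 on preemptible low-priority work, and a failed high-priority GPU backfilled from the low-priority pool. The class and method names are hypothetical, purely to illustrate the bookkeeping; real schedulers track far more state.

```python
# Minimal sketch of the 64 + 8 hot-spare pattern inside a 72-GPU NVL72 rack.
# Names and structure are hypothetical; real schedulers track much more state.

from dataclasses import dataclass, field

@dataclass
class RackScheduler:
    high_priority: set = field(default_factory=lambda: set(range(64)))
    low_priority: set = field(default_factory=lambda: set(range(64, 72)))
    failed: set = field(default_factory=set)

    def handle_failure(self, gpu_id: int) -> None:
        """Mark a GPU as failed; backfill high-priority work from the low-priority pool."""
        self.failed.add(gpu_id)
        if gpu_id in self.high_priority:
            self.high_priority.discard(gpu_id)
            if self.low_priority:
                spare = self.low_priority.pop()  # preempt one low-priority GPU
                self.high_priority.add(spare)
        else:
            self.low_priority.discard(gpu_id)
        # The failed GPU stays in place until the rack's next maintenance window.

sched = RackScheduler()
sched.handle_failure(gpu_id=12)
print(len(sched.high_priority), len(sched.low_priority), len(sched.failed))  # 64 7 1
```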

No, no, no, because it's a coherent domain connected by NVLink. You don't want anyone else touching those links. So should the end customer just leave those GPUs idle as spares? That's worse. No, the end customer usually says, "I want these," and then you have to, you know, work it out through the service level agreements (SLAs) and pricing, right?

So generally, when you're renting from a cloud there is an SLA, right? It says, "My uptime will be 99%," things like that, over a given period. For the GB200, it might be 99% for 64 of the GPUs rather than 72, and then something like 95% for all 72. This varies by cloud service provider; every cloud's SLA is different. But they all adjust for this, because they're saying, "Look, this hardware is finicky. Do you still want it? We'll compensate you, and there will always be 64 working out of the 72, right? Not always 72."
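
A rough way to see why an SLA might be quoted on 64 of the 72 GPUs rather than all 72 is to compute how often at least 64 (versus all 72) are simultaneously healthy, given some per-GPU availability. The 99.5% per-GPU availability below is an assumption for illustration only, not a vendor figure.

```python
# How often are at least k of 72 GPUs healthy, given an assumed per-GPU availability?

from math import comb

P_GPU_UP = 0.995  # assumed probability that a given GPU is healthy at any moment

def p_at_least_k_up(n: int, k: int, p: float = P_GPU_UP) -> float:
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"All 72 up:      {p_at_least_k_up(72, 72):.1%}")
print(f"At least 64 up: {p_at_least_k_up(72, 64):.4%}")
# Under these assumptions, all 72 are up only ~70% of the time, while 64 or more
# are up essentially always, so a 64-GPU SLA is far easier for the cloud to honor.
```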

So there's this "finicky nature," and the end customer has to be able to handle this unreliability. You need to have the capability to manage it.

And the end customer might just keep using the B200 or older GPUs until they're ready or until the infrastructure is set up, since the performance improvement otherwise isn't that significant. But the reason you want the 72-GPU domain is that it's how you actually capture those performance gains, right?

But you have to be smart enough to do that, which is very challenging for small companies.

I completely agree. So NVIDIA just announced those Rubin prefill cards, the CPX. What do you think about that? Will it cannibalize previous products?

Man, seriously, I don't know what I had for lunch yesterday, but I know every chip model number, hahaha. Like, your dreams are shattered; we got messed up. Living in a dream.

No, no, no, no, no. You know, why would you pre-announce a product that's five times faster in certain use cases? Isn't that obvious? I think historically, AI chips were all just "AI chips," right? Then many people started saying this is a training chip, this is an inference chip. In fact, training and inference demand switch back and forth very quickly, so the two overlap a lot; in reality it's still one chip, even though there are real differences at the workload level. But even training is now dominated by inference, because reinforcement learning (RL) is mostly about generating things in an environment and trying to achieve some kind of reward, right? So it's still inference. Training is increasingly dominated by inference.

Inference has two main phases, right? There is prefill, building the KV cache: doing attention across the whole input, among all the tokens, regardless of what kind of attention you use. Then there is decode, autoregressively generating output one token at a time. These are very, very different workloads.

So the initial idea on the infrastructure (ML systems) side was: "Okay, I'll make the batch for each forward pass very large; maybe I let 32 users decode concurrently, and out of the token budget for that forward pass I still have more than 900 slots remaining." That remaining budget does prefill: when a request comes in, it gets split into chunks and prefilled alongside the decodes. The method is called chunked prefill, where you prefill part of a request at a time. This way, GPU utilization stays very high.
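
Here is a minimal sketch of that chunked-prefill scheduling idea: each forward pass has a fixed token budget, each decoding user takes one slot, and the leftover budget is packed with chunks of pending prefills. The budget size and data structures are illustrative assumptions, not any specific serving framework's API.

```python
# Chunked prefill: spend each forward pass's token budget on decode steps first,
# then pack the leftover budget with chunks of pending prefill requests.
# Budget size and structures are illustrative, not a specific framework's API.

TOKEN_BUDGET = 1_024  # maximum tokens processed per forward pass

def plan_forward_pass(num_decoding_users: int, pending_prefills: list) -> dict:
    """pending_prefills: remaining prompt tokens for each waiting request."""
    decode_tokens = num_decoding_users        # one new token per decoding user
    remaining = TOKEN_BUDGET - decode_tokens  # leftover budget for prefill chunks
    chunks = []
    for i, prompt_left in enumerate(pending_prefills):
        if remaining == 0:
            break
        chunk = min(prompt_left, remaining)
        chunks.append((i, chunk))
        remaining -= chunk
    return {"decode_tokens": decode_tokens, "prefill_chunks": chunks}

plan = plan_forward_pass(num_decoding_users=32, pending_prefills=[4_000, 1_500])
print(plan)  # 32 decode tokens plus 992 tokens of request 0's prompt in this pass
```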

But this affects the decode workload, the part that generates tokens one by one: the TPS (tokens per second) of those workers slows down, and tokens per second matter a lot for user experience and everything downstream. So the next idea emerged: "Since these two workloads are so different, why not separate them completely?" You prefill on one set of GPUs and decode on another set, right? Almost every lab, almost every company is doing this now. OpenAI, Anthropic, Google, nearly everyone: disaggregated prefill/decode.

Why is this beneficial? Because you can autoscale resources. Right? For example, if suddenly a long context is input, you allocate more resources to prefill. Oh, suddenly my input is short and output is long, I allocate more GPUs to decode.

This way, I can ensure that the time to first token is fast enough, which is a very important factor for user experience; otherwise users will say, "Forget it, I won't use this AI anymore." Decode speed also matters a lot, but not as much as the first token.
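
A minimal sketch of that resource split: given the current mix of prefill (input) and decode (output) token load, divide a fixed GPU pool between the two roles. The proportional rule and the cost multiplier are toy assumptions for illustration, not how any particular lab sizes its pools.

```python
# Toy autoscaler for disaggregated prefill/decode:
# split a fixed GPU pool in proportion to where the token load currently is.
# The proportional rule and cost multiplier are simplifying assumptions.

def split_pool(total_gpus: int, prefill_tokens_per_s: float, decode_tokens_per_s: float,
               decode_cost_multiplier: float = 4.0):
    """Assume each decode token is several times costlier to serve (bandwidth bound)
    than each prefill token (compute bound)."""
    prefill_load = prefill_tokens_per_s
    decode_load = decode_tokens_per_s * decode_cost_multiplier
    prefill_gpus = round(total_gpus * prefill_load / (prefill_load + decode_load))
    prefill_gpus = max(1, min(total_gpus - 1, prefill_gpus))  # keep both pools nonempty
    return prefill_gpus, total_gpus - prefill_gpus

# Long inputs, short outputs: most GPUs go to prefill.
print(split_pool(72, prefill_tokens_per_s=500_000, decode_tokens_per_s=20_000))  # (62, 10)
# Short inputs, long outputs: most GPUs go to decode.
print(split_pool(72, prefill_tokens_per_s=50_000, decode_tokens_per_s=60_000))   # (12, 60)
```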

By separating prefill and decode, all of this can still run on the same overall infrastructure, right? So the question becomes: what's the next logical step? The workloads differ so much that on the decode side you have to load all of the parameters plus the KV cache just to generate a single token. You can batch several users' requests together, but you quickly run out of memory capacity or memory bandwidth, because everyone's KV cache is different. Yes. That's what attention over all the tokens costs.
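
To see why decode ends up memory-bandwidth bound, a rough upper bound on decode speed is the GPU's memory bandwidth divided by the bytes that must be streamed per generated token (weights plus KV cache). The numbers below, a 70B-parameter model in FP8 on a GPU with roughly 8 TB/s of HBM bandwidth, are illustrative assumptions.

```python
# Rough upper bound on decode throughput when memory bandwidth is the bottleneck:
# every generated token must stream the model weights (plus KV cache) from HBM.
# Model size, precision, and bandwidth are illustrative assumptions.

PARAMS = 70e9                     # 70B-parameter model
BYTES_PER_PARAM = 1               # FP8 weights
HBM_BANDWIDTH_BYTES_PER_S = 8e12  # ~8 TB/s of HBM bandwidth (illustrative)
KV_CACHE_BYTES_PER_TOKEN = 0      # ignored here to keep the bound simple

bytes_per_token = PARAMS * BYTES_PER_PARAM + KV_CACHE_BYTES_PER_TOKEN
max_tokens_per_s = HBM_BANDWIDTH_BYTES_PER_S / bytes_per_token
print(f"Upper bound at batch size 1: ~{max_tokens_per_s:.0f} tokens/s")
# ~114 tokens/s: weight streaming alone caps single-user decode, which is why decode
# batches many users together and still ends up limited by memory, not FLOPs.
```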

During prefill, I might serve just one or two users at a time, because if someone sends me a context of around 64,000 tokens, that's a lot of FLOPs, right? A 64,000-token request. Take Llama 70B, because the math with 70 billion parameters is simple: that's about 140 gigaFLOPs per token, roughly two FLOPs per parameter. Multiply 140 gigaFLOPs by 64,000 tokens and you're at around 9 petaFLOPs. You might be able to run that prefill on a single GPU in about a second, right, depending on the GPU.

That's just one forward pass. So for prefill I don't particularly care how quickly you can stream all the parameters or the KV cache; I care about the total amount of FLOPs. That is the way of thinking that leads to CPX.
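
Writing out the prefill arithmetic from that example: roughly 2 FLOPs per parameter per token for the forward pass, times the context length. The model and context (Llama 70B, a 64,000-token prompt) are the ones used in the conversation; the GPU throughput figure is an illustrative assumption.

```python
# Prefill is compute bound: total FLOPs scale with parameters x context length.
# Model and context come from the interview's example; GPU throughput is assumed.

PARAMS = 70e9                    # Llama 70B
CONTEXT_TOKENS = 64_000
FLOPS_PER_PARAM_PER_TOKEN = 2    # multiply-accumulate per parameter in the forward pass

flops_per_token = PARAMS * FLOPS_PER_PARAM_PER_TOKEN     # ~140 GFLOPs per token
total_prefill_flops = flops_per_token * CONTEXT_TOKENS   # ~9 PFLOPs for the prompt

ASSUMED_GPU_FLOPS = 10e15  # ~10 PFLOPS of usable low-precision throughput (assumption)
print(f"FLOPs per token: {flops_per_token / 1e9:.0f} GFLOPs")
print(f"Prefill total:   {total_prefill_flops / 1e15:.1f} PFLOPs")
print(f"Time on one GPU: ~{total_prefill_flops / ASSUMED_GPU_FLOPS:.1f} s")
# Memory bandwidth barely matters here, which is why a prefill-focused part like
# Rubin CPX can drop expensive HBM and spend the budget on compute instead.
```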

I give this long explanation because many people don't understand what CPX is. Even after we send multiple notes, some of my clients say, "I still don't quite get it." And fair enough; you can't expect everyone to have read "Attention Is All You Need." A networking expert might say, "I don't know these details." Or think about an investor: they might ask, "Why two chips? Why differentiate the data centers?" You have to explain everything to them.

On the other hand, at Stanford at least 25% of the students, not just computer science majors but 25% of all students, have read these papers, have read "Attention Is All You Need," right? It's a main topic of discussion, and I think that's amazing. Sorry, I forget which Middle Eastern country it was, but there AI education starts around the age of eight, and in high school students are required to read "Attention Is All You Need." Wow. I've heard people say their ruler (I didn't catch the title) wants people to read "Attention Is All You Need." I'm not sure. Anyway, in education, top-down mandates may or may not work. You know, maybe someone is teaching their kids at home. I don't know; I went to a public school. But back to your question.

Yes, on the topic of hardware cycles, I should actually explain what CPX is. CPX is a very compute-optimized chip, right? The kind you use for prefill, with decode handled separately.

And decode stays on the conventional chips with HBM (high-bandwidth memory). HBM accounts for more than half the cost of a GPU. If you strip out the HBM, you can offer customers a much lower-cost chip; or, if NVIDIA keeps the same profit margin, the cost of this prefill chip is still much lower, which makes the whole pipeline cheaper and more efficient and makes long context more economical to serve.

GPU Market Status

S: Well, yes. I really like that we're diving into these details, but I have a more high-level, 10,000-foot question for you. It's like this: I haven't been following the semiconductor market as long as you have; I probably started paying attention around the A100 generation. I remember in the summer of 2023 I was helping Noam at Character.AI find GPUs, and at that time the only thing that mattered was the delivery date, because capacity was so severely constrained. Then the evolution you've seen over the past two years is that, for example, 6 to 12 months ago people would send RFPs (requests for proposals) to 20 neoclouds, right? In some ways the only thing that mattered was price. People were literally sending RFPs for GPUs.

D: Oh, so to clarify, my view on how you buy GPUs is that it's like buying cocaine or other drugs. That's how others have described it to me, not that I buy drugs; someone told me this analogy and I thought, "Wow, that's so fitting." You call a few people, you text a few people, you ask, "Yo, how much do you have? What's the price?" It's exactly the same, the same operation as buying drugs. Sorry, sorry, I'm not saying I buy drugs. I'm just saying that, even today, that's how it works. You send Slack messages to like 30 neoclouds, including some of the biggest ones, saying, "Hey, this customer wants this much, this is the configuration they need," and then those cloud vendors reply with quotes. I know this person, I know that person. Well, I think that description is very accurate. I have sent countless people the original ClusterMAX post, because I think that article breaks the situation down very clearly.

S: But maybe I’ll ask you a question before we wrap up. Now that we have Blackwells (Blackwell GPU products) launched, what era are we in? Are we somewhat returning to the "extreme GPU shortage" era of summer 2023? Or have we just entered a new cycle? What do you think about where we are now?

D: That's a very good question. For one of your portfolio companies, we had a case like this: after they ran into difficulties with Amazon, we tried to say, "Okay, we'll find you GPUs." The deals that had been available before were gone, but we found other deals, right? It turned out that a lot of the major neoclouds' Hopper capacity was sold out, and their Blackwell capacity won't be online for a few months. So it's a bit challenging, right? Because inference demand has skyrocketed this year; these reasoning models have driven a significant revenue increase this year. And the other factor is that although Blackwell is coming online, deployment isn't easy, so it takes time; there's a learning curve. In the past you could buy Hopper GPUs, build the data center, and be operational within a few months; with Blackwell, because it's a new GPU with reliability challenges and growing pains, it takes longer.

So there’s a gap: how many GPUs are officially on the market while revenue starts to inflect. A lot of capacity has been quickly snatched up. In fact, the price of Hopper hit bottom three or four months ago, even five or six months ago. Yes. Actually, their prices have slightly rebounded now. They are still somewhat difficult to buy, but not extremely easy either.

I don’t think we are fully back to the extreme GPU tightness of 2023-2024, but if you need a lot of GPUs at scale, it is indeed very difficult. If you only need a small number of GPUs, it’s relatively easier. Yes. Wow, this is truly an era. Are we going to wrap up? Dylan, this has been another instant classic. Thank you very much for coming on the podcast. We talked for two hours, brother. What? Did I miss anything? Thank you. We can’t stop. Thank you very much