
This article surveys the current state of the AI battle in Silicon Valley.

Dylan Patel, the founder of SemiAnalysis, believes that Meta is fully pursuing "superintelligence"; that Apple may fall behind in the AI talent competition due to cultural and resource disadvantages; that the IP control dispute between OpenAI and Microsoft hides real concerns; and that, although NVIDIA is strong, strategic missteps have given AMD an opening. He is also not very optimistic about on-device AI, believing that core AI capabilities will remain in the cloud, and he expects white-collar jobs to be impacted by AI first, with OpenAI and Meta leading the "superintelligence" race.
Recently, Dylan Patel participated in a podcast interview with Matthew Berman, providing an in-depth and insightful analysis of the current "Silicon Valley AI battle."
Dylan Patel is an expert with profound insights in the semiconductor and artificial intelligence fields. He founded SemiAnalysis, an organization that offers in-depth technical analysis and market insights. Patel is known for his unique perspectives on the chip industry, AI model development trends, and the strategic layouts of major tech companies.
Dylan Patel believes that Meta's recent acquisition of Scale AI is not focused on its increasingly "outdated" data labeling business, but rather on Alexandr Wang and his core team. Wang's addition marks a significant shift in Zuckerberg's AI strategy, moving from the previous stance of "AI is good, but AGI won't be achieved soon" to a full pursuit of "superintelligence," as he realizes that Meta has fallen behind in this field and needs to "catch up."
Apple faces disadvantages in attracting top AI researchers because it cannot offer an attractive corporate culture, high salaries, and ample computing resources like Anthropic or Meta.
For OpenAI, Patel sees a greater concern in Microsoft's control over IP. Microsoft theoretically could own all the IP just before AGI is realized, creating significant uncertainty for OpenAI's researchers.
NVIDIA has built a strong moat with its excellent hardware interconnect and mature software ecosystem. However, NVIDIA also faces challenges. Its recent acquisition of Lepton and the launch of DGX Cloud directly compete with cloud service providers, leading to dissatisfaction among some cloud service providers, causing them to start turning to AMD. Dylan Patel views this as a "major mistake" for NVIDIA.
In terms of edge AI, Dylan Patel expressed a pessimistic view. He believes that although edge AI has advantages in security and low latency, consumers are more sensitive to price and tend to prefer free cloud AI services. He predicts that the application of edge AI will mainly be limited to low-value, lightweight tasks, such as image recognition or hand tracking on wearable devices, while complex and valuable AI functions will still rely on the cloud. Apple itself is also building large data centers, indicating that it recognizes the cloud as a key direction for AI development.
Patel believes that although many companies claim to be doing different things, the underlying technologies and methods are largely the same, namely pre-training large Transformer models and conducting reinforcement learning. He mentioned the importance of "rewriting the human knowledge corpus," as there is a lot of low-quality information in existing data. He believes Grok has an advantage in handling real-time events and current affairs information.
Dylan Patel believes that the ultimate goal of AI is to reduce human working hours, although it may lead to a few people working excessively while the majority work far less. AI will first impact white-collar creative jobs (such as graphic designers) rather than manual labor, which contradicts common perceptions. He predicts that in the future AI will take on longer-horizon, more complex tasks and may eventually operate independently of human oversight. Regarding the timeline, he believes that 20% job automation is unlikely within this decade; it is more likely to arrive at the end of this decade or the beginning of the next.
Who will win the superintelligence race? Patel believes that OpenAI is in the lead, with Meta closely following. He is confident that Meta has the ability to attract enough top talent to stand out in the superintelligence race.
“If superintelligence is everything, then $100 million, or even $1 billion, is just a drop in the bucket compared to Meta's current market value and the overall potential market for artificial intelligence.”
“Many who go to Meta are clearly motivated by money, but many others leave (their original positions) because they can now control the AI development path of a trillion-dollar company. They can speak directly with Zuckerberg and persuade someone who has complete voting rights over the entire company.”
“OpenAI's valuation will continue to soar because what they are building has no profit plan in the short term. … So throughout this process, they will be continuously losing money, needing to raise funds, and must be able to persuade every investor globally. Moreover, these terms are neither glamorous, nor concise, nor easy to understand.”
“Is there a problem with GPT-4.5 Orion? That is the model they internally hoped would become GPT-5… Overall, it’s not that practical: too slow and too costly compared to other models.”
“This is also another area where I sometimes use Grok: current events. You can ask Grok questions, and it can tell you what’s happening more accurately than Google Search, or even queries from Gemini or OpenAI, because it can access all this information (data from X).”
“As for Apple, they have always had issues attracting AI researchers, who like to boast and publish their research findings. Apple has always been a mysterious company.”
“Even if you have excellent talent, it is still challenging to produce good results due to organizational issues, as the right people may not be in the right positions, and decision-makers may choose the wrong people, letting them engage in politics and incorporate their ideas and research paths into the model, which may not necessarily be good ideas.”
“Overall, I am pessimistic about device-side AI and do not see it positively. Security is great, but I understand human psychology: free is better than paid, and a free model with ads is more attractive than pure security. In fact, not many people really care about security issues.”
"A major challenge for AI on the device side is hardware limitations. The inference speed of the model depends on the memory bandwidth of the chip. If I want to increase the memory bandwidth of the chip, the hardware cost may increase by $50, which will ultimately be passed on to customers, resulting in an additional $100 cost for the iPhone."
"The scenarios where AI truly plays a role on the device side will be in wearable devices, such as headphones or smart glasses. What you do locally are just small tasks, like image recognition and hand tracking, but the actual inference and thinking are done in the cloud."
"AMD is indeed working hard, but their hardware is lagging in some aspects, especially compared to Blackwell. The real challenge they face is software; the developer experience is not great. NVIDIA can connect GPUs through the NVLink network hardware on the chip. The way NVIDIA builds servers allows 72 GPUs to work very closely together, while AMD can currently only manage 8. This is crucial for inference and training."
"NVIDIA recently made a significant mistake: acquiring Lepton. Now NVIDIA has acquired this company that develops software layers and is working on a product called DGX Cloud. This means that if any cloud service provider has idle GPUs, they can hand them over to NVIDIA as bare metal, and NVIDIA will deploy Lepton's software on them and rent them out to users. This has infuriated cloud service providers because NVIDIA is directly competing with them."
"I don't think 20% of jobs will be automated within this decade. I feel it might take until the end of this decade or the beginning of the next to achieve 20% job automation."
"Artificial intelligence should reduce our working hours. In the future, there may be a situation where people like me (and possibly you) are overworked, while the average person's working hours are much less."
"Who will win the superintelligence race? OpenAI. They are always the first to achieve every major breakthrough, even in inference."
"I believe Meta will attract enough talented individuals to become truly competitive."
The following is the full interview, translated by AI:
Meta Llama 4 and the Delayed "Behemoth" Project
Matthew Berman:
Dylan, thank you so much for joining me today. I'm really excited to talk with you. I've seen you give quite a few talks and interviews. We have a lot to discuss. I want to start with Meta. Let's begin with Llama 4. It's been a while since that product was released, and there was a lot of anticipation in the AI field at that time. It was decent, but not great. It didn't change the world. Then they delayed the Behemoth project. What do you think is happening there?
Dylan Patel:
It's interesting. There are really three different models, and they differ significantly. Behemoth has been delayed, and I actually think they may never release it. There are many issues with it: the way they trained it and some of the decisions they made did not yield the expected results. Then there are Maverick and Scout. In fact, one of those models is quite good. It wasn't the best when it was released, but it was comparable to the best Chinese models at the time. But then Alibaba launched a new model, and DeepSeek launched a new model, so the comparison got worse. Objectively speaking, it ended up looking really bad.
I know for certain that they trained it to compete with DeepSeek, trying to adopt more elements of the DeepSeek architecture, but they didn't do it properly. It was really just a rushed job, and it went wrong because they overemphasized MoE sparsity. Interestingly, if you really look at the model, it often doesn't even route tokens to certain experts, which basically wastes training: the router can route to any expert it wants at each layer, it learns which expert to route to, and each expert learns as well.
It's like the experts are independent things. You can't observe this directly, but what you can see is how tokens are routed: which experts they go to as they pass through the model. And it looks like some experts are barely routed to at all, as if you have a bunch of idle, empty experts. Clearly, there were issues with the training.
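As a rough illustration of the routing behavior Patel describes, the sketch below implements a generic top-k MoE router in PyTorch and counts how many tokens land on each expert; near-zero counts correspond to the "idle, empty experts" he mentions. This is a minimal, hypothetical example, not Llama 4's actual routing code.

```python
import torch
import torch.nn.functional as F

def route_tokens(hidden, router_weight, top_k=2):
    # hidden: [num_tokens, d_model]; router_weight: [d_model, num_experts]
    logits = hidden @ router_weight                      # router score for each expert
    probs = F.softmax(logits, dim=-1)
    gate_vals, expert_ids = probs.topk(top_k, dim=-1)    # top-k experts chosen per token
    return gate_vals, expert_ids

def expert_utilization(expert_ids, num_experts):
    # Fraction of routed token slots assigned to each expert.
    counts = torch.bincount(expert_ids.flatten(), minlength=num_experts)
    return counts.float() / counts.sum()

# Toy check: 1024 tokens, 16 experts. Experts with ~0 utilization are effectively wasted capacity.
hidden = torch.randn(1024, 512)
router_weight = torch.randn(512, 16)
_, expert_ids = route_tokens(hidden, router_weight)
print(expert_utilization(expert_ids, num_experts=16))
```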
Matthew Berman:
Is this considered a matter of professional skills internally? I mean, they certainly have some of the best talents in the world, and we will talk about their recent hiring initiatives next. But why have they never really managed to do it?
Dylan Patel:
I think it's a combination and convergence of various things. They have a lot of talent and a lot of computational resources. But the organization of personnel has always been the most challenging thing. Which ideas are actually the best? Who is the technical leader picking the best ideas? It's like if you have a group of excellent researchers, that's great. But if you arrange product managers above them without a technical leader to evaluate how to choose, then there will be a lot of problems.
At OpenAI, Sam is a great leader who can access all the resources. But the technical leader is Greg Brockman. And Greg Brockman is making a lot of decisions, along with many others, like Mark Chen and other similar technical leaders, who are actually deciding which direction we should take from a technical perspective because researchers will conduct their research and think their research is the best. Who is evaluating everyone's research and then deciding which ideas are great and which are bad? Let's not use that. This is really very difficult.
When researchers do not have a technically skilled leader who can make the right choices, they ultimately get into trouble. They do have all the right ideas. But part of AI research is that you will also have all sorts of wrong ideas; you learn from them, then generate the right ideas and make the right choices. Now, if your selection is bad and you actually pick some wrong ideas, you go down some branch of research, and what happens? It's as if you said, "we chose this idea, this is what we're doing, let's move on." And now more research gets derived from that bad idea, because nobody goes back to undo decisions that have already been made. Everyone says, "we made that decision; let's see what we can research from here." So you end up with excellent researchers wasting their time on the wrong path. Researchers often mention something called taste.
It's interesting. You think of these people as the kind who aspired to be International Math Olympiad competitors; that was their path to fame as teenagers, and then at 19 they go to OpenAI, Meta, or elsewhere.
But in reality, a lot of it comes down to taste. To some extent, judging what is worth researching and what is not is an art form: the art of choosing the best option. You lay out all these ideas at small scale, run the experiments on 100 GPUs, they look awesome, and then you run it once with 100,000 GPUs.
With this idea, it’s as if things don’t always translate perfectly. There are many personal taste and intuition components involved. It’s not that they don’t have excellent researchers. It’s just hard to say whose taste is right. It’s like you don’t care about the critics’ reviews; you care about Rotten Tomatoes, maybe the audience score. And it’s like, which critic’s opinion are you really listening to? Even if you have excellent talent, due to organizational issues, it’s very challenging to actually produce good results because the right people are not in the right positions, decision-makers might choose the wrong people, letting them get political, incorporating their ideas and research paths into the model, and those ideas may not necessarily be good ones.
Scale AI Acquisition and Meta's "Superintelligence" Ambition
Matthew Berman:
Let’s discuss who is making the decisions. There were many news reports last week that Zuckerberg made a $100 million offer, which Sam Altman also confirmed. Meta's acquisition of Scale AI seems to be aimed at Alexandr Wang and his team, who are in founder mode. So what does the acquisition of Scale AI actually bring to Meta?
Dylan Patel:
I think, to some extent, the AI data labeling business is a bit "outdated" now.
Matthew Berman:
As a service, because companies are canceling orders.
Dylan Patel:
Yes, Google is pulling out. I heard that Scale AI had about $250 million in business with them this year, but they are withdrawing. Clearly, Google has invested a lot of money and the project is already in a difficult position, but these investments will be cut significantly. It is said that OpenAI has also cut off the external Slack connection, so there is no longer any contact between Scale AI and OpenAI.
Matthew Berman:
A complete break between companies.
Dylan Patel:
Yes, so companies like OpenAI do not want Meta to know how they handle data, because the uniqueness of a model lies in what you do with custom data. Meta's acquisition of Scale AI is not for Scale AI itself, but to get Alexandr Wang and a few of his core colleagues. Scale AI also has some other outstanding people, and Meta will bring them on board as well.
The question now is whether the data Scale AI brings is high quality. While it is useful to see the various data-labeling paths other companies are pursuing, more importantly, Meta wants someone to lead this superintelligence effort. Alexandr Wang and I are about the same age, probably 28 or 29. He has achieved remarkable success in many areas. People may not like him, but he is clearly very successful, especially in convincing Mark Zuckerberg, a very rational and smart person, to buy his company. The company has nearly $1 billion in revenue, and he said, "Let's pursue superintelligence."
This is a huge shift for Zuckerberg. If you look at Zuckerberg's interview a few months ago, he was not chasing superintelligence; he was just promoting that AI is good and great, but he believed that artificial general intelligence (AGI) would not be realized anytime soon. So this is a significant strategic shift because he is now basically saying, "Superintelligence is what matters most, and I believe we are moving in that direction. Now, what can I do to catch up? Because I am behind."
Matthew Berman:
It seems that the narrative of all these big companies has now shifted to "superintelligence," even though just a month ago it was "artificial general intelligence" (AGI). Why this shift?
Dylan Patel:
The term AGI does not have a clear definition.
Matthew Berman:
Yes, it is ambiguous.
Dylan Patel:
You could ask a human researcher directly, "What does AGI mean?" They might really think it just means an automated software developer, but that is not artificial general intelligence. Many researchers in the ecosystem think this way. Ilya Sutskever was the first to see through all of this, and then he founded his own company, Safe Superintelligence (SSI). I believe this started the rebranding trend in the industry. Nine months to a year later, everyone is saying, "Oh, superintelligence is the real thing." So this is another direction first pushed by Ilya, like pre-training scaling, reasoning models, and so on. Even if he is not the sole originator, he has put a lot of effort into it. This rebranding may show that he understands marketing as well.
Matthew Berman:
There are rumors that Zuckerberg tried to acquire SSI but was rejected by Ilya. I also want to ask you about Daniel Gross and Nat Friedman. These rumors may now be confirmed, as Zuckerberg seems to be trying to hire them. What can these two bring?
Dylan Patel:
Zuckerberg attempted to acquire SSI, and he also tried to acquire Thinking Machines and Perplexity; these have been reported in various media. Specifically, the rumor is that Mark tried to acquire SSI, but Ilya clearly rejected it because he is committed to the core mission of achieving superintelligence, is not worried about products, and may not care much about money. He is primarily focused on building it and is a true believer in every respect. He likely has enough voting power and control to say "no." If the rumors about Daniel Gross are true, then it is likely that he wanted this acquisition to go through, and I would say he is an awesome founder.
The other founder, Nat, does not come from an AI research background, but he and Daniel run their own venture capital fund together, and Daniel co-founded SSI. He may have wanted to push the acquisition through, but ultimately it did not happen. I speculate that if he (Daniel) really wants to leave, it is likely because of disagreements and a rift over the attempted SSI acquisition, so he wants to move on.
Overall, when you observe many very successful people, you will find that the key to success is not money, but more about power. Many who go to Meta are obviously there for money, but many others leave their original positions because they can now control the AI development path of a trillion-dollar company. They can speak directly with Zuckerberg and persuade someone who has complete voting rights over the entire company.
There is tremendous power in this. They can deploy any AI technology they want to billions of users, whether it is infrastructure, research, or products. For people like Alexandr Wang, Nat Friedman, or Daniel Gross, who are more product-focused, this makes a lot of sense. Nat built GitHub Copilot; he is a product person rather than an AI researcher, although he knows a lot about AI research. Similarly, Alexandr is clearly very knowledgeable about research, but his superpower is dealing with people, persuading others, and coordinating organizations; he may not be as outstanding in research itself. At Meta, they have all the resources and power to do many things.
Matthew Berman:
Sam Altman also mentioned that Meta has been offering hundreds of millions of dollars to its top researchers, but none of the top researchers have left. I want to ask: is it a viable strategy to solve the problem purely by throwing money at it and hiring the best talent? It feels like the cultural element might be missing. At OpenAI, there are many true believers who work for the mission. Is simply investing money and attracting the best researchers enough to create that kind of culture?
Dylan Patel:
It depends on how you think. If you believe that superintelligence is the only thing that matters, then you must pursue it; otherwise, you are a failure. Mark Zuckerberg certainly does not want to be a failure, and he believes he can build superintelligence. So the question becomes, what should you do?
The answer is to try to attract the best teams, like Thinking Machines, which has outstanding researchers and infrastructure people from OpenAI, Character.AI, Google DeepMind, Meta, and so on. SSI is the same; it is Ilya and the people he recruited. Meta is trying to recruit people from these companies and also trying to acquire the companies outright. When that path does not work, you partner with people like Alexandr Wang, who have extensive networks and can help you build a team, and then immediately start building.
How is this different from acquiring a company like SSI, which has far fewer than 100 employees? I think SSI has fewer than 50 employees. Spending $30 billion to acquire it is like saying, "We spent hundreds of millions on each researcher, and over $10 billion on Ilya himself." It is the same thing they are doing now by hiring people individually.
As for Sam's statement that "no top researchers have left," I think that is not accurate. Initially, there were definitely top researchers who left. And you mentioned $100 million; in fact, I have heard that one person at OpenAI is worth over $1 billion. Regardless, these amounts are huge, but it is the same as directly acquiring one of those companies. Companies like SSI or Thinking Machines do not have products; you acquire them for talent.
If superintelligence is everything, then $100 million, or even $1 billion, is just a drop in the bucket compared to Meta's current market value and the overall potential market for artificial intelligence.
Microsoft and OpenAI: From Honeymoon to "Therapy"
Matthew Berman:
I want to talk a little about the relationship between Microsoft and OpenAI. It seems we have long passed the honeymoon phase, and their relationship is indeed in a state of turbulence.
Dylan Patel:
Now it has become a kind of therapy.
Matthew Berman:
Yes, absolutely.
Dylan Patel:
Tell me your feelings, Sam and Satya.
Matthew Berman:
This is psychotherapy. There are two people who have a relationship, and this relationship seems to be a bit fractured. OpenAI's ambitions seem boundless. Is Microsoft now considering adjusting the deal? What about OpenAI? Microsoft doesn't seem to have a reason to do so, but how do you think the dynamics of this relationship will develop next?
Dylan Patel:
OpenAI wouldn't have achieved what it has today without Microsoft. Microsoft signed an agreement that granted it tremendous power. It's a strange deal because initially, they wanted to be a nonprofit organization and were concerned about AGI (Artificial General Intelligence), but at the same time, they had to give up a lot to get the funding.
Microsoft doesn't want to get involved in antitrust issues, so they structured this deal in a very peculiar way. There are revenue shares, profit guarantees, and various other elements, but nowhere does it state: you own X% of the company. I can't recall their exact share structure, but it's roughly a 20% revenue share, with a 49% or 51% profit share, until a certain cap is reached. Then, Microsoft owns all of OpenAI's IP (intellectual property) rights until AGI is achieved.
All of this is very vague. The profit cap might be around 10 times. Again, I'm just speculating; I haven't followed this closely for a while. But it's like Microsoft invested about $10 billion, and OpenAI has a 10 times profit cap, which means if Microsoft can get $100 billion in profit from OpenAI, what motivation do they have to renegotiate now? Before that, OpenAI has to give them all profits or half of the profits. They get a 20% revenue share and can use all of OpenAI's IP before AGI is achieved.
But what is the definition of AGI? Theoretically, OpenAI's board can decide when AGI is reached. But if that really happens, Microsoft would sue them into bankruptcy, and Microsoft's lawyers are more numerous than God's. So this is a crazy deal.
For OpenAI, there are indeed some concerning aspects. One of the main points they have removed is that Microsoft was very worried about antitrust issues, meaning OpenAI had to exclusively use Microsoft's computing resources. They dropped this clause last year and then announced the "Stargate" collaboration this year. This means OpenAI will flow to Oracle, SoftBank, CoreWeave, and the Middle East to build their "Stargate" cluster, which is their next-generation data center. Of course, they still get a lot of resources from Microsoft, but they also gain substantial resources from "Stargate," mainly from Oracle, but other companies as well.
Previously, OpenAI couldn't do this without going through Microsoft. Initially, they wanted to work with CoreWeave, but Microsoft intervened in the relationship, saying, "No, you can only use ours." So many GPUs were rented from CoreWeave to Microsoft and then rented on to OpenAI. But this exclusive arrangement has ended; now CoreWeave has signed a large contract with OpenAI, and Oracle has also signed a large contract with OpenAI.
Matthew Berman:
What did they get in return for Microsoft giving up the exclusive license in this deal? Are there reports on what they received for it? It's usually not as simple as saying, "Well, we gave it up."
Dylan Patel:
Reportedly, they gave up exclusivity in exchange for only a right of first refusal. This means that whenever OpenAI goes to negotiate for computing resources, Microsoft has the right to provide the same computing resources at the same price and timeframe.
Matthew Berman:
It's to reduce antitrust risk.
Dylan Patel:
Yes, antitrust is one of the biggest considerations. From OpenAI's perspective, they are just frustrated that Microsoft's speed is much slower than they need. They can't get all the computing resources and data center capacity they require. CoreWeave and Oracle are progressing much faster, but even so, it's still not fast enough. So, OpenAI has turned to others as well.
But the real challenge now is that Microsoft owns OpenAI's IP, and they have rights to everything. They can dispose of it as they wish. Whether Microsoft acts friendly and doesn't exploit it for anything, or is somewhat incompetent and unable to fully utilize it and just browses through it, regardless of the reason, Microsoft hasn't done much despite having the capability. But the possibilities are endless.
Another thing is that Microsoft owns all the IP right up until AGI is achieved. That means on the day before AGI is realized, they still own all the IP; the rights only cut off at that moment, and until then they own everything. And it may not even be a single clean day: you might achieve it and then need time to review and reach a consensus that you really have. But on the date you declare the model and make it public, everything up to that point is something Microsoft can access.
So for OpenAI, this is the real, significant risk. The profit-sharing and related matters are very complex and difficult, and most people don't care much about them when investing in OpenAI. Getting every investor in the world to accept your unusual structure, namely a nonprofit with a capped-profit arm and all that kind of thing, is extremely challenging. Microsoft holds rights to all your profits and all the IP. So theoretically, if they decided to poach some of your best researchers and implement everything themselves, you could become worthless.
Such things scare investors, and Sam and OpenAI believe this will become the most capital-intensive startup in human history. Valuations will continue to soar because what they are building has no profit plan in the short term. They have been around for a while, with annual revenue of about $10 billion, but they won't be profitable in the next five years. Projections show that before they become profitable, their revenue is expected to far exceed hundreds of billions, possibly reaching the trillions. So throughout this process, they will be continuously losing money, needing to raise funds, and must be able to convince every investor globally. Moreover, these terms are neither glamorous, nor concise, nor easy to understand.
Why was the GPT-4.5 Orion project abandoned?
Matthew Berman:
Well, you mentioned computing power a bit, especially that Azure can connect to CoreWeave and other places. I want to talk specifically about 4.5, GPT-4.5. I believe it was deprecated last week. It's a very large model, as I understand it.
Dylan Patel:
Is that really the case?
Matthew Berman:
Isn't it?
Dylan Patel:
I don't know. I thought it was still usable in chat. I'm just curious.
Matthew Berman:
Maybe they just announced the abandonment, but it was only a matter of time.
Dylan Patel:
No, it's still there. But you're right, they did skip it. Its usage is very low, which makes sense.
Matthew Berman:
Is it because the model is too large, or the operating costs are too high? What went wrong with GPT 4.5 Orion?
Dylan Patel:
That's the model they internally hoped would become GPT-5; they placed that bet at the beginning of 2024 and started training it then. It was an all-in bet on a fully pre-trained model: gather all the data, build an absurdly large model, and train it to be much smarter than 4.0 and 4.1.
To be clear, I've said before it was the first model that could make me laugh, because it was genuinely funny; but overall it wasn't that practical, too slow and too costly compared to other models. Compared to something like o3, its only bet was scaling pre-training, but the data couldn't scale along with it. They failed to acquire enough data, and since the data didn't scale that quickly, they ended up with a very large model trained on all that compute.
But then you run into a problem known as over-parameterization. Generally speaking, in machine learning, if you build a neural network and feed it data, it tends to memorize first and then generalize. It will know that if I say "the quick brown fox jumps over," the next words are always "the lazy dog," but it won't understand what a quick brown fox or a lazy dog actually is until you train it on far more data; it won't actually build a world model, and it lacks generality. To some extent, GPT-4.5 Orion is so large, with so many parameters, that it really did memorize a lot. In fact, when it first started training, people, including me, were very excited, saying, "Oh my god, it's crushing all the benchmarks, and we've only just started."
Matthew Berman:
During training, because some of the checkpoints were really great initially.
Dylan Patel:
Yes, but that's because it had just memorized a lot of content, and then progress stalled. For a long time it was just memorizing without generalizing. It did eventually generalize, because it was a very long and complex run.
In fact, there was a bug in their system that persisted for several months of training. Training usually lasts a few months, often less. There was a bug in their training code that had been there for months; it was a very small bug, but it affected the training. Interestingly, when they finally tracked down the issue, it turned out to be a bug in PyTorch, which OpenAI found, fixed, and submitted a patch for. On GitHub, about 20 people reacted to the bug fix with emojis. Another thing is that they had to frequently restart training from checkpoints. The run was so large and complex that many things could go wrong.
Therefore, from an infrastructure perspective, just integrating that many resources, bringing them together, and keeping training stable is very difficult. But from another angle, even if the infrastructure, code, and everything else were perfect, you would still face data problems. Everyone brings up the "Chinchilla" paper; I think it was 2022 when Google DeepMind released it. It basically described the optimal ratio of parameters to tokens for a model, and strictly it only applies to dense models with exactly the Chinchilla architecture.
But the idea is: if I have X floating-point operations, I should have this many parameters and this many tokens. That is a scaling law. Obviously, when you increase the model size and apply more floating-point operations, the model's performance improves. But how much data should you add? How many parameters? Obviously, over time, architectures change, so Chinchilla's precise numbers no longer hold exactly.
Generally speaking, you want roughly 20 training tokens for every parameter in the model. In practice there is a curve, and it's more complicated than that, but the observation is roughly right: as you increase compute, you want to increase data and parameters in a certain proportion, along a certain curve. There is basically a formula, and they did not follow it; they ended up with far more parameters relative to tokens than the formula would suggest.
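To make the rule of thumb concrete, the sketch below applies the common approximations that training compute is roughly 6 × parameters × tokens and that a compute-optimal dense model wants on the order of 20 tokens per parameter. The exact constants depend on architecture and data, so this is illustrative only.

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20):
    # compute ≈ 6 * params * tokens, and tokens ≈ 20 * params
    # => compute ≈ 6 * 20 * params^2, so solve for params.
    params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    tokens = tokens_per_param * params
    return params, tokens

# Example: a 1e25-FLOP training run lands around 290B parameters and ~5.8T tokens.
params, tokens = chinchilla_optimal(1e25)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")
```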
But all of this was locked in early in 2024 when they started training; those were the experimental decisions, and they went through with it. I don't remember exactly when they released 4.5; it was last year. But when the model finally shipped, it was many months after they had started and finished pre-training and then tried to do reinforcement learning (RL) and a series of other steps. In the meantime, a different team at OpenAI had discovered something magical: the reasoning capability, the thing known as "Strawberry."
Matthew Berman:
It's like, as they have invested all these resources and are in the process of training this large-scale model, they realize that due to the reasoning capability, we can achieve higher efficiency and better quality from a single model at a much lower cost.
Dylan Patel:
If you really want to reduce reasoning to first principles, you are providing the model with more data. Where does this data come from? You generate it in verifiable domains: the model generates data, and you throw away everything that doesn't give the correct answer, because there you can verify whether that math problem is right or whether that piece of code passes its unit tests. So in a sense, looking back, I obviously didn't have that intuition at the time, but in hindsight it makes sense: 4.5 struggled because it didn't have enough data.
Moreover, from a scaling and infrastructure standpoint, it was very complex and difficult; there were many problems and challenges there. But they also simply didn't have enough data. Then the breakthrough from another team was a way to generate more data, and that data was beneficial. A lot of synthetic data is low-quality, but the magic of Strawberry, the magic of reasoning, lies in the quality of the data: what kind of data are you generating? So from first principles, it really makes sense. Data is the wall; simply adding more parameters doesn't help.
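The data-generation loop Patel sketches here can be summarized in a few lines: sample candidate solutions in a verifiable domain, keep only those a checker accepts, and train on the survivors. The function and checker names below are placeholders for illustration, not OpenAI's actual pipeline.

```python
def build_reasoning_dataset(problems, generate_candidates, check_answer, samples_per_problem=8):
    # generate_candidates(problem, n) -> list of model-written solutions (placeholder)
    # check_answer(problem, candidate) -> True if the math answer is exact or the code passes its unit tests
    kept = []
    for problem in problems:
        for candidate in generate_candidates(problem, n=samples_per_problem):
            if check_answer(problem, candidate):
                kept.append({"prompt": problem, "completion": candidate})
    return kept  # only verified samples are kept as training data
```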
Apple's Lagging Performance in AI
Matthew Berman:
I want to talk a bit about Apple; I believe you have your own views on this. Apple is clearly lagging behind. We haven't gotten much information about their models, leaks, or what they are doing. What do you think is happening at Apple? Do you think they just made a mistake? They seem late to the game. Why aren't they acquiring companies or recruiting aggressively? If you had to guess, what's happening internally?
Dylan Patel:
I think Apple is like a very conservative company. They have acquired some companies in the past, but they have never made any real large-scale acquisitions.
Matthew Berman:
Beats is the biggest one. A headphone company.
Dylan Patel:
Right. But overall, their acquisitions have been very small. They have acquired a lot of companies, but early on they were really small ones, often failed startups or companies that had not reached product-market fit rather than the super hot ones. Apple has always had trouble attracting AI researchers, who like to show off and publish and present their research; Apple has always been a secretive company. They actually changed their policy to allow their AI researchers to publish results, but at the end of the day it's still a secretive company, and it still feels like a legacy company. Meta, by contrast, can keep building up a group of researchers because they already have a lot of ML talent, right? They have long been leaders in the AI field, they have the PyTorch team, and they have committed to open-sourcing AI.
Matthew Berman:
For a long time now. They've been consistent about open source.
Dylan Patel:
Beyond that, think about which companies can actually attract AI talent. OpenAI formed as a competitor to DeepMind, pulling together many excellent researchers; then Anthropic split off from OpenAI, Thinking Machines split off from OpenAI, and SSI split off from OpenAI, right? Which companies can attract talent that didn't already have an AI pedigree? Google DeepMind is the biggest brand in the field and has always attracted the most AI researchers and PhDs, and then there are OpenAI and Anthropic, with Thinking Machines and SSI tracing back to OpenAI. For Apple, it's hard to get that talent to come of its own accord.
Complex and Valuable AI Functions Will Still Rely on the Cloud
Dylan Patel:
Today, Anthropic has such a strong culture that people are genuinely drawn to it. So I ask myself: companies like Meta can pay their way in, but how is Apple going to attract these top researchers? The people they recruit won't be the best researchers, so it's hard for them to stay competitive.
Moreover, they have a bias against NVIDIA and really dislike NVIDIA. This might be for reasonable reasons: NVIDIA once threatened to sue them over certain patents, and the GPUs they sold to them eventually had failures, an incident known as "Bumpgate." It's a very interesting story.
The issue relates to a generation of NVIDIA GPUs, and I might not remember the specific reasons clearly, after all, it has been many years.
Matthew Berman:
When was that?
Dylan Patel:
Around 2015, or even earlier. At that time, there was a series of NVIDIA GPUs for laptops. The bottom of the chip had solder balls connecting its input/output (I/O) pins to the motherboard, CPU, power supply, and so on. At some point in the supply chain, all the companies, including Dell, HP, Apple, and Lenovo, blamed NVIDIA, and NVIDIA in turn blamed them, saying it wasn't their fault. I don't want to assign blame, but the problem was the quality of the solder balls. When the temperature fluctuated, the chip, solder balls, and PCB (printed circuit board) expanded and contracted at different rates because of their different thermal expansion coefficients. Ultimately, that mismatch caused the solder balls connecting the chip to the board to crack; the connection between the chip and the circuit board breaks. That is what became known as "Bumpgate." I believe Apple wanted compensation from NVIDIA, but NVIDIA refused at the time, saying the situation was complicated. Apple really dislikes NVIDIA, partly because of this incident, and partly because of NVIDIA's threats when it tried (and ultimately failed) to enter the mobile chip market; at that time NVIDIA tried to sue all the relevant companies over GPU patents in mobile devices. Given these two things, Apple has a strong aversion to NVIDIA, and so its purchases of NVIDIA graphics hardware have not been large.
Matthew Berman:
They actually no longer need to procure in large quantities now.
Dylan Patel:
They don't need it in laptops, but it's the same in data centers. If I were a researcher, I would consider factors like cultural fit and salary. Even companies like Meta, which have a lot of computing resources and excellent researchers, still need to offer huge amounts of money to attract talent. Apple won't offer such high salaries, and they don't even have enough computing power. To provide inference services to users, they run models on both Mac chips and data centers simultaneously, which is a strange approach. I (as a researcher) wouldn't want to deal with these issues; I just want to build the best models. This is a challenge for Apple.
Matthew Berman:
Okay, one last question about Apple. They place great importance on on-device AI, which I personally like, for example, its advantages in security and latency. What are your views on on-device AI (i.e., edge AI) versus cloud AI? Will the future trend be somewhere in between the two?
Dylan Patel:
Overall, I am pessimistic about on-device AI and do not have high hopes for it. Security is great, but I understand human psychology: free is better than paid, and a free model with ads is more attractive than just security. In fact, not many people really care about security issues; they verbally express concern, but very few make decisions based on security factors. Of course, I also hope for privacy and security guarantees.
Matthew Berman:
Wait, but you just said you like free, isn't on-device AI also a form of free?
Dylan Patel:
No, for example, Meta provides services for free in the cloud, OpenAI's ChatGPT has a free version, and Google does too.
Matthew Berman:
Yes, and the free version in the cloud will be better than any version running on your device.
Dylan Patel:
Right. A major challenge of on-device AI is hardware limitations. The inference speed of a model depends on the memory bandwidth of the chip. If I want to increase the chip's memory bandwidth, the hardware cost might rise by $50, which ultimately gets passed on to customers, making the iPhone $100 more expensive. For that $100, I could get roughly 100 million tokens of usage in the cloud, and I personally can't even use that much. I would rather save the $100, because Meta provides models for free on WhatsApp and Instagram, OpenAI offers them for free on ChatGPT, and Google will provide them for free as well. From this perspective, on-device AI is a hard sell.
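The bandwidth point can be put into rough numbers. For a memory-bound decoder, every generated token has to stream essentially all of the model's weights from memory, so peak decode speed is roughly memory bandwidth divided by model size. The figures below are illustrative assumptions, not measurements of any specific phone or GPU.

```python
def max_decode_tokens_per_sec(bandwidth_gb_per_s, params_billions, bytes_per_param=2):
    # bytes_per_param=2 assumes fp16/bf16 weights; quantization would shrink this.
    model_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_per_s * 1e9 / model_bytes

# A phone-class chip (~50 GB/s) vs. a data-center accelerator (~8,000 GB/s) on a 7B model:
print(max_decode_tokens_per_sec(50, 7))      # ≈ 3.6 tokens/s on device
print(max_decode_tokens_per_sec(8000, 7))    # ≈ 570 tokens/s in the cloud
```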
In the end, I disagree with the viewpoint about latency. I believe that for certain specific use cases, such as next-word prediction on a keyboard or spell checking with ultra-small models, low latency makes sense. But for us personally, the most valuable AI application scenarios right now, such as searching for restaurants or accessing my Gmail and calendar, all that data is already in the cloud.
There are numerous use cases in the business field, but personally, my data is already in the cloud anyway.
For individuals, the most valuable applications are searching, finding restaurants via Google Maps, making calls, checking schedules and emails, etc. This data and status are already in the cloud. For example, a more proactive workflow: "I want Italian food, find a restaurant located between the two of us, with gluten-free options, and available for reservation at 7 PM tonight."
This is an in-depth research query that takes time to respond. For instance, let's imagine that in the future AI books flights for us; it doesn't simply execute the command "book a flight" and complete it immediately.
It needs to conduct research, find information, and then return results, and this process must go through the network and the cloud. What is the necessity for it to exist on the device? Moreover, due to hardware limitations, even streaming tokens, your phone running Llama 7B cannot match the speed of me querying from the server and sending the tokens back to the phone. Furthermore, no one wants to run Llama 7B; they want to run GPT-4.5, Claude 3 Opus, or other better models.
What users want are good models, and these models cannot run on devices. Therefore, for use cases like integrating all my data, device-side AI is hard to achieve, especially since this data is already in the cloud anyway.
How much of my data do Meta, Google, and Microsoft have? Just allow me to access all of this. Just like what Anthropic is doing, you can connect your Google Drive to Anthropic. Even if my data is not with Anthropic, as long as I authorize it, they can still access it. So from the perspective of use cases, what are the real benefits of device-side AI? There are indeed security aspects, but what about practical use cases?
Matthew Berman:
Yes, I think there may be reasons to balance both. From the overall workload perspective, it may lean towards cloud computing, but I think there are also reasons to handle part of the workload on the device side. Any operation that interacts directly with the device, such as text pre-input, makes a lot of sense.
Dylan Patel:
I do believe AI will be applied on devices, but it will only be low-value AI because its cost structure must be very low. I think consumers will not pay for AI hardware on their phones because that would make the phones more expensive. If you plan to keep the phone price unchanged while adding AI features, that's fine; but if you have to raise the price, consumers will not accept it.
The scenarios where on-device AI really plays a role will be wearable devices, such as headphones or smart glasses. What you do locally are small, lightweight tasks, such as image recognition and hand tracking, but the actual reasoning and thinking are done in the cloud. This is, to some extent, also the model many wearable devices of this kind advocate.
I believe there will be some AI on the devices, and major companies will also try it out. However, the features that truly drive user adoption, increase revenue, and improve customer lives will tend to lean towards the cloud, which is also the reason Apple is adopting its current strategy. Apple is building several large data centers, purchasing hundreds of thousands of Mac chips to deploy in them, and has hired the head of Google TPU rack architecture to create accelerators. They themselves believe that the cloud is the direction for AI development, but they also have to make efforts on the device side. However, Apple itself, although it won't say it outright, also hopes to run many of its businesses in the cloud.
NVIDIA vs. AMD: Who Will Prevail?
Matthew Berman:
They do have great chips. Let's talk about chips and compare NVIDIA with AMD. Recently, I read a few articles from SemiAnalysis, which argue that AMD's new chips are actually very powerful. What do you think? Is this really enough to challenge the moat of CUDA? Will they start to take market share from NVIDIA?
Dylan Patel:
I think it's the result of multiple factors working together. AMD is indeed working hard, but their hardware is lagging in some aspects, especially compared to Blackwell. The real challenge they face is software; the developer experience is not great. The situation is improving, and we have provided them with a long list of suggestions to change the status quo, such as specific fixes and changes to CI resources, etc. We provided suggestions in December and recently, and they have implemented quite a few of them, but in terms of software, it still feels far behind. As for market share, I think they will gain some. They had a certain share last year, and they will have some this year as well. The challenge is that, objectively, AMD's chips are worse compared to NVIDIA's Blackwell architecture.
Matthew Berman:
Oh, you mean purely the chips, not the ecosystem.
Dylan Patel:
Yes, because of the system. NVIDIA can connect GPUs through the NVLink network hardware on the chips. The way NVIDIA builds servers allows 72 GPUs to work very closely together, while AMD can currently only manage 8.
This is crucial for inference and training. Secondly, there’s NVIDIA's software stack, which is not just CUDA. While people often say "it's just CUDA," in reality, most researchers do not directly interact with CUDA.
They call PyTorch, which then calls CUDA, allowing it to run automatically on the hardware. Whether through compilers or just-in-time mode, it usually adapts very well to NVIDIA hardware, and the calling path is similar on AMD. And these days, many people don't even interact with PyTorch directly anymore.
They will use inference libraries like vLLM or SGLang to download model weights from Hugging Face or elsewhere.
They connect the model weights to an inference engine (such as the open-source projects SGLang or vLLM on GitHub) and then just run it. These engines call various libraries like Torch compilation, CUDA, and Triton at the underlying level, forming a complete call stack.
In fact, end users just want to use a model to generate tokens. Libraries like Dynamo built by NVIDIA make this process very easy.
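For readers unfamiliar with that stack, the snippet below shows what connecting model weights to an inference engine and "just running it" looks like in practice, using vLLM and a Hugging Face model ID as an assumed example; SGLang exposes a similar workflow. Under the hood the engine dispatches to PyTorch, Triton, and CUDA (or ROCm) kernels.

```python
from vllm import LLM, SamplingParams

# The engine downloads the weights, picks kernels for the local hardware, and serves generation.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")     # assumed example model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain NVLink in one sentence."], params)
print(outputs[0].outputs[0].text)
```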
Clearly, developers are at different levels; some are at the application layer, while others delve deeper into the underlying layers. But many users just want to call open-source libraries and tell the program, "Here are my model weights, here is the hardware, run it." In this regard, AMD is indeed making efforts, but the user experience is still poor. It's not that the program can't run, but for example, when using a certain library, NVIDIA might require setting 10 parameters.
Using AMD might require setting 50 parameters. And each parameter has different settings, making it very difficult to achieve optimal performance. However, I believe AMD is catching up.
They are making rapid progress and will capture a certain market share. On the other hand, some of NVIDIA's practices are also detrimental to itself. In the cloud service ecosystem, there are large companies like Google, Amazon, and Microsoft.
These large companies have been developing their own AI chips, creating competition with NVIDIA. In response, NVIDIA has prioritized partnerships with other cloud companies like CoreWeave and Oracle. In fact, there are over 50 such companies, including Nebius, Together, and Lambda. NVIDIA is reallocating resources that might have been assigned to Amazon and Google to prioritize these emerging cloud companies.
Matthew Berman:
Is this considered a gesture of goodwill?
Dylan Patel:
Yes. Just look at Amazon's profit margin on GPUs; if you rent a GPU directly, they charge about $6 per hour.
The cost of deploying NVIDIA GPUs in data centers is about $1.40 per hour. A reasonable profit for cloud services might be $1.75 or $2 per hour. This is what NVIDIA wants to see; they don't want all the profits to go to cloud service providers.
And on Amazon, the price is $6. Of course, you can negotiate with Amazon for a lower price, but it's not easy.
So, NVIDIA is lowering prices by supporting these different cloud companies. But in my opinion, they made a significant mistake recently: acquiring Lepton. Lepton does not own data centers itself, but it develops a cloud software layer responsible for reliability, ease of operation, and all scheduling-related tasks like Slurm and Kubernetes. This was supposed to be the domain of the large cloud companies and the so-called "neoclouds."
Now NVIDIA has acquired this company that develops software layers and is working on a product called DGX Cloud. This means that if any cloud service provider has idle GPUs, they can hand them over to NVIDIA as bare metal, and NVIDIA will deploy Lepton's software on them and rent them out to users. This has made cloud service providers very angry because NVIDIA is competing directly with them. In fact, NVIDIA may also connect some of its own GPU resources to the DGX Cloud platform.
It's like you supported us, and now you're building a platform to compete with us. So many cloud service providers are very upset, but they dare not publicly express their dissatisfaction with NVIDIA because NVIDIA's position is too important, just like you wouldn't provoke God. The saying goes, "What Jensen Huang gives, Jensen Huang can take back."
However, they will privately complain to us (analysts). As a result, some cloud service companies have started turning to AMD, partly because AMD may have offered them incentives, and partly because they are dissatisfied with NVIDIA. There are indeed some cloud companies now purchasing AMD GPUs. Additionally, AMD is doing a third thing: they are engaging in the kind of activities that NVIDIA has been accused of. I don't know if you're aware of the accusations against NVIDIA regarding the transaction model with CoreWeave.
Matthew Berman:
Yes, they are accused of transferring revenue back and forth, constituting fraud.
Dylan Patel:
Yes, NVIDIA invests in them and then rents clusters from them.
Matthew Berman:
Yes, it seems like a regular operation.
Dylan Patel:
CoreWeave uses investment funds to purchase GPUs, and they also have to develop their own software.
Matthew Berman:
There seems to be something worth scrutinizing here, but that's correct.
Dylan Patel:
Anyway, AMD is actually doing something similar, and even taking it further. They are selling GPUs to companies like Oracle, Amazon, Crusoe, Digital Ocean, and TensorWave, and then renting computing power back from those companies. This "sell and lease back" model is different from CoreWeave buying NVIDIA GPUs, renting a small portion of the computing power back to NVIDIA, and selling the vast majority of it to Microsoft.
Matthew Berman:
To break the deadlock. But isn't this considered accounting fraud?
Dylan Patel:
Not at all. From an accounting perspective, it's completely legal. Selling products to others and then renting back services from them is not a problem in itself. Just like NVIDIA...
Matthew Berman:
NVIDIA has done it too. They were almost funding each other for this investment.
Dylan Patel:
Exactly. For companies like Oracle and Amazon, AMD's pitch is: "Buy our GPUs, and we'll rent back some of the computing power. You can keep the rest and try to rent it to your own customers. This can spark market interest, and if it works well, you can buy more." And for those emerging cloud companies that only purchase NVIDIA products, their message is: "Why not buy our products? We'll sign a contract to give you peace of mind, and you can rent some of the computing power to others." To some extent, this makes sense. But on the other hand, it also looks like a lot of the sales are actually just AMD repurchasing the computing power. But this has indeed fostered a very good cooperative relationship.
Dylan Patel:
Now, cloud companies like TensorWave and Crusoe are expressing their fondness for AMD. Because AMD sells GPUs to them first and then rents back the computing power, allowing them to profit from it. They can reinvest that money into more AMD GPUs or rent out the surplus GPUs to others. Meanwhile, they feel that NVIDIA just wants to compete with them, so what can they do? This creates an interesting situation. I think AMD will perform well; although market share won't surge, they will sell chips worth billions of dollars.
Matthew Berman:
But if you were to provide investment advice to a company, which chip would you recommend they invest in for the foreseeable future? Still NVIDIA?
Dylan Patel:
It depends on what price you can get from AMD. I believe there is a price point at which using AMD makes sense, and AMD does sometimes offer such prices. Meta has used quite a bit of AMD products, and of course, they also use a lot of NVIDIA. In certain specific workloads, if you have enough software talent and AMD offers a very low price, then choosing AMD makes sense. That's why Meta does it.
Grok and xAI
Matthew Berman:
I want to talk about xAI and Grok 3.5. Clearly, there isn't much public information about it right now. Elon Musk has stated that this is the smartest AI on Earth and will operate based on first principles.
Matthew Berman:
Is all of this just hype? Have they really discovered something new and unique, especially the kind of "controversial but true" facts he mentioned? Either he has genuinely found something new, or it's pure self-promotion. What are your thoughts on the current situation?
Dylan Patel:
Elon is an excellent engineering manager and also a great marketing expert. I don't know what the new model will be like, but I've heard it's good; everyone says so. Let's see how it turns out. I was very surprised when Grok 3 was released because it was actually better than I expected.
Matthew Berman:
Grok 3?
Dylan Patel:
I don't use it daily, but I do use it for certain queries.
Matthew Berman:
If you don't mind me asking, what kind of queries?
Dylan Patel:
Its deep research capabilities are much faster than OpenAI's, so I use it sometimes. Also, sometimes other models can be quite "shy" when providing the data I want. Personally, I'm very interested in human geography, such as human history, geography, politics, and the interactions between resources, so I also want to learn about demographic data, which is fascinating.
For example, the small town I grew up in is located in the "Bible Belt," with a population of about ten thousand, half Black and half White. When I explain this demographic composition to others, I mention that the area is a floodplain left by the retreating ocean, making the land extremely fertile. In Georgia, when early settlers came here, they occupied these fertile areas and had better harvests, which allowed them to afford slaves. That's why the local Black population is much higher than in most parts of the state.
While this explanation may seem a bit "out there," I enjoy thinking about these kinds of human geography questions, and Grok doesn't shy away from them. It lets me think things through. Discussing certain topics may not be "tasteful," but it helps in understanding history. For example, some invasions in European history were not fundamentally driven by an aggressive nature but by growing aridity in the invaders' homeland, which forced them to leave. Understanding things like that, or the business history of how Standard Oil beat its competitors before becoming a monopoly, is very interesting. Other models tend to editorialize, framing a discussion of Standard Oil around phrases like "union busting," while I just want to know what the facts were.
So Grok sometimes solves my problems, but it's not the best model. The one I use most in my daily work is Claude, usually Claude 3 Opus.
Matthew Berman:
You use Claude 3 Opus every day, even though its response speed is slow?
Dylan Patel:
It depends on the topic. Many times I can accept the wait, but there are also many times when I can't, and that's when I use Claude. I use Gemini in my work; we use it to handle a lot of licensing and regulatory filing documents, and we do a lot of long-context operations. It excels at long contexts, document analysis, and retrieval, so we use Gemini for many tasks at work. But if it's a daily scenario, like pulling out your phone to check something in the middle of a conversation, the situation is different.
Matthew Berman:
Okay, let's get back to Grok.
Dylan Patel:
Yes, regarding Grok, they have a massive amount of computing resources, and it's very centralized. They have many excellent researchers, with about 200,000 GPUs in operation, and they have purchased a new factory in Memphis and are building a new data center. Some of their methods for acquiring computing power are quite crazy, such as using mobile generators. They just bought a power plant overseas to ship to the U.S. because they couldn't buy new ones in time.
They are doing all sorts of crazy things to acquire computing resources. They have excellent researchers, the model itself is quite good, and Elon is heavily promoting it. Will it be great, or just okay? Will it be on par with competitors, or will it fall short? I don't know.
Matthew Berman:
Is there a fundamental difference in what they are doing? He specifically mentioned rewriting the human knowledge corpus because there is too much useless information in the current foundational models. He clearly has a grasp of X's data.
Dylan Patel:
Its quality is also very low, making it difficult to handle.
At the same time, this is another area where I sometimes use Grok: current events.
Matthew Berman:
Yes, for summarization.
Dylan Patel:
For example, the situation happening in Israel and Iran and all the war-related matters. You can ask Grok, and it can tell you what happened more accurately than Google Search, or even Gemini or OpenAI's queries, because it can access all that information.
Matthew Berman:
Is there anything different in what they are doing? I'm referring to that kind of step function difference.
Dylan Patel:
You can look at the step-function question from different angles. Everyone likes to feel they are doing something different, but overall, everyone is doing the same thing. They are pre-training large Transformer models and then doing reinforcement learning on top, mainly in verifiable domains, while also researching how to do the same in unverifiable domains. They are creating environments for the models to operate in, but those environments are mostly code and mathematics. Now they are starting to touch on computer use and other areas.
It feels like everyone is doing roughly the same thing, but this is also a highly challenging problem with many directions to attack, so overall everyone is taking the same approach. Even SSI, which presents itself as doing something different, is, I think, not doing anything much different from what I just described.
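As a rough illustration of what "reinforcement learning in verifiable domains" means in practice, here is a minimal sketch, assuming a toy setup where a model's answers to math problems can be checked exactly. The function names, the scoring rule, and the placeholder generator are illustrative only, not any lab's actual pipeline.

```python
# Minimal sketch of a verifiable reward: answers to math problems can be
# checked programmatically, so the reward signal needs no human judge.
# `generate_answer` is a stand-in for sampling from the model being trained.

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known result, else 0.0."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def generate_answer(prompt: str) -> str:
    # Placeholder: a real pipeline would sample a completion from the LLM here.
    return "42"

problems = [("What is 6 * 7?", "42"), ("What is 2 + 2?", "4")]

rewards = []
for prompt, truth in problems:
    answer = generate_answer(prompt)
    rewards.append(verifiable_reward(answer, truth))

# These rewards would then drive a policy-gradient update (PPO/GRPO-style).
# In unverifiable domains there is no such exact check, which is the hard
# part Patel alludes to.
print(f"Mean reward over the toy batch: {sum(rewards) / len(rewards):.2f}")
```

The point of the sketch is simply that code and math give an exact, automatable pass/fail signal, which is why those environments dominate today's reinforcement learning work.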
AI and Employment: Challenges for White-Collar Jobs and the Future of the Labor Market
Matthew Berman: I have two topics I want to discuss. The first is about economics and labor, specifically the claim that 50% of white-collar jobs could disappear. I know you've probably read the related coverage as well; it's obviously a recent development and one that concerns you. Which one would you rather discuss?
Dylan Patel:
Let's start with the first one, even though the second might be more interesting for your audience. The first topic is indeed fascinating. Everyone, or at least some people in the AI field, are worried about mass unemployment. But on the other hand, our population is aging rapidly, and overall, people are working less than ever before. We used to mock Europeans for working fewer hours, but in reality, the average working hours were much higher 50 years ago, and even longer 100 years ago, when people had almost no leisure time. Now, everyone has larger living spaces, and food security is better. We can say that we are much better off on every metric compared to 50 or 100 years ago.
AI should ideally reduce our working hours. In the future, people like me (and possibly you) may be overworked while the average person works far less. Clearly, resource allocation is a challenge, and I think that is the crux of the issue. This is also why I am very excited about robotics: many of the jobs we would most like to automate are precisely the ones robotics finds hardest to replace. People often assume they want to sit at a computer and do creative work, yet one of the markets hit hardest so far is freelance graphic design, while physical labor markets, like fruit picking, have yet to be affected. And physical labor is exactly the kind of work people do not want to do.
Matthew Berman:
That makes sense. Although robotics is advancing at an astonishing pace, this part of automation still seems far off. So, do you foresee that with the tremendous increase in human productivity, a large number of tasks will be automated? Do you think the future role of humans will be to manage AI, review AI outputs, or a combination of both?
Dylan Patel:
We are transitioning from chat-style models to models that handle longer-horizon tasks. Deep research is an example: the model works on a task by itself for several minutes, or even tens of minutes, before returning a result.
Dylan Patel:
In the future, there will be an AI assistant that we can talk to continuously, or it will proactively prompt us to pay attention to certain things. At the same time, there will be long-term tasks where AI will work continuously for hours or even days, and then present the results for our review. Ultimately, human involvement in this process will no longer be necessary.
Matthew Berman:
I believe that. What timeline do you envision for this?
Dylan Patel:
I tend to be quite pessimistic about timelines. I don't think that 20% of jobs will be automated within the next decade. I feel it might take until the end of this decade or the beginning of the next to achieve 20% job automation. While some say that artificial general intelligence (AGI) will emerge in 2027, it depends on how they define it.
Matthew Berman:
Even if their predictions are accurate, it doesn't mean that the technology can be implemented at that moment, right? We still need several years to truly deploy it in practice.
Dylan Patel:
I think deployment will be very fast. You can already see that the market for junior software engineers has been hit hard, and graduates are having a hard time finding jobs, while the use of AI in software development is rising sharply. We are still only at the code assistance stage, not even at the stage of automated software development.
Matthew Berman:
So, with the help of AI, will companies choose to do more and solve more problems? If so, how should those junior engineers initially enter the industry? I talked to Aaron Levie about this yesterday, and his response was, "Yes, if a team tells me their productivity has become very high, I would certainly invest more money in that team to grow that team." So, where is the development space for junior engineers?
Dylan Patel:
Yes, I agree with this point. For example, in my own company, we can do more with the help of AI, which makes us more productive and lets us outpace the traditional consulting and data companies that do not use AI. My company grew from two people to three last year, but how many junior software developers did we hire? None. We were celebrating one of my junior developers last week because she shipped about 50 code contributions on her own.
Dylan Patel:
In the past, it would have taken more people to accomplish the same work. While there is clearly still a lot of software waiting for us to develop, the question is how many people can we actually hire? Wouldn't I prefer to have a senior engineer directing a group of AIs rather than hiring a junior engineer? This is indeed a challenge. Of course, there are benefits to hiring young people because they can quickly adapt to new AI tools. It requires a balance.
Dylan Patel:
I don't know what the path forward is for junior software developers. People are always reaching out to me about jobs on Twitter and LinkedIn, but I don't actually need that many junior hires. I rarely see large tech companies hiring junior software developers, which is a fact, and a reason the market is so bad.
Matthew Berman:
So they can only improve themselves and master better skills.
Dylan Patel:
Yes, they need to be able to work independently and prove to the outside world that they are not just juniors, but experts who can truly utilize these tools.
Matthew Berman:
But that's not suitable for everyone.
Dylan Patel:
Indeed, it's not suitable for everyone. Many people just need a job; they may not necessarily have strong self-motivation.
Matthew Berman:
They definitely don't want to be founders, nor do they want to be solo developers. Even if they are not founders, they still need direction.
Dylan Patel:
I faced a problem when I started hiring: some people need a lot of guidance, which I can't provide. What I need are self-driven individuals. There are some people in the company who can do this, but providing guidance to employees is indeed very difficult because some need at least clear direction and hands-on teaching.
Matthew Berman:
What's your view on open-source models versus closed-source models, and who ultimately wins?
Dylan Patel:
Unless Meta makes significant improvements (which they are working on), the U.S. will lose its strongest open-source effort. Sam Altman believes Meta hasn't attracted top researchers, and I think that view is incorrect; some top researchers will go there, maybe not the very first-tier candidates, but still top talent. The reason China is pursuing open source is simply that they are temporarily behind; once they gain the lead, they will stop open sourcing. Ultimately, the closed-source model will prevail. It's unfortunate, but closed source will win. My only hope is that the future is not one in which two or three closed-source AI models or companies control global GDP. The market may end up more decentralized than that, but there's no guarantee.
Who will win the superintelligence race?
Matthew Berman:
Meta, Google, OpenAI, Microsoft, Tesla, and other companies. You have to pick one company and bet that it will achieve superintelligence first. Who would you choose and why?
Dylan Patel:
OpenAI. They are always the first to achieve every major breakthrough, even in reasoning. I believe that reasoning alone won't take us to the next generation, so there will definitely be other human-related factors. As for second and third place, that's a tough question.
Matthew Berman:
However, they are indeed very conservative, especially in what they release, what they publish, and what they prioritize. They set an extremely high bar on safety, which I'm grateful for.
Dylan Patel:
But this conservative style has weakened a lot. I think they are not as conservative as they used to be. From what I know, the process of launching GPT-4 was much easier than launching GPT-3. This may be because they are hiring a lot of compliance personnel, or it could be that they realize that since others will release related content anyway, they should also release their own version. However, I think they just have very talented people.
As for the third position, it's currently hard to distinguish between Google, xAI, and Meta. But I believe Meta will attract enough top talent to become truly competitive.
Matthew Berman: Thank you very much for chatting with me. This conversation was great and very interesting.