Goldman Sachs Silicon Valley AI Research Tour: Foundational Models No Longer Widen the Gap, AI Competition Shifts to the "Application Layer," and "Reasoning" Drives a Surge in GPU Demand

Wallstreetcn
2025.08.25 03:23

Goldman Sachs research shows that as open-source and closed-source foundational models converge in performance, raw model capability is no longer a decisive moat; how AI-native applications establish moats has become the key question. Reasoning models represented by OpenAI o3 and Gemini 2.5 Pro are becoming the new frontier of the AI field. This shift in computational paradigm has directly driven a 20-fold increase in GPU demand, and capital expenditures on AI infrastructure may remain high.

Author: Li Xiaoyin

Source: Hard AI

From August 19 to 20, the Goldman Sachs analyst team completed its second Silicon Valley AI field study, visiting leading AI companies such as Glean, Hebbia, and Tera AI as well as top venture capital firms including Lightspeed Ventures, Kleiner Perkins, and Andreessen Horowitz, and holding in-depth discussions with professors from Stanford University and the University of California, Berkeley.

The research shows that as open-source and closed-source foundational models rapidly converge in performance, pure model capability is no longer a decisive moat. The focus of competition is shifting entirely from the infrastructure layer to the application layer, with the real barrier being the ability to deeply integrate AI into specific workflows, utilize proprietary data for reinforcement learning, and establish a robust user ecosystem.

The report also cites views from top venture capital firms such as Andreessen Horowitz: open-source foundational models had caught up with closed-source models in performance by mid-2024, reaching GPT-4 level, while top closed-source models have shown almost no breakthrough progress on benchmarks since.

At the same time, reasoning models represented by OpenAI o3 and Gemini 2.5 Pro are becoming the new frontier of generative AI. The output tokens they generate per query can reach up to 20 times those of traditional models, driving a 20-fold increase in GPU demand and supporting AI infrastructure capital expenditures that are likely to remain high for the foreseeable future.

Foundational Model Performance Convergence, Competition Focus Shifts to Application Layer

Goldman Sachs' research clearly points out that the "arms race" in the AI field is no longer solely centered around foundational models.

Several venture capitalists indicated that foundational model performance is increasingly commoditized, and that competitive advantage is shifting up the stack, toward data assets, workflow integration, and domain-specific fine-tuning capabilities.

Guido Appenzeller, a partner at Andreessen Horowitz, mentioned in discussions that the performance gap between open-source large models and closed-source models has been closed in less than twelve months, reflecting the astonishing development speed of the open-source community. Meanwhile, the performance of top closed-source models has stagnated since the release of GPT-4.

In this context, how AI-native applications establish moats becomes crucial.

AI startup Hebbia believes that the real barrier for applications is not the technology itself (top engineering teams can replicate any technology within six to eight months) but the cultivation of user habits and the establishment of distribution channels. The logic mirrors Excel's success: by embedding deeply into workflows and cultivating "super users," a product creates an irreplaceable network effect.

Companies like Everlaw also emphasize that by deeply integrating AI into the document processing workflows of legal cases, they provide users with integrated convenience and efficiency that independent AI models cannot match.

It is noteworthy that leading AI laboratories themselves are also aware of this shift. The report states that institutions like OpenAI, Anthropic, and Google DeepMind are increasingly venturing into the application layer, leveraging their insights into model architectures and development roadmaps to build tighter product feedback and reinforcement learning loops. This has brought new competitive pressure to independent startups.

Reasoning Models Become the New Frontier, Igniting GPU Demand

According to the report, over the past three years the cost of running a model that holds a constant MMLU benchmark score has fallen from $60 per million tokens to $0.06, a decline of roughly 1,000 times. Yet this sharp drop in the unit operating cost of large models does not mean that overall computing power expenditure will fall.
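For scale, a quick sanity check of the decline factor implied by those two prices (a minimal sketch; the dollar figures are simply the ones cited above):

```python
# Sanity check on the unit-cost decline cited above: USD per million tokens
# for a model holding a constant MMLU score, roughly three years apart.
cost_then = 60.0  # USD per million tokens (as cited in the report)
cost_now = 0.06   # USD per million tokens (as cited in the report)

print(f"Decline factor: {cost_then / cost_now:,.0f}x")  # -> 1,000x
```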

Multiple VCs pointed out that new demand growth points are emerging rapidly. The research found that, following DeepSeek-R1's breakthrough, a new generation of reasoning models represented by OpenAI o3, Gemini 2.5 Pro, and Claude 4 Opus has emerged, marking a fundamental shift in foundational models.

Traditional large models mainly restate memorized answers, while reasoning models simulate a thinking process through deduction, verification, and iteration. As a result, the latter's output can reach up to 10,000 tokens per query, while traditional LLMs typically hover around 500. This 20-fold increase in output tokens translates directly into a 20-fold increase in demand for GPU inference compute.
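The one-to-one mapping from tokens to compute follows from how transformers decode: each generated token costs roughly 2 FLOPs per model parameter, so compute grows linearly with output length. The sketch below illustrates this with a hypothetical 70B-parameter model; the approximation and the model size are illustrative assumptions, not figures from the Goldman Sachs report.

```python
# Rough illustration: decode-stage compute scales ~linearly with output tokens.
# Assumes ~2 FLOPs per parameter per generated token (a standard transformer
# decode approximation); the 70B model size is a hypothetical example.
PARAMS = 70e9               # hypothetical 70B-parameter model
FLOPS_PER_PARAM_TOKEN = 2   # approximate decode cost per parameter per token

def decode_flops(output_tokens: int) -> float:
    """Approximate decode-stage FLOPs for a single query."""
    return FLOPS_PER_PARAM_TOKEN * PARAMS * output_tokens

traditional = decode_flops(500)    # ~500 output tokens (traditional LLM)
reasoning = decode_flops(10_000)   # ~10,000 output tokens (reasoning model)

print(f"Traditional query: {traditional:.1e} FLOPs")
print(f"Reasoning query:   {reasoning:.1e} FLOPs")
print(f"Ratio: {reasoning / traditional:.0f}x")  # -> 20x
```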

Goldman Sachs pointed out that this shift makes inference expensive, but it also allows AI to be applied with greater accuracy in complex fields that demand rigorous analysis, such as code synthesis, law, finance, and engineering. VCs therefore generally believe that today's high capital expenditure on AI infrastructure is "appropriate and necessary": not a threat to profits but a prerequisite for competitive advantage, especially for the leading AI laboratories.

AI-Native Application Moats: Workflows, Data, and Talent

As models themselves are no longer a scarce resource, successful AI application companies are building barriers in other ways. Goldman Sachs' research summarized the following key features:

First is workflow integration and user ecosystem. Successful application companies can quickly create value for enterprises, reducing deployment time from months to weeks.

For example, the customer service AI company Decagon can help clients launch automated customer service systems within six weeks, saving $3 to $5 million for every $1 million invested. This seamless integration with existing business processes is key.

Second is proprietary data and reinforcement learning. The report points out that static "walled garden" proprietary datasets hold immense value in vertical fields such as law and finance.

However, more valuable than static data is dynamic user-generated data, which can power reinforcement learning loops. Companies that can acquire scaled users early can leverage high-quality user feedback signals to continuously optimize models, thus forming a snowballing leading advantage.
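A minimal sketch of the kind of loop described here, using a toy two-option bandit in place of a full RLHF pipeline (all names are hypothetical and illustrative, not any company's actual system): each user interaction yields a reward signal that nudges the deployed policy, so a larger user base accumulates signal, and therefore advantage, faster.

```python
import random
from dataclasses import dataclass, field

@dataclass
class FeedbackLoop:
    """Toy stand-in for a user-feedback reinforcement loop (illustrative only)."""
    # One weight per answer style; a real system would update model weights.
    weights: dict = field(default_factory=lambda: {"terse": 1.0, "detailed": 1.0})

    def respond(self) -> str:
        """Sample an answer style in proportion to the learned weights."""
        styles, w = zip(*self.weights.items())
        return random.choices(styles, weights=w, k=1)[0]

    def record_feedback(self, style: str, reward: float) -> None:
        """Turn a user signal (thumbs up/down, edits, retries) into a weight update."""
        self.weights[style] = max(0.1, self.weights[style] + 0.1 * reward)

loop = FeedbackLoop()
for _ in range(1_000):  # every user interaction contributes training signal
    style = loop.respond()
    reward = 1.0 if style == "detailed" else -0.5  # simulated user preference
    loop.record_feedback(style, reward)

print(loop.weights)  # "detailed" dominates; more users -> faster convergence
```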

Third is the strategic value of specialized talent. Unlike the previous wave of SaaS, the success of generative AI applications heavily relies on top engineering talent. Building efficient AI systems requires specialized skills such as model encapsulation, agent reasoning, and reinforcement learning loop design.

VCs believe that AI talent capable of building self-improving systems is extremely scarce and has become the main bottleneck for sustainable innovation.