
Google's token consumption has grown 50-fold in a year and is now 6 times that of Microsoft/ChatGPT

Inference costs are significantly lower than expected
Today, Barclays published a comparison of inference-traffic token consumption between Google and Microsoft that investors should find interesting, and in places surprising: the "free" model has driven a surge in demand, while inference costs have come in significantly lower than expected.
1. Absolute Lead in AI Inference Scale
In Q1 2025, Alphabet processed approximately 634 trillion (634T) tokens, versus roughly 100T for Microsoft. By April 2025, Google's monthly inference volume had risen to 480T, a staggering 50-fold increase from 9.7T a year earlier. That gives Google roughly six times the AI inference traffic of Azure/ChatGPT.
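A quick sanity check of these ratios, using only the figures quoted above (a minimal sketch; all token counts are Barclays' estimates, not company-disclosed figures):

```python
# Back-of-envelope check of the Barclays scale figures (all inputs from the note).
ALPHABET_Q1_TOKENS_T = 634   # Alphabet, Q1 2025, trillions of tokens
MICROSOFT_Q1_TOKENS_T = 100  # Microsoft, Q1 2025, trillions of tokens
APRIL_MONTHLY_T = 480        # Google monthly run-rate, April 2025
YEAR_AGO_MONTHLY_T = 9.7     # Google monthly volume a year earlier

print(f"Google vs Microsoft scale: {ALPHABET_Q1_TOKENS_T / MICROSOFT_Q1_TOKENS_T:.1f}x")  # ~6.3x
print(f"Year-over-year growth:     {APRIL_MONTHLY_T / YEAR_AGO_MONTHLY_T:.0f}x")          # ~49x
```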
2. Growth Driver — Free AI Products Instead of Paid Subscriptions
The surge in inference volume comes primarily from free scenarios such as AI Overviews in Search. Google Search's user base is roughly 5-6 times that of ChatGPT, and free AI token growth (50×) far outpaces paid large-model revenue growth (3-4×), underscoring Google's strategy of building user and data moats first and monetizing later.
3. Cost Impact Overestimated: Inference Costs Are Only About 1% of Search Revenue
Based on Gemini 2.5 pricing, Barclays estimates Q1 2025 inference cost at roughly $750 million, only about 1% of search revenue (1.6% of COGS plus Opex). Even if token consumption rises another 4×, the cost would still sit below core search infrastructure costs (approximately 18% of revenue), easing market concerns about margin compression.
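The cost math can be reconstructed from the note's own numbers. In the sketch below, the blended per-token rate is implied by the $750M estimate rather than taken from a price sheet, and the $75B quarterly search-revenue denominator is a hypothetical round number chosen only so the ~1% share holds:

```python
# Reconstructing the note's cost arithmetic from its headline figures.
Q1_TOKENS = 634e12           # tokens processed in Q1 2025 (from the note)
EST_INFERENCE_COST = 750e6   # Barclays' Q1 2025 inference cost estimate, USD

# Implied blended Gemini 2.5 rate, USD per 1M tokens (~$1.18).
implied_rate = EST_INFERENCE_COST / (Q1_TOKENS / 1e6)
print(f"Implied blended rate: ${implied_rate:.2f} per 1M tokens")

ASSUMED_SEARCH_REVENUE = 75e9  # assumption: round number consistent with the ~1% share
print(f"Share of search revenue:  {EST_INFERENCE_COST / ASSUMED_SEARCH_REVENUE:.1%}")
print(f"Share at 4x token volume: {4 * EST_INFERENCE_COST / ASSUMED_SEARCH_REVENUE:.1%}")
# Even at 4x, the share stays far below the ~18%-of-revenue core search
# infrastructure benchmark cited in the note.
```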
4. Capital Expenditure Structure: Primarily Training, with Inference at Only About 10%
Approximately 90% of Google's AI compute CAPEX still goes to training and new models; inference chips accounted for only 6.2% (roughly $600 million) in Q1 2025. Extrapolating from the 480T monthly run-rate, the inference share would rise to only about 14% in Q2 2025, indicating that spending remains primarily a bet on the long-term evolution of models.
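A rough reconstruction of the CAPEX shares. The scaling step is my assumption, not the note's stated method: inference CAPEX is taken to grow in proportion to monthly token volume while total AI compute CAPEX holds flat, which happens to land on the note's ~14% figure:

```python
# CAPEX mix implied by the note's Q1 2025 figures.
INFERENCE_CAPEX_Q1 = 600e6   # inference-chip CAPEX, Q1 2025 (from the note)
INFERENCE_SHARE_Q1 = 0.062   # 6.2% of AI compute CAPEX (from the note)

total_ai_capex_q1 = INFERENCE_CAPEX_Q1 / INFERENCE_SHARE_Q1
print(f"Implied total AI compute CAPEX, Q1 2025: ${total_ai_capex_q1 / 1e9:.1f}B")  # ~$9.7B

# Assumption: inference CAPEX scales linearly with monthly tokens
# (Q1 averaged 634T / 3 per month; April ran at 480T) and total CAPEX is flat.
q2_share = INFERENCE_SHARE_Q1 * (480 / (634 / 3))
print(f"Projected Q2 2025 inference share: {q2_share:.0%}")  # ~14%
```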
5. Hardware Efficiency: Approximately 270,000 TPUv6 Chips Can Support Current Inference
Assuming a 50/50 mix of Pro and Flash models with 15% active parameters, Google could cover the Q1 2025 inference load with approximately 270k TPUv6 chips (ASP around $4,500), reflecting the power and cost advantages of self-developed accelerators.
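To get a feel for what the 270k figure implies, the sketch below derives the hardware bill and the average per-chip load from the note's numbers alone; it says nothing about utilization, peak load, or the Pro/Flash split:

```python
# Implications of the 270k-TPUv6 estimate (all inputs from the note).
TPU_COUNT = 270_000
TPU_ASP = 4_500          # per-chip average selling price, USD
MONTHLY_TOKENS = 480e12  # April 2025 monthly run-rate

print(f"Implied TPU hardware bill: ${TPU_COUNT * TPU_ASP / 1e9:.1f}B")  # ~$1.2B

SECONDS_PER_MONTH = 30 * 24 * 3600
per_chip_tps = MONTHLY_TOKENS / TPU_COUNT / SECONDS_PER_MONTH
print(f"Average load per chip: ~{per_chip_tps:.0f} tokens/sec")  # ~690 tokens/sec
```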
6. Future Catalysts: Astra, Mariner, Veo, and Other Agent-Based Applications
The not-yet-fully-scaled general AI assistant (Project Astra), browser agent (Mariner), and video generation model (Veo) are expected to further expand the token base. Google has proactively raised compute CAPEX to over 50% of total CAPEX, laying the groundwork for incremental AI demand after 2026.