
Microsoft CTO: AI has already reached a "capability overhang," and the industry needs to work to narrow the gap between model capabilities and actual product delivery

"AI programming" is just a change in the tools of the software industry. When tools change, it is important to maintain an open mindset. Innovation in the AI era does not rely on creating a completely new infrastructure; the real way to innovate is to understand a specific user problem more deeply than anyone else, and then, based on existing infrastructure, make fine-tuning to solve that problem at a world-class level. Next year, we may see AI agents executing more complex interactive solutions and engaging in "asynchronous interactions."
Microsoft CTO Kevin Scott recently shared his views in a media interview on a series of issues including AI agents and the future of programming.
In Scott's view, the reasoning capabilities of models have already outpaced the ways in which these models are currently applied. The entire industry now needs to work together to bridge the gap between what models can actually do and the products delivered to users.
To make agents truly useful, current AI needs better agent memory systems (to handle more complex problems), as well as an ecosystem that should function like the internet (to access information).
"AI agent programming" is not the first time software development has undergone a significant transformation in the past forty years. The focus is not on how to do it, but on the goals to be achieved. When tools change, one must maintain an open mindset.
Here are the highlights of the interview:
The reasoning capabilities of models have already outpaced the ways we actually apply these models. The entire industry now needs to work together to bridge the gap between what models can actually do and the products we deliver to users.
In addition to "reasoning capabilities," there are many other issues that need to be addressed to make agents truly useful. This means we need better agent memory systems, as well as an ecosystem that should function like the internet.
If you really think about what agents can do and how useful ordinary users want them to be, you will find that we need a series of transformations to occur again, similar to when the internet first emerged. Early prototypes of this are already visible, and the MCP protocol is a very good example.
"AI agent programming" is not the first time software development has undergone a significant transformation in the past forty years. Whether it is software or other things, the focus is not on how to do it, but on the goals I want to achieve. So I will choose the most powerful and convenient way to achieve it. When tools change, one must maintain an open mindset.
I have been doing carpentry for almost as long as I have been writing programs. When I was a teenager, the biggest debate in that community was: If you use power tools, are you still a real carpenter? A real carpenter only uses hand tools! This debate still exists today, but it has changed to: If you use CNC (computer-controlled tools), are you still a real carpenter?
The most critical difference now lies in the mindset of product designers. Some startups today are not innovating by creating a brand new infrastructure; their way of innovating is that their understanding of a user problem is deeper than anyone else's. Then, based on existing infrastructure, or with slight adjustments, they can solve that problem at a world-class level. This approach is what we truly need now.
I believe that in the future, we will see that the problems people use agents to solve will become increasingly complex and ambitious. At the same time, the "agent network" will become more complete, with better connections; the reasoning and planning capabilities of models will also become stronger. This will drive us from the current "synchronous interaction" model into a stronger "asynchronous interaction" era.
The following is the full transcript of the conversation:
Host:
Kevin, welcome to our show.
Kevin Scott:
Thank you.
Host:
Thank you for coming. One interesting thing is that I interviewed you last year. You mentioned two very important things at that time. One is that agents will be everywhere.
Host:
What you said has really come true, and it has happened very quickly. Another thing I noticed is that you particularly emphasized "scaling laws" last year, right?
Host:
At that time, you showed many charts, saying that we are building large-scale infrastructure, training larger models, and that there will be a leap in performance every two years. But this year, your focus seems to be more on the "Agentic Web." What has changed? What have we learned from last year to this year?
Kevin Scott:
Yes, I think a lot has changed. One thing is that last year many people were still in a state of skepticism, doubting whether "scaling laws" could continue to be effective. In fact, we have proven year after year that they are still effective and working well. So there is no longer a need to keep repeating this to people.
Kevin Scott:
Another thing, to be honest, is that the reasoning capabilities of the models have outpaced the ways we are actually applying these models. I have been talking about a concept recently called "capability overhang."
I believe that our entire industry now needs to work together to bridge the gap between what the models can actually do and the products we deliver to users. This is also one of the reasons why "scaling laws" are not as appealing at this year's Build conference as they were last year.
Kevin Scott:
Another point we have discovered is that with the explosive growth of agents over the past year and the increase in user time spent with these agents, we realize that in addition to "reasoning capabilities," there are many other issues that need to be addressed to make agents truly useful. This is also one of the key points I mentioned in my keynote speech at the Build conference today: we need better agent memory systems.
Kevin Scott:
Current agent memory is limited in many ways. Agents today are more one-off and transactional: you use one to complete a task, and during that time its memory is coherent, but that memory is likely to disappear completely by the next interaction, which makes it difficult to delegate more complex tasks to them.
Kevin Scott:
And there is a core issue: if agents are to become useful, they must be able to take action on your behalf, use tools, make changes in the system, and access a rich variety of information sources.
To achieve this, we need an ecosystem that works like the internet. If you are an information source that already has a website and an API, you must figure out how to connect those resources so that agents can communicate with them, and the incentives of all parties must be aligned so that they are willing to participate in this "agent network."
Kevin Scott:
So I think this is the biggest story of the year: we are seeing the dawn of real progress, such as super simple, open protocols like MCP, which play a role in the agent network similar to that of HTTP on the internet. There are also standards like NLWeb, which play a role in the "agent network" akin to that of HTML in the web world.
Kevin Scott:
I think you will see these systems increasingly adopting simple, composable, and stackable structures, with open communities being very active, ultimately driving the real capabilities of agents to be realized.
Host:
So let me summarize what I heard: now we have agents, and they are starting to really work, right? And to make these agents powerful, they need access.
Host:
They need to be able to access various resources on the internet, the content on your computer, and other similar information. In other words, you basically need to establish a set of protocols and processes to allow agents to access these things, right?
Host:
So now you are focusing on the different layers of the entire tech stack. For example, at the runtime level, you are building memory systems and other components; then there are protocols like MCP (Model Context Protocol) that can connect agents to the broader internet world to obtain information and allow information to flow into the agent system.
Host:
So I want to ask, why is this important for Microsoft? What role do you hope to play in this ecosystem?
Kevin Scott:
Well, I think there are two, maybe three particularly important points.
The first point is that we are building agents ourselves. And for these agents we create to be truly useful to users, we must solve these underlying issues. Even if you narrow it down to enterprise-level agents, as Microsoft's CTO, one thing I have been advocating is: I hope all internal systems in our company adopt a unified standard protocol that can communicate with the agents we build internally.
Kevin Scott:
This way, we can avoid exposing the entire world to the so-called "Conway's Law." You know, Conway's Law is a very interesting phenomenon in software architecture.
Conway stated that the structure of a system often reflects the organizational structure of the team that developed it. For example, the number of stages in a compiler is usually determined by the number of teams responsible for those parts.
Host:
That's right.
Kevin Scott:
So imagine, if you are developing something within a large company like Microsoft, you definitely wouldn't want the agents you create to be structured entirely according to your organizational structure.
But in reality, if you don't have universal protocols and standard services, such "organizational chart products" will keep appearing. As an engineer, seeing that kind of inefficient development scenario is really frustrating.
Kevin Scott:
But I think more importantly, if you really imagine what agents can do and how useful ordinary users want them to become, you will find that we need a series of transformations to happen again, just like when the internet emerged. I can already see early prototypes of this, and the MCP protocol is a very good example.
Kevin Scott:
It is a very simple yet crucial protocol that addresses a very important issue—not only serving those who build agents and platform infrastructure but also helping the end users of the system, making their experience more useful. It also provides opportunities for service providers, as someone might say, "I also want to participate in this new type of large network."
But the problem now is that many people used to know how to connect to a service, how to build a service, but now they are faced with a group of agents, sitting there thinking, "How do I connect my system? What does this mean for me?"
Kevin Scott:
Even from a business model perspective, they will also wonder: Why should I connect to this system? What value does it bring to me?
So the second point is—we want to make the agents we build more useful.
Kevin Scott:
The third point is, as a platform company, this is even more important than the agents we write ourselves. Microsoft has been deeply engaged in building platform technology for fifty years, and we want to ensure that when this brand new "super network" emerges, we can help solve the problems that arise within it.
Host:
Yes, seeing how much you are investing in MCP now and integrating it into the entire Windows system is really cool and impressive. This brings to mind a question—I recently heard some people discussing MCP, and they started to focus on its security model issues.
Host:
I'm curious about your perspective on this issue. Because you mentioned earlier many analogies between the MCP tech stack and the internet tech stack. And we know that the internet has a complete set of security mechanisms, such as the Same-Origin Policy, which ensures that websites can only operate on data under their own domain when executing code, right? But MCP currently seems to lack a similar mechanism. So what kind of security model do you think is suitable for MCP?
Kevin Scott:
Well, to be honest, I can't say I fully know what the "correct" security model is. But one interesting thing about MCP is that its design is extremely simple and clear, which actually allows the entire community to relatively easily reach a consensus on this issue.
Kevin Scott:
We do have many important needs at the enterprise level, and we are working well with the MCP team to advance related work.
Kevin Scott:
For example, we need to give agents an "identity"—this way we can establish a permission system. You can define: a certain agent operates on behalf of a certain user, and then it has the right to access certain resources in the system.
Kevin Scott:
The agent itself can even actively query multiple systems and say, "This is something I want to accomplish, and to achieve this, I need to access the following systems. What permissions do I need to do this?"
Kevin Scott:
It can request authorization from the user it is delegated to, saying, "Can you give me permission to access these resources so that I can complete the task you assigned to me?" Yes or no.
Then the system administrator also needs to have the authority to review, for example, "Do I allow these operations to happen?" So, while this entire process is not "simple," it is actually feasible and logically clear to implement on the MCP architecture.
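The flow Scott describes here (an agent with an identity, acting for a user, asking for access, with both the user and an administrator able to say no) can be sketched in a few lines. This is only an illustration under stated assumptions: `AgentIdentity`, `PermissionBroker`, and the resource names are hypothetical, and nothing below is part of MCP or any Microsoft API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical identity: which agent, acting on behalf of which user."""
    agent_id: str
    on_behalf_of: str


@dataclass
class PermissionBroker:
    """Hypothetical broker: a grant needs admin policy AND user consent."""
    admin_allowed: Set[str]  # resources the administrator lets agents touch
    grants: Dict[AgentIdentity, Set[str]] = field(default_factory=dict)

    def request(
        self,
        agent: AgentIdentity,
        resources: Iterable[str],
        user_approves: Callable[[AgentIdentity, str], bool],
    ) -> Set[str]:
        """The agent asks for access; returns the subset actually granted."""
        granted = {
            r for r in resources
            if r in self.admin_allowed and user_approves(agent, r)
        }
        self.grants.setdefault(agent, set()).update(granted)
        return granted

    def can_access(self, agent: AgentIdentity, resource: str) -> bool:
        """Check a previously recorded grant before touching a resource."""
        return resource in self.grants.get(agent, set())


# Usage: the admin permits calendar and email; the user declines email.
agent = AgentIdentity("travel-agent", on_behalf_of="alice")
broker = PermissionBroker(admin_allowed={"calendar", "email"})
granted = broker.request(
    agent,
    ["calendar", "payroll", "email"],
    user_approves=lambda a, r: r != "email",
)
```

In this sketch a resource is granted only when both checks pass, so `payroll` is blocked by the administrator's policy and `email` by the user's answer.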
Kevin Scott:
The key is: we should do this in an open manner. We do not want these mechanisms to be exclusive to Microsoft agents or Microsoft systems—we really need it to operate like an ecosystem on the internet.
Host:
For me, this is actually a very interesting question. I feel that there are two possible models or "Go-to-Market" paths emerging around the development of AI, and it seems that you at Microsoft are paying attention to both directions.
One is the so-called "vertical integration" model, where you control the model, applications, and the entire upstream and downstream—everything is in your hands.
Host:
One benefit of this model is that security can be strongly guaranteed. Just like Apple's App Store or iPhone model, you can enforce security policies at multiple levels.
But the other is the "open model"—you sacrifice some control and security, but gain stronger innovative vitality because there is no centralized authority to restrict developers.
So I want to ask, how does Microsoft think about which path to take? How do you make this decision?
Kevin Scott:
Yes, you see, this is indeed a core issue that many people are discussing now, but I think it might be a false dichotomy.
You know, in these open systems, their characteristic is "permissionless." This ability for open innovation indeed brings tremendous advantages. For me personally, the most exciting thing right now is that you can innovate and build products without anyone's approval, without needing someone to give you permission, and without going through any intermediary processes to bring your work to the world.
You no longer need to go through a bunch of complex gatekeeping mechanisms that set up numerous obstacles between you, the idea person, and those who might truly benefit from it.
Kevin Scott:
I feel that the "intermediate layers" we've built over the past few years haven't really brought much value to the two core parties: one side is the people who work hard to create things, and the other side is the users who are willing to pay attention, money, or other resources for these results.
That's why I'm particularly excited about open systems, and it's one of the important reasons we make strategic choices.
Kevin Scott:
But I also believe that there are ways to achieve robust security within these systems. We can leverage some of the capabilities of AI itself to build smarter security models.
For example, the agent you run can take care of your personal security needs—what information you are willing to share and what you are not; it can also conduct risk assessments.
Let me give you a practical example: This morning, just as I was about to go on stage to speak, I suddenly received a bunch of emails because I am the backup security contact for my wife's account.
Someone was trying to tamper with the two-factor authentication (2FA) settings on her account. My first reaction was to text her instead of emailing—because I was worried that her email might have been accessed by an unauthorized third party.
Kevin Scott:
I texted her, "Are you changing the settings?" She replied, "Yes, it's me."
So you can imagine, if there were an agent that could access your various communication channels, detect such anomalous behavior, and call upon various resources for "triangulation" to determine whether these actions are legitimate or illegal, that would be very useful.
So I believe that the two models can coexist. It's not a matter of having to choose one or the other, as the question framed it.
Host:
That makes a lot of sense. I have one particularly curious question—now it seems that software engineering is undergoing fundamental changes, right?
And you are a veteran who has been deeply involved in the field of software engineering for many years. I feel that you also value the "craft" itself—the art of making things.
We just talked about how you usually do pottery and make bags, enjoying the process of hands-on creation. I think many people are a bit resistant to "writing code with agents," feeling that it diminishes that "handcrafted" feeling, although I don't completely agree with that viewpoint.
But I still want to know, as someone who truly cares about the craft of programming, what do you think about the future of "agent programming"?
Kevin Scott:
Let me say this, I really appreciate "my people"—by "my people," I mean the broader community of creators.
This includes software engineers, mechanical engineers, carpenters, potters, and so on. We are all people who create new things from scratch or raw materials.
Kevin Scott:
If you truly love your work, you will have very strong opinions about how to do it, what tools to use, what materials to use, and how to combine these details. This is a prerequisite for becoming a truly excellent practitioner.
But interestingly—people's viewpoints are diverse.
As you just mentioned, I have been in this field for a long time—I wrote my first program when I was only 12 years old, which means I have been programming for 41 years.
Kevin Scott:
If you stick with a field long enough, you will see: this is not the first time software development has undergone a huge transformation in the past forty years. Every time such a transformation occurs, people have very strong reactions to its implications.
But the reality is, people have choices.
I still prefer to use a text editor. To be honest, I probably shouldn't say this because our company makes Visual Studio Code (laughs), but I am just an old-fashioned guy—I still use vim.
Kevin Scott:
At least I will use vim, but my favorite is still that old-fashioned editor. I just don't want to switch to other tools.
Even though I know that, to some extent, this has reduced my efficiency, I still insist on using it for the reason of "personal choice."
But in other projects I work on, whether software or other things, sometimes I will say: "The focus here is not on how to do it, but on the goal I want to achieve."
So I will choose the most powerful and convenient way to achieve it—regardless of whether others will laugh at me for it.
Kevin Scott:
This situation is everywhere. For example, I have been a carpenter for almost as long as I have been programming.
I still remember when I was a teenager, the biggest topic in the community was: "If you use power tools, can you still be considered a real carpenter?"
"A real carpenter only uses hand tools!"
Kevin Scott:
This debate still exists today, but it has changed to: "If you use CNC (computer-controlled tools), can you still be considered a real carpenter?"
I think this discussion itself is very interesting, but ultimately, people make different choices because their values are different.
If you value the process more, you might make completely different choices; whereas if you value the results more, you might use other methods.
Host:
I think questions like "Are you a real carpenter?" or "Are you a real programmer?" ultimately imply: "You can only be a real XXX if you do it the way I grew up doing it." This is actually a biased statement.
Kevin Scott:
Yes, that's true. But the reality is—there's too much diversity in the world, right?
So what I want to say is: I would never tell anyone not to have strong opinions about their craft. It's great that you have your convictions!
But if I have any advice (this is not a command, just something I personally find useful), it is to keep an open mind when tools change.
Kevin Scott:
I can't count how many times new technologies have appeared in other "non-software" creative fields, and I instinctively resisted at first—for example, I had no interest in 3D printers at all, and it took me a long time to learn how to use them.
Now I really regret it because they are incredibly useful for almost everything I do. For various complex reasons, I didn't allow myself to be curious, which is my own issue and indeed a bit strange.
So my advice is: stay curious and try things out. If something works for you, use it; if it doesn't, that's okay too.
Host:
Exactly. So what do you think about the future of "software engineering agents"?
Will there be a situation where "one agent rules them all"? Or will we use many agents with different styles at the same time? How do you think this ecosystem will develop?
Kevin Scott:
I believe there will definitely be many different types of agents in the future. That's a good thing.
We will certainly work very hard on GitHub Copilot and the GitHub Agent we are developing, hoping to become the preferred tool for many developers because we want it to be genuinely useful for everyone.
But to say that all developers around the world will use a single tool to accomplish key parts of their work, I think that's unrealistic.
Kevin Scott:
One of the joys of being a developer is that you have the right to choose your tools. You can try various things, do some seemingly "irrational" things, or choose a completely rational approach.
This is something I've consistently observed over my forty-year career as a programmer: people are constantly changing their tools. It's always in flux.
Host:
Have you thought about how these agents might differ across various dimensions?
Kevin Scott:
I think the most critical difference may lie in the mindset of product designers.
Now, some of the most interesting startups I see are not innovating by creating a whole new infrastructure; their innovation lies in their deeper understanding of a user problem than anyone else.
Then, based on existing infrastructure, they can make slight adjustments to solve that problem at a world-class level. This approach is what we truly need right now.
Kevin Scott:
This will also drive the formation of diversity in agents—what agents are used to solve what problems will ultimately be driven by this dimension.
And to be honest, it's now easier to form this kind of "detailed understanding" of user problems and to pick up various tools to try to solve these issues.
So we will see a large number of companies and teams creating various things to try to meet different needs.
Kevin Scott:
Even in the field of "software development tools," things have already started to get crazy—there are simply too many tools that have emerged in the past year.
And these tools are quite interesting, each with its own characteristics.
For software tool development companies like us, this indeed puts a lot of pressure on us because we have to deal with so much innovation and change.
But from a technical perspective, this is really fascinating.
We find that: as long as you have a certain nuanced understanding of user needs, there will always be someone willing to try your solution, especially those users with high tolerance and high interest.
Host:
Yes. Our time is running out, but I have one more question.
Suppose we sit down again at the Build conference a year from now, do you think: some of the hot topics or major issues now will become less important a year later? And what will become the real focus of discussion a year later? What are your predictions?
Kevin Scott:
I think those who still insist that "this technology is not ready yet"—for example, saying: "I tried it, but it's a bit expensive" or "it's still lacking a little in functionality"—if they use these as excuses for inaction, they will soon be left far behind.
Because these issues will become trivial over time: technology will become cheaper and more powerful every year.
Kevin Scott:
I think that by 2025, nobody even needs to be persuaded of this anymore. In the past, many people loudly claimed: "Technological progress will soon stagnate, and everyone will be disappointed."
Although there are still people saying this now, I feel that not many are seriously listening to them anymore. After all, what can you gain by listening to these "pessimists"? You are betting on failure, and the cost difference between "betting on failure" and "betting on optimism" is actually very large.
Kevin Scott:
I believe that moving forward, the problems people will solve using agents will become increasingly complex and ambitious. At the same time, the "agent network" will become more complete, and connections will become more robust; the reasoning and planning capabilities of models will also become stronger. This will prompt us to move from the current "synchronous interaction" model to a more robust "asynchronous interaction" era.
Kevin Scott:
The current mode of interaction is: you sit down, think about what you want to accomplish, then give instructions to the agent, wait for it to return a result, and then you operate based on that result.
But by next year, you might see a usage like this: "Hey, go help me get this done."
Then the agent will take time to process: it will call many external systems, integrate information, iterate repeatedly, and keep processing, summarizing, and advancing the work. Finally, after a period that is not instant but involves in-depth work, the agent will tell you: "I've helped you advance to this point, now it's your turn."
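The contrast between the two interaction styles can be shown with a small sketch using Python's standard `asyncio` module. It is an illustration only: `call_tool`, `run_agent`, the tool names, and the delays are hypothetical stand-ins, not a real agent framework.

```python
import asyncio


async def call_tool(name: str, delay: float) -> str:
    """Stand-in for an external system the agent consults (hypothetical)."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_agent(goal: str) -> dict:
    """Work on the goal in the background, then hand back a summary."""
    # gather() runs both calls concurrently and keeps the argument order.
    results = await asyncio.gather(
        call_tool("search", 0.02),
        call_tool("calendar", 0.01),
    )
    return {"goal": goal, "steps": list(results), "status": "your turn"}


async def main() -> dict:
    # Delegate the task; create_task() returns immediately.
    task = asyncio.create_task(run_agent("plan the offsite"))
    # The user is not blocked while the agent works.
    meanwhile = "user keeps working"
    # Pick up the result once the agent has finished.
    report = await task
    report["meanwhile"] = meanwhile
    return report


report = asyncio.run(main())
```

In the synchronous style the user would wait on each call in turn; here the task is handed off once and the result is collected when the agent reports back.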
Host:
It really sounds like the future I want to live in.
Kevin Scott:
I think so too, sincerely.
Host:
Well, Kevin, thank you very much for joining the show today. It was truly an amazing conversation.
Kevin Scott:
I'm glad to have this dialogue with you, and I really enjoyed it. Thank you for inviting me.