Zhipu AutoGLM launched: Install a universal agent on every phone

Wallstreetcn
2025.08.20 08:20

Zhipu released AutoGLM 2.0, the world's first mobile agent, equipped with reasoning, coding, and multimodal capabilities and capable of executing diverse tasks on any device. With a simple instruction, users can have AutoGLM operate applications like Meituan and JD.com to complete tasks such as ordering food delivery and booking flights. It is not just a chat tool but a versatile agent that can carry out work across websites in office scenarios, generate content, and publish it to social media.

Do you remember "the first red envelope sent by AI to humans"?

In October last year, Zhipu released the world's first Phone-Use product AutoGLM, ushering in a new era of Agents.

Today, AutoGLM has been upgraded again to version 2.0, taking Agent applications to a new level:

  • The world's first mobile Agent, available for everyone;
  • Creating a new technological paradigm of Agent + cloud phone/cloud computer, without occupying users' phones and computers;
  • Breaking hardware limitations, running on any device and in any scenario, helping users perform operations;
  • Driven by domestic models (GLM-4.5, GLM-4.5V), with comprehensive capabilities in reasoning, coding, and multimodality.

From now on, everyone can use AutoGLM. We will quickly iterate and launch new features (the "scheduled tasks" feature will be online soon, with AI actively working for you every day). Search for "AutoGLM" in the app store or click "Read the original text" at the end of the article.

Operational Execution Assistant

Past AIs mostly stayed at the "dialogue" level; general intelligent assistants were also limited to information queries and summaries.

AutoGLM 2.0 has achieved a qualitative leap: it no longer just "talks," but can truly "do."

In fact, with AutoGLM 1.0 we already explored letting AI carry out some phone operations on the user's behalf, but it only worked in limited scenarios. With the release of AutoGLM 2.0, it has grown into an execution assistant capable of autonomously completing diverse tasks in the "cloud."

In everyday scenarios, users only need to say one sentence to have AutoGLM operate dozens of high-frequency applications such as Meituan, JD.com, Xiaohongshu, and Douyin: ordering takeout, booking flights, or checking housing listings. For example, it can help you buy "the first cup of milk tea in autumn."

In office scenarios, it can also execute full-process work across websites, operating web versions of Feishu, NetEase Mail, Zhihu, Weibo, Douyin, Weitoutiao, etc.: from information retrieval to content writing, to generating videos, PPTs, or podcasts, and directly completing content publishing on social media platforms like Xiaohongshu and Douyin.

This means that AI is no longer just a "chat tool," but a versatile agent that can truly work for you. It can not only provide answers but also fully execute tasks, helping users save time and energy, fundamentally changing the way humans collaborate with AI.

Equipping AI with a Phone

The main highlight of AutoGLM is that a single app turns the phone into a true "new species."

In AutoGLM 2.0, we have equipped AI with a dedicated intelligent agent phone/intelligent agent computer, allowing it to autonomously work and complete tasks in the cloud without occupying the user's local device, during which the user can use other apps (like browsing Douyin, playing games).

This means that AI can not only "autonomously drive the phone" but also "asynchronously act as an office agent," turning the phone into an intelligent agent phone with autonomous execution and cross-device collaboration capabilities.

AutoGLM takes this product form because of our understanding of what an early form of AGI looks like. We believe that to transition from Agent to AGI, the 3A principles must be met:

  • Around-the-clock: Operating 24 hours a day, even when the user is offline, the Agent continues to execute tasks;
  • Autonomy without interference: Operating independently, without occupying the user's screen or computing power, a companion in a parallel world;
  • Affinity: Breaking out of the browser dialog box, crossing devices such as smartphones, computers, smartwatches, glasses, and home appliances to operate in the physical world.

New AI Hardware

With the powerful cloud execution capability of AutoGLM, the way people interact with devices is being redefined.

We have encapsulated the operational execution capability of AutoGLM as an API, allowing developers to integrate it into a wide range of hardware devices, from AI glasses and other wearables to traditional home appliances.

AutoGLM enables hardware to possess complete mobile-level operational capabilities for the first time, without the need to stack complex systems or large-capacity batteries on the client side. For example, you can order a cup of coffee through smart glasses.

Starting today, applications are officially open for the AutoGLM mobile API and for the developer ecosystem co-construction plan. In addition to smartphones and computers, devices such as smartwatches, glasses, and home appliances can all become Agent-driven intelligent assistants.

We look forward to exploring the infinite possibilities of AI integrating into the physical world with more developers.

Technical SOTA

AutoGLM can be made available for free to everyone in the country because it is a purely domestic Agent, with costs significantly lower than those of Agents that connect to foreign models.

AutoGLM is powered by the latest open-source SOTA language model GLM-4.5 and the visual reasoning model GLM-4.5V. AutoGLM maximizes the native capabilities of the base models and combines multiple breakthroughs in "end-to-end asynchronous reinforcement learning," enabling it to handle reasoning, coding, research, agentic tasks, and GUI operations, and to flexibly call the most suitable "brain" for execution based on needs.

  • ComputerRL: Proposed an API-GUI collaborative paradigm to improve data diversity and computational efficiency; improved GRPO and introduced the Entropulse mechanism to strengthen exploration and policy diversity.
  • MobileRL: Introduced a difficulty-adaptive reinforcement learning method (reasoning bootstrap warm-up + difficulty-adaptive GRPO), significantly improving the stability and convergence efficiency of mobile tasks.
  • AgentRL: Solved instability and uneven gradient distribution in multi-task training through cross-sampling and task advantage normalization mechanisms, enhancing overall robustness and efficiency.

In the Device Use benchmark test (covering mobile phones, computers, and web operations), AutoGLM outperformed ChatGPT Agent, UI-TARS-1.5, and Claude Sonnet 4, demonstrating stronger robustness and versatility, reaching the SOTA level of mainstream agents.
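
To give a rough sense of what "task advantage normalization" on top of GRPO could look like, here is a minimal Python sketch. It is an illustration under assumptions, not Zhipu's implementation: the function names, the task labels, and the choice to normalize per task are all hypothetical. Standard GRPO also divides by the within-group standard deviation; this sketch only subtracts the group mean so that the effect of the per-task step stays visible.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_centered_advantages(rewards):
    """Group-relative baseline in the spirit of GRPO: subtract the mean
    reward of the rollouts sampled for the same prompt."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


def normalize_per_task(advantages, task_ids, eps=1e-8):
    """Hypothetical task advantage normalization: rescale advantages so
    that every task has zero mean and (roughly) unit standard deviation,
    keeping tasks with large reward scales from dominating the gradient."""
    by_task = defaultdict(list)
    for adv, task in zip(advantages, task_ids):
        by_task[task].append(adv)
    stats = {task: (mean(vals), pstdev(vals)) for task, vals in by_task.items()}
    return [
        (adv - stats[task][0]) / (stats[task][1] + eps)
        for adv, task in zip(advantages, task_ids)
    ]


# Two made-up tasks whose rewards live on very different scales.
mobile_adv = group_centered_advantages([0.0, 1.0, 1.0, 0.0])
web_adv = group_centered_advantages([0.0, 10.0, 10.0, 0.0])

task_ids = ["mobile"] * 4 + ["web"] * 4
normalized = normalize_per_task(mobile_adv + web_adv, task_ids)
print(normalized)
```

Before normalization the "web" advantages are ten times larger than the "mobile" ones; afterwards both tasks contribute values of comparable magnitude, which is the kind of imbalance the article says AgentRL addresses.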
