Track Hyper | AMD AI PC Chip: Zen 5 Architecture Shows Its True Colors

Ryzen AI 300 series laptops launch on July 28th. What makes the Zen 5 (CPU) architecture stand out?

Author: Zhou Yuan / Wall Street News

On July 28th, laptops powered by the Ryzen AI 300 series go on sale.

These laptops have drawn attention because they carry AMD's new AI processors, the Ryzen AI 300 series, built on a 4nm process.

The series uses AMD's new Zen 5 architecture. AMD first unveiled the Ryzen AI 300 series mobile processors and the Ryzen 9000 series desktop processors at Computex in Taipei in early June this year, giving only a brief introduction to Zen 5 at the time.

Ahead of the July 28th launch of laptops built on the Ryzen AI 300 series, AMD disclosed detailed technical information about the processor at its Zen 5 Tech Day event in the United States.

Zen 5c is a customized variant of the Zen 5 architecture with a more "compact" core, roughly 25% smaller than the standard, full-featured Zen 5 core. Both core types sit on the same chip but with different amounts of cache. This is the first time AMD has used such a design.

AMD introduced the Zen architecture in 2017 to replace its Bulldozer architecture. Zen delivered 52% higher IPC (instructions per cycle) than Bulldozer, far exceeding the 40% improvement AMD had originally targeted.
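IPC is simply retired instructions divided by clock cycles, and an IPC uplift compares two such ratios for the same workload. The Python sketch below shows that arithmetic; the function names and the cycle counts are hypothetical, chosen only so the result matches the 52% figure, and are not AMD measurements.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle: retired instructions divided by elapsed clock cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def uplift_pct(old_ipc: float, new_ipc: float) -> float:
    """Percentage IPC improvement of new_ipc over old_ipc."""
    return (new_ipc / old_ipc - 1.0) * 100.0


# Hypothetical counts: one fixed workload of 1,000,000 instructions,
# taking 1,520,000 cycles on the older core and 1,000,000 on the newer one.
old = ipc(instructions=1_000_000, cycles=1_520_000)
new = ipc(instructions=1_000_000, cycles=1_000_000)
print(f"IPC uplift: {uplift_pct(old, new):.0f}%")  # IPC uplift: 52%
```

Because performance scales roughly with IPC × clock frequency, an IPC uplift translates directly into a speedup only when both cores run at the same clock.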

In the seven years since Zen's introduction, AMD has iterated on it five times. AMD says the new Zen 5 architecture raises IPC by 16% over Zen 4, a "substantial leap in performance."

What efforts has AMD made?

In short, AMD made a range of architectural improvements: more instructions per clock cycle, wider instruction dispatch and execution bandwidth, doubled cache data bandwidth, and AI acceleration, among others. For example, by widening pipelines and vector sizes, AMD raises Zen 5's throughput, letting each core process more data at once and improving its parallel processing capability.
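The gain from wider vectors is easy to quantify: a vector register holds vector_bits / element_bits values, and one instruction operates on all of them. The Python sketch below assumes a hypothetical core with two vector pipes working on 32-bit floats; the pipe count is an illustrative assumption, not a Zen 5 specification, and the result is an idealized peak.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements one vector register holds (SIMD lanes)."""
    return vector_bits // element_bits


def peak_elements_per_cycle(pipes: int, vector_bits: int, element_bits: int) -> int:
    """Idealized peak: every pipe issues one full-width vector op per cycle."""
    return pipes * lanes(vector_bits, element_bits)


# Hypothetical core with 2 vector pipes operating on FP32 (32-bit) data.
narrow = peak_elements_per_cycle(pipes=2, vector_bits=256, element_bits=32)
wide = peak_elements_per_cycle(pipes=2, vector_bits=512, element_bits=32)
print(narrow, wide)  # 16 32
```

Doubling the vector width doubles the elements processed per cycle; adding pipes scales the peak linearly as well, though real code only approaches these numbers when the data is already in cache and the work vectorizes cleanly.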

CPU core design has long followed a mature template. Broadly, a core is divided into a front end and a back end, built from functional units such as instruction fetch and decode, integer execution, floating-point execution, load/store, and cache.

Zen 5 enlarges several front-end structures, in some cases by a factor of 1.5. AMD has also added dual-pipe instruction fetch, dual decode pipelines, and doubled instruction bandwidth, among other front-end changes.

Doubling front-end instruction bandwidth helps the processor handle complex calculations and data-intensive tasks more efficiently. The same widening shows up in the data transfer rate between the L1 and L2 caches and from the L1 cache to the floating-point (FP) units.
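Cache bandwidth is usually quoted in bytes per clock cycle, which converts to GB/s by multiplying by the clock rate. The Python sketch below assumes hypothetical link widths of 32 and 64 bytes per cycle at a hypothetical 5.0 GHz clock, purely to show what "doubling" means in absolute terms; these are not AMD's published figures.

```python
def bandwidth_gb_s(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Peak transfer rate in GB/s: bytes per cycle times billions of cycles per second."""
    return bytes_per_cycle * clock_ghz


# Hypothetical L1<->L2 link widths at a hypothetical 5.0 GHz core clock.
before = bandwidth_gb_s(bytes_per_cycle=32, clock_ghz=5.0)
after = bandwidth_gb_s(bytes_per_cycle=64, clock_ghz=5.0)
print(before, after)  # 160.0 320.0
```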

In the floating-point and vector execution units, Zen 5 builds on the AVX-512 support introduced with Zen 4: where Zen 4 executed AVX-512 on 256-bit-wide datapaths, Zen 5 supports the full 512-bit width. The L1 data cache also grows from 32KB to 48KB, and its associativity from 8 ways to 12.

Facing surging demand for AI computing power and applications, Zen 5 also significantly speeds up its math acceleration units: single-core machine learning performance rises by up to 32%, and AES-XTS encryption by up to 35%.
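For a set-associative cache, the number of sets is capacity / (ways × line size). The Python sketch below applies that formula to the L1 data cache figures (32KB/8-way and 48KB/12-way), assuming 64-byte cache lines; the helper name is hypothetical.

```python
import math


def cache_geometry(capacity_bytes: int, ways: int, line_bytes: int = 64) -> dict[str, int]:
    """Derive set count and address-bit split for a set-associative cache."""
    sets = capacity_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": int(math.log2(line_bytes)),  # selects the byte within a line
        "index_bits": int(math.log2(sets)),         # selects the set
    }


# L1 data cache before and after, assuming 64-byte lines.
zen4_l1d = cache_geometry(32 * 1024, ways=8)
zen5_l1d = cache_geometry(48 * 1024, ways=12)
print(zen4_l1d)  # {'sets': 64, 'offset_bits': 6, 'index_bits': 6}
print(zen5_l1d)  # {'sets': 64, 'offset_bits': 6, 'index_bits': 6}
```

Under this assumption both configurations have 64 sets, so the extra 16KB comes entirely from the four added ways while the address-to-set mapping stays unchanged.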

Together, these advances raise Zen 5's IPC by 16% on average, and by as much as 35% in some workloads.

AMD calls the overall performance gain of Zen 5 over its predecessor "huge." Is that an exaggeration?

Wall Street News reviewed the IPC gains of the four previous Zen iterations and found the claim somewhat overstated. Zen+, Zen 2, Zen 3, and Zen 4 improved on their predecessors by 3%, 15%, 19%, and 13%, respectively. Zen 5's 16% falls short of Zen 3's gain.
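Because each percentage is measured against the immediately preceding generation, the gains compound multiplicatively rather than adding up. The Python sketch below uses the five figures cited in the article and treats each as a pure IPC ratio, which is a simplifying assumption.

```python
from math import prod

# Generation-over-generation IPC gains as cited in the article (percent).
ipc_gains = {"Zen+": 3, "Zen 2": 15, "Zen 3": 19, "Zen 4": 13, "Zen 5": 16}

# Gains compound multiplicatively, not additively.
cumulative = prod(1 + pct / 100 for pct in ipc_gains.values())
largest = max(ipc_gains, key=ipc_gains.get)

print(f"Zen 5 vs. original Zen: {cumulative:.2f}x")  # Zen 5 vs. original Zen: 1.85x
print(f"Largest single-generation gain: {largest}")  # Largest single-generation gain: Zen 3
```

Simply summing the percentages would give 66%, whereas compounding puts Zen 5 at roughly 85% higher IPC than the original Zen.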

Even so, this does not diminish Zen 5's technical innovations or the AI performance it brings. Zen 5 underpins processors designed to meet AI requirements, while processors built on the Zen 3 architecture lack these new capabilities.

The Ryzen AI 300 series processors in the laptops launching on July 28th combine two core types: Zen 5 and Zen 5c.

Both core types implement the same CPU architecture, so how do they differ? Zen 5c is the customized variant: because of its "compact" design, it has less cache and runs at lower frequencies than the Zen 5 core, but it is more energy-efficient and better suited to mobile use.

As an AI PC processor, the Ryzen AI 300 series uses a heterogeneous CPU+GPU+NPU design. The CPU moves to Zen 5, the GPU uses the RDNA 3.5 architecture, and the NPU uses the XDNA 2 architecture. RDNA 3.5 mainly improves energy efficiency, memory performance, and battery life.

On the NPU side, XDNA 2 increases the number of AI engine tiles from 20 to 32 (in a 4x8 configuration), doubles the number of MAC (multiply-accumulate) units in each tile, and provides 1.6 times the on-chip memory of its predecessor. It also adds support for the Block FP16 block floating-point format, enhanced nonlinear-function support, and 8 concurrent spatial streams (twice as many as the previous generation). As a result, the NPU in the Ryzen AI 300 series reaches up to 50 TOPS.
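Block floating point stores a group of numbers with one shared exponent and a small integer mantissa per value, which saves storage and lets most of the arithmetic run as integer multiply-accumulates. The Python sketch below shows the general idea only; the block size of 8, the 8-bit signed mantissas, and the function names are assumptions for illustration, and AMD's actual Block FP16 encoding may differ.

```python
import math


def bfp_quantize(block: list[float], mantissa_bits: int = 8) -> tuple[int, list[int]]:
    """Quantize a block of floats to one shared exponent plus signed integer mantissas."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    qmax = 2 ** (mantissa_bits - 1) - 1  # largest mantissa magnitude, e.g. 127 for 8 bits
    # Smallest power-of-two scale at which the largest value still fits in qmax.
    shared_exp = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** shared_exp
    mantissas = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp: int, mantissas: list[int]) -> list[float]:
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** shared_exp
    return [m * scale for m in mantissas]


# Hypothetical block of 8 values.
block = [0.5, -1.25, 3.0, 0.0078125, 2.0, -0.75, 1.0, 0.1]
shared_exp, mantissas = bfp_quantize(block)
print(shared_exp, mantissas)  # -5 [16, -40, 96, 0, 64, -24, 32, 3]
print(bfp_dequantize(shared_exp, mantissas))  # [0.5, -1.25, 3.0, 0.0, 2.0, -0.75, 1.0, 0.09375]
```

The shared exponent is set by the largest value in the block, so small values lose precision (0.0078125 rounds to 0 here); that accuracy loss is the trade-off block formats accept in exchange for compact storage and cheaper arithmetic.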

That figure exceeds those of its leading competitors worldwide: Intel Lunar Lake (48 TOPS), Qualcomm Snapdragon X Elite (45 TOPS), and Apple M4 (38 TOPS).
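TOPS ratings like these are peak figures, commonly computed as MAC units × 2 operations per MAC (one multiply, one add) × clock frequency. The Python sketch below applies that formula to the XDNA 2 changes (20 to 32 AI engine tiles, twice the MACs per tile). The per-tile MAC count and the clock are hypothetical placeholders, so only the ratio between the two results means anything.

```python
def peak_tops(mac_units: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak tera-operations per second; one multiply-accumulate counts as 2 ops."""
    return mac_units * ops_per_mac * clock_hz / 1e12


# Hypothetical per-tile MAC count and clock, used only to illustrate scaling.
macs_per_tile = 100
clock_hz = 1.0e9

old_npu = peak_tops(mac_units=20 * macs_per_tile, clock_hz=clock_hz)      # 20 tiles
new_npu = peak_tops(mac_units=32 * 2 * macs_per_tile, clock_hz=clock_hz)  # 32 tiles, 2x MACs each
print(f"Scale factor at equal clock: {new_npu / old_npu:.1f}x")  # Scale factor at equal clock: 3.2x
```

Real TOPS ratings also depend on the clock and on the data type being counted, so this 3.2x structural factor is not the same as the measured generational gain.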