HBM, double the challenge

Wallstreetcn
2025.08.19 12:35

High Bandwidth Memory (HBM), the next-generation Dynamic Random Access Memory (DRAM) technology, is defined by its 3D stacking structure: multiple DRAM dies (typically 4, 8, or even 12 layers) are stacked vertically using advanced packaging. This structure gives HBM a far wider interface, and therefore far higher bandwidth (data transfer rate), than traditional memory solutions such as GDDR.
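
To make the width advantage concrete, the short sketch below computes per-device bandwidth from bus width and per-pin data rate; the HBM3E and GDDR6 figures are representative public specifications, not numbers from this article.

```python
# Per-device bandwidth follows directly from interface width and pin speed:
#   bandwidth (GB/s) = bus width (bits) * per-pin rate (Gbit/s) / 8
def bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    return bus_width_bits * pin_rate_gbps / 8

# Representative public figures (assumptions, not taken from this article):
hbm3e_stack = bandwidth_gb_s(1024, 9.6)  # HBM3E: 1024-bit bus, ~9.6 Gbit/s per pin
gddr6_chip = bandwidth_gb_s(32, 16.0)    # GDDR6: 32-bit bus, 16 Gbit/s per pin

print(f"HBM3E stack: {hbm3e_stack:.1f} GB/s")  # ~1228.8 GB/s (~1.2 TB/s)
print(f"GDDR6 chip : {gddr6_chip:.1f} GB/s")   # 64.0 GB/s
```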

With its high bandwidth and low latency, HBM has become a key component in the training and inference of large AI models. Within AI chips it plays the role of an "L4 cache," significantly improving data read/write efficiency, easing the memory bandwidth bottleneck, and thereby raising the effective compute that AI models can extract from an accelerator.
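
A rough per-token calculation shows why memory bandwidth, rather than raw compute, often limits large-model inference; the model size and token rate below are illustrative assumptions, not figures from this article.

```python
# Why LLM inference is bandwidth-bound: generating one token requires
# streaming essentially all model weights from memory once.
# All figures below are illustrative assumptions.
params = 70e9            # assumed 70B-parameter model
bytes_per_param = 2      # FP16 weights
tokens_per_second = 20   # assumed target generation speed

weight_traffic_tb_s = params * bytes_per_param * tokens_per_second / 1e12
print(f"required weight bandwidth: ~{weight_traffic_tb_s:.1f} TB/s")  # ~2.8 TB/s
# One HBM3E stack supplies roughly 1.2 TB/s, so several stacks are needed
# just to keep the compute units fed.
```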

HBM Market, SK Hynix Dominates

Leveraging its lead in HBM technology, SK Hynix's standing in the industry continues to rise. Market data show that from the second quarter of 2024 onward, Micron's and SK Hynix's DRAM market shares rose steadily while Samsung's gradually declined; in HBM, what had been a roughly even contest with Samsung has tipped decisively, with SK Hynix's share growing to more than double Samsung's by the first quarter of this year.

More notably, in the second quarter of this year SK Hynix overtook Samsung Electronics for the first time, topping the global memory sales rankings with approximately 21.8 trillion Korean won in combined DRAM and NAND sales (versus approximately 21.2 trillion won for Samsung Electronics). The breakthrough is largely attributable to the strong performance of its HBM products: although SK Hynix was not an early standout in the memory market, as the principal HBM supplier to NVIDIA it has seen demand for its high-performance, high-efficiency products surge with the global rise of AI.

Among its products, the fifth-generation High Bandwidth Memory, HBM3E, has been the key driver. Combining high bandwidth with low power consumption, it is widely used in AI servers, GPUs, and other high-performance computing applications, drawing competing procurement from tech giants such as AMD, NVIDIA, Microsoft, and Amazon across 2023 and 2024. SK Hynix is the only manufacturer in the world mass-producing HBM3E at scale, and its 8-layer and 12-layer HBM3E capacity for 2025 is already sold out.

In contrast, Samsung Electronics missed the window because of delayed deliveries to NVIDIA, above all in HBM3E, the segment with the widest AI-market application. Its market share plunged from 41% in the second quarter of last year to 17% in the second quarter of this year, with reports indicating that it failed NVIDIA's HBM3E qualification for the third time.

Looking ahead, Everbright Securities expects HBM demand to keep growing and to drive the whole memory industry chain, while Citigroup predicts that SK Hynix will continue to dominate the HBM market. SK Hynix looks set to become a "memory giant" of the AI era.

Storage Manufacturers Develop HBM Alternatives

In the face of SK Hynix's strong performance, other manufacturers in the industry are accelerating technological innovation and exploring alternatives to HBM.

Samsung Restarts Z-NAND

After a seven-year hiatus, Samsung Electronics has decided to restart its Z-NAND memory technology, positioning it as a high-performance solution to meet the growing demands of artificial intelligence (AI) workloads. This announcement was officially made at the 2025 Future Memory and Storage (FMS) Forum in the United States, marking Samsung's re-entry into the high-end enterprise storage sector.

Hwaseok Oh, Executive Vice President of Samsung's Memory Business, stated at the event that the company is fully committed to redeveloping Z-NAND, aiming to enhance its performance to 15 times that of traditional NAND flash memory while reducing power consumption by up to 80%. The upcoming new generation of Z-NAND will feature GPU-initiated Direct Storage Access (GIDS) technology, allowing GPUs to directly access data from storage without going through the CPU or DRAM. This architecture is designed to minimize latency and accelerate the training and inference processes of large AI models.

The revival of Z-NAND reflects a broad transformation occurring in the industry—rapidly expanding AI models have gradually exceeded the capacity of traditional storage infrastructures. In current systems, data must be transferred from SSDs through the CPU to DRAM and then to the GPU, creating significant bottlenecks that lead to performance degradation and increased energy consumption. Samsung's GIDS-supported architecture can eliminate these bottlenecks, allowing GPUs to load large datasets directly into VRAM from storage. Oh pointed out that this direct integration can significantly shorten the training cycles of large language models (LLMs) and other compute-intensive AI applications.
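
Samsung has not published a GIDS programming interface, so the sketch below is only a toy model of the two data paths just described; every name is invented for illustration, and the code merely counts bytes moved per shard under each path.

```python
# Toy model of the two data paths described above. Nothing here is a real
# Samsung or NVIDIA API; the code only tallies bytes moved.
bytes_moved = {"staged": 0, "direct": 0}

def staged_load(nbytes: int) -> None:
    """Conventional path: SSD -> CPU/DRAM -> GPU VRAM (two copies, CPU in the loop)."""
    bytes_moved["staged"] += 2 * nbytes   # SSD->DRAM, then DRAM->VRAM

def direct_load(nbytes: int) -> None:
    """GIDS-style path: the GPU pulls data straight from storage into VRAM."""
    bytes_moved["direct"] += nbytes       # single SSD->VRAM transfer

shard = 64 * 2**20                        # one 64 MiB shard of training data
staged_load(shard)
direct_load(shard)
print(bytes_moved)  # the staged path moves twice the bytes per shard
```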

In fact, Samsung first introduced Z-NAND technology back in 2018, releasing the SZ985 Z-SSD for enterprise and high-performance computing (HPC) applications. Built on 48-layer V-NAND and an ultra-low-latency controller, the 800GB drive achieved sequential reads of up to 3200MB/s, random reads of 750K IOPS, random writes of 170K IOPS, and latency below 20 microseconds, more than five times the performance of the SSDs of its day, with reads ten times faster than traditional 3-bit V-NAND. The SZ985 also carried 1.5GB of energy-efficient LPDDR4 DRAM, offered a rated write endurance of up to 42PB (equivalent to storing 8.4 million full-HD movies), and backed its reliability with a mean time between failures (MTBF) of 2 million hours.

X-HBM Architecture Makes a Grand Entrance

NEO Semiconductor has launched the world's first ultra-high-bandwidth memory (X-HBM) architecture for AI chips. Built on its self-developed 3D X-DRAM technology, it breaks through traditional HBM's inherent bandwidth and capacity bottlenecks, and its release may usher the memory industry into the "super memory" phase of the AI era.

By contrast, HBM5, still in development and expected around 2030, will support only a 4K-bit data bus and 40Gbit of capacity per chip. The latest research from the Korea Advanced Institute of Science and Technology (KAIST) projects that even HBM8, expected around 2040, will reach only a 16K-bit bus and 80Gbit per chip.

X-HBM, by comparison, pairs a 32K-bit bus with 512Gbit of capacity per chip, letting AI chip designers leapfrog bottlenecks that traditional HBM would otherwise take a decade to work through. It is reported to offer 16 times the bandwidth and 10 times the density of existing memory technologies, performance squarely aimed at the growing demands of generative AI and high-performance computing.
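
Taking the quoted bus widths at face value, the back-of-the-envelope comparison below shows how the 16x bandwidth claim follows from interface width alone. It assumes, purely for illustration, the same per-pin data rate across designs (which real products will not share); the 2048-bit HBM4 width is a published specification rather than a figure from this article.

```python
# Back-of-the-envelope scaling from the bus widths quoted above.
bus_width_bits = {
    "HBM4":  2048,    # current-generation interface (published spec)
    "HBM5":  4096,    # 4K-bit bus, expected ~2030
    "HBM8":  16384,   # 16K-bit bus, projected ~2040 (KAIST)
    "X-HBM": 32768,   # 32K-bit bus, NEO Semiconductor's claim
}
base = bus_width_bits["HBM4"]
for name, width in bus_width_bits.items():
    print(f"{name:6s}: {width // base:2d}x HBM4 bandwidth at equal pin rate")
# X-HBM: 16x, matching the reported bandwidth multiple.
```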

Saimemory Develops Stacked DRAM

Saimemory, co-founded by SoftBank, Intel, and the University of Tokyo, is developing a new stacked DRAM architecture aimed at becoming a direct alternative to HBM, and even surpassing its performance.

The new company's technology path centers on optimizing the 3D stacking architecture: by vertically stacking multiple DRAM dies and improving die-to-die interconnects (for example, using Intel's Embedded Multi-die Interconnect Bridge, EMIB), it aims to raise capacity while cutting data-transfer power. According to the plan, its target products will deliver at least double the capacity of conventional DRAM, consume 40%-50% less power than HBM, and cost significantly less than existing HBM solutions.

This route sets it apart from players such as Samsung and NEO Semiconductor, which pursue capacity above all (targeting 512GB in a single module); Saimemory instead concentrates on the power consumption pain points of AI data centers, in step with the industry's move toward green computing.

In terms of technical collaboration, Intel provides advanced packaging technology expertise, Japanese academic institutions like the University of Tokyo contribute storage architecture patents, and SoftBank has become the largest shareholder with an investment of 3 billion yen. The initial 15 billion yen in R&D funding will be used to complete prototype design and mass production evaluation before 2027, with plans for commercialization by 2030.

SanDisk Partners with SK Hynix to Promote HBF High-Bandwidth Flash Memory

SanDisk and SK Hynix recently announced the signing of a memorandum of understanding under which the two will jointly develop specifications for High Bandwidth Flash (HBF) memory. The collaboration builds on the HBF concept SanDisk introduced in February of this year: a new storage architecture designed specifically for AI that combines the characteristics of 3D NAND flash with those of high-bandwidth memory (HBM). According to the plan, SanDisk will deliver the first HBF memory samples in the second half of 2026, with samples of AI inference devices using the technology expected in early 2027.

As a NAND-flash-based memory technology, HBF innovatively adopts an HBM-like packaging form, significantly increasing capacity and cutting cost relative to expensive traditional HBM while adding the non-volatile advantage of retaining data through power loss. The breakthrough marks the industry's first integration of flash's storage density with DRAM-class high-bandwidth behavior in a single stack, promising to reshape how AI models access and process data at scale.

Compared with traditional HBM, which relies entirely on DRAM, HBF replaces part of the memory stack with NAND flash, lifting capacity to 8-16 times that of DRAM-based HBM at comparable cost and bandwidth, at the price of somewhat higher latency. And unlike DRAM, which must stay powered to retain data, NAND's non-volatility lets HBF provide persistent storage at lower energy cost.
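
As a quick sanity check on the 8-16x figure, consider the assumed per-stack capacities below; the 36 GB HBM3E stack is a real product configuration, while the HBF stack capacity is purely an assumption, not a published specification.

```python
# Rough sanity check of the 8-16x capacity claim (illustrative figures only).
hbm_stack_gb = 36    # e.g. a 12-high HBM3E stack built from 24 Gbit DRAM dies
hbf_stack_gb = 512   # assumed NAND-based stack in the hundreds of GB
print(f"capacity ratio: ~{hbf_stack_gb / hbm_stack_gb:.0f}x")  # ~14x, inside 8-16x
```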

Multidimensional Architectural Innovations Reduce Reliance on HBM

In addition to continuous innovation in storage technology, manufacturers are also actively exploring architectural innovations in the AI field to reduce dependence on HBM.

Processing-In-Memory Architecture

In the 1940s, with the birth of the first modern computer, the von Neumann architecture emerged, built on the principle of separating storage from computation, and chip design has largely followed it ever since. Across nearly 70 years of development in the modern chip industry, progress has centered on optimizing software and hardware design, while the underlying computer architecture has not fundamentally changed.

Processing-In-Memory (PIM), also called Compute-in-Memory (CIM), is an innovative architecture proposed against this backdrop. Its core idea is to integrate computing capability inside the memory itself or immediately next to it, sidestepping the compute-to-storage data-movement bottleneck inherent in traditional architectures. Deploying compute units directly within the storage cells shortens the physical distance data must travel, letting a PIM architecture fuse computation and storage, optimize data paths, and push past the compute ceiling of traditional chips. This not only shortens system response times but also yields an order-of-magnitude gain in energy efficiency. Once the technology matures, it is expected to sharply reduce dependence on high-bandwidth memory, partially taking over HBM's role.
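
To see where that order-of-magnitude energy claim could come from, the toy tally below compares one multiply-accumulate under the two data paths. The per-operation energies are order-of-magnitude figures commonly cited in computer-architecture talks (e.g. roughly 640 pJ per 32-bit off-chip DRAM access), and the PIM-local access cost is an assumption; none of them come from this article.

```python
# Rough energy tally for one 32-bit multiply-accumulate (illustrative only).
DRAM_ACCESS_PJ = 640.0  # fetch one 32-bit operand from off-chip DRAM
ALU_OP_PJ = 3.0         # one 32-bit arithmetic operation on-chip
PIM_ACCESS_PJ = 30.0    # assumed cost of a local access inside a PIM array

von_neumann = 2 * DRAM_ACCESS_PJ + ALU_OP_PJ  # move two operands, then compute
pim = 2 * PIM_ACCESS_PJ + ALU_OP_PJ           # operands never leave the array

print(f"von Neumann: {von_neumann:.0f} pJ")
print(f"PIM        : {pim:.0f} pJ (~{von_neumann / pim:.0f}x less energy)")
```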

Huawei's Breakthrough AI Technological Achievements

Huawei recently released UCM (Unified Cache Memory), an inference-acceleration suite centered on the KV Cache (Key-Value Cache). It integrates multiple cache-acceleration algorithms to manage the KV Cache data generated during inference across memory tiers, effectively expanding the inference context window, delivering high-throughput, low-latency inference, and lowering the per-token cost of inference. Through this architectural design, UCM reduces dependence on high-bandwidth memory (HBM) while significantly improving the inference performance of domestic large models.
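
Huawei has not published UCM's internals; purely to make "hierarchical KV Cache management" concrete, here is a minimal sketch of a two-tier cache that keeps recent entries in a fast tier and demotes cold ones instead of discarding them. The class name, the tiers, and the LRU policy are all invented for illustration.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: hot entries live in a small fast tier (think
    HBM), cold entries are demoted to a larger slow tier (think DRAM/SSD).
    Illustrative only; UCM's actual design and policies are not public."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # token position -> KV entry (hot tier)
        self.slow = {}             # token position -> KV entry (cold tier)

    def put(self, pos, kv):
        self.fast[pos] = kv
        self.fast.move_to_end(pos)
        if len(self.fast) > self.fast_capacity:
            cold_pos, cold_kv = self.fast.popitem(last=False)  # evict LRU
            self.slow[cold_pos] = cold_kv                      # demote, don't drop

    def get(self, pos):
        if pos in self.fast:
            self.fast.move_to_end(pos)     # refresh recency
            return self.fast[pos]
        kv = self.slow.pop(pos)            # promote a cold entry on access
        self.put(pos, kv)
        return kv

cache = TieredKVCache(fast_capacity=2)
for pos in range(4):
    cache.put(pos, f"kv{pos}")
print(list(cache.fast), list(cache.slow))  # hot: [2, 3]; demoted: [0, 1]
```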

The Future Will Be an Era of Multi-Level Architecture

Whether in training or inference scenarios, computing power and storage are the first areas to benefit, and these two will become key factors determining the competitive landscape of AI in the next decade.

Similar to GPGPU products, demand for HBM (especially HBM3 and above) is strong, and supply has long been dominated by overseas manufacturers. At the beginning of 2025, the spot price of HBM3 chips was up 300% from early 2024, while DRAM usage in a single AI server reached 8 times that of a traditional server. Market-wise, overseas manufacturers still dominate: SK Hynix leads with a 53% share and was first to mass-produce HBM3E; Samsung Electronics holds 38% and plans to double its HBM supply in 2025 year-on-year; Micron Technology currently holds 10%, aiming to lift its share above 20% in 2025.

Although HBM has established a foothold in high-end AI applications due to its excellent performance, it may face competitive pressure from emerging technologies as other memory technologies continue to make breakthroughs in cost control, performance improvement, and power consumption optimization. However, in the short term, HBM remains the preferred solution for high bandwidth demand scenarios.

From a long-term development trend, the market will continuously adjust and optimize with technological evolution and changes in application demands. The future AI memory market is not simply a "replacement and being replaced" relationship; the innovation of HBM alternatives presents a "diversity of architectural philosophies" rather than a single technological iteration. It is foreseeable that there will not be a "single winner" that completely replaces HBM in the field of AI computing and memory; instead, there will be a more complex, decentralized, and scenario-specific memory hierarchy structure — the era of a single memory solution dominating high-performance computing is coming to an end.

The future AI memory landscape will be a heterogeneous and diverse hierarchical system: HBM will focus on training scenarios, PIM memory will serve high-efficiency inference, dedicated on-chip memory architectures will adapt to ultra-low latency applications, and new technologies such as stacked DRAM and photonic interconnects will also have a place in the system. Various technologies will achieve precise optimization for specific workloads, collectively forming the memory ecosystem of the AI era.

Author: Peng Cheng. Source: Semiconductor Industry Review. Original title: "HBM, Challenges Double"

Risk Warning and Disclaimer

Markets carry risk; invest with caution. This article does not constitute personal investment advice, nor does it take into account individual users' specific investment goals, financial situation, or needs. Users should consider whether any opinion, view, or conclusion in this article suits their particular circumstances. Investing on this basis is at one's own risk.