# Survey of Chiplet Technology: SoC Architecture, Interconnect, EDA, and Advanced Packaging

Hongwei Liu, Yuan Du, Senior Member, IEEE, Bo Pu, Senior Member, IEEE, Guojun Yuan, Yuhang Liu, Linji Zheng, Pengchao Wang, An Yang, Yu Li, Chengming Yu, Fei Guo, Xiaoteng Zhao, Member, IEEE, Xuqiang Zheng, Member, IEEE, He Sun, Yongfu Li, Senior Member, IEEE, Shaolin Xiang, and Qinfen Hao, Senior Member, IEEE

Abstract—Chiplet technology has emerged as a transformative approach in integrated circuit design. Although it has attracted significant attention recently, limited effort has been dedicated to clearly defining its concept, terminology, composition, and evolution phases. This survey gives a formal definition by proposing chiplet terminology and composition and by naming chiplet a new design methodology. It then analyzes over 200 recent publications from both academia and industry to establish chiplet as a technology domain composed of four distinct fields: chiplet-based SoC architecture, interconnect, EDA tools, and advanced packaging. For each field, the paper traces the technology development, analyzes the challenges, and outlines the evolution trend. This survey aims to provide an in-depth examination of the chiplet domain and the progress in each field, offering insights drawn from literature analysis to outline the current and emerging landscape of chiplet technology.

*Index Terms*—Chiplet, terminology, composition, chiplet-based SoC, chiplet interconnect, EDA for chiplets, advanced packaging for chiplets, 2.5-D system, 3-D system, chiplet marketplace.

#### I. BACKGROUND

Chiplet technology marks a paradigm shift in integrated circuit design by introducing a modular approach to the semiconductor industry. It effectively addresses many challenges faced by traditional monolithic system-on-chip (SoC) designs. As the industry struggles with rising costs and physical limitations, chiplet has emerged as a promising solution to drive continued innovation and performance improvements beyond conventional scaling methods.

Received 8 April 2025; revised 14 September 2025 and 1 November 2025; accepted 15 November 2025. Date of publication 24 November 2025; date of current version 15 December 2025. This work was supported by the National Key Research and Development Program of China under Grant 2022YFB4401501. This article was recommended by Guest Editor S. Yu. (Corresponding author: Qinfen Hao.)

Hongwei Liu, Guojun Yuan, Yuhang Liu, and Qinfen Hao are with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (e-mail: haoqinfen@ict.ac.cn).

Yuan Du is with Nanjing University, Nanjing, Jiangsu 210093, China.

Bo Pu is with DeTooLIC Technology Company Ltd., Ningbo, Zhejiang 315800, China.

Linji Zheng, Pengchao Wang, An Yang, Yu Li, Chengming Yu, and Fei Guo are with Wuxi Institute of Interconnect Technology, Wuxi, Jiangsu 214121, China.

Xiaoteng Zhao is with the Key Laboratory of Analog Integrated Circuits, School of Integrated Circuits, Xidian University, Xi'an, Shaanxi 710126, China.

Xuqiang Zheng and He Sun are with the Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China.

Yongfu Li is with the Department of Micro and Nano Electronics Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.

Shaolin Xiang is with Tsinghua University, Beijing 100084, China.

Digital Object Identifier 10.1109/JETCAS.2025.3636408

## A. Origin of Chiplet Technology and Its Development History

The concept of chiplet has evolved significantly over decades, progressing from theoretical propositions to commercial realities that now shape the semiconductor industry.

The term "chiplet" was first coined by the University of California in 2006 [1], though its conceptual roots extend much further back, tracing to Gordon Moore's paper published in 1965 [2]. Later, programs such as DARPA's COSMOS [3] and CHIPS [4] initiatives aimed to create modular SoC systems by developing integration standards, reusable IP blocks, and design tools for chiplets. A significant milestone marking the shift from research to commercial use occurred in 2016, when Marvell partnered with Kandou Bus to use Kandou's chip-to-chip interconnect technology [5], enabling the connection of multiple chips.

Widespread industry adoption of chiplet technology soon followed, with major manufacturers integrating chiplet designs into their commercial products. For clarity, we divide chiplet development into two phases, "chiplet 1.0" and "chiplet 2.0," though these terms are not formally defined in existing literature. The first phase, "chiplet 1.0," was driven primarily by large semiconductor companies aiming to overcome manufacturing constraints, improve yields, and build larger, more complex chips at a time when advances in process technology were slowing down. AMD emerged as an early leader during this phase, incorporating chiplet designs into its Zen architecture and subsequent architectures. The focus during this stage was largely on advanced packaging techniques.

The second phase, which we call "chiplet 2.0," is distinguished by efforts to standardize chiplet interfaces and by the development of a chiplet marketplace, making the technology accessible to a broader range of companies, including small and medium-sized players. This shift is transforming chiplets from a packaging solution into a comprehensive design methodology. It enables companies to concentrate on their core strengths while integrating pre-designed, tested third-party components. In the chiplet 2.0 phase, investment is increasingly directed toward standardized interconnect IP and specialized EDA tools tailored for chiplet design, in addition to advanced packaging. Important milestones in this evolution include the introduction of the Universal Chiplet Interconnect Express (UCIe) standard [6], the launch of the Open Compute Project's Open Chiplet Economy Marketplace [7], and various standardization initiatives in China [8]. Unlike the chiplet 1.0 phase, this phase emphasizes standardization, interoperability, and building a collaborative ecosystem, marking a shift from company-specific solutions toward an industry-wide approach.

#### B. Advantages and Disadvantages of Chiplet Technology

Chiplets offer several key advantages over traditional monolithic designs:

- 1) *Reusable Hard IP:* The same chiplet can be used across multiple SoC applications, boosting development efficiency and maximizing return on investment.
- 2) *Heterogeneous Integration:* Chiplets can be manufactured using different processes, materials, and technology nodes, each tailored to its specific function. This allows customers to select the most suitable technology for each component.
- 3) *Improved Yields:* By testing chiplets individually before assembly and reducing die size, manufacturers can achieve higher yields, lower costs, and less waste.
- 4) *Design Flexibility:* This modular approach makes it easier to respond quickly to market demands and to upgrade particular components without redesigning the entire system.
- 5) *Overcoming Physical Limitations:* Chiplets help to bypass physical constraints such as reticle size, enabling greater performance and scalability in semiconductor devices.
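The yield argument in item 3 can be illustrated with a short calculation. The sketch below assumes a simple Poisson defect model, $Y = e^{-A D_0}$, which is a common textbook approximation and not a model taken from the surveyed works. The defect density and die areas are hypothetical placeholder values, and packaging yield and die-to-die interface area are deliberately ignored.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Die yield under a Poisson defect model: Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_system(total_area_cm2: float, n_dies: int,
                            defects_per_cm2: float) -> float:
    """Silicon area fabricated per good system, assuming every die is
    tested before assembly so that only known-good dies are packaged."""
    die_area = total_area_cm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_cm2)


D0 = 0.2          # defects per cm^2 (hypothetical value)
TOTAL_AREA = 6.0  # cm^2 of logic to implement (hypothetical value)

monolithic = silicon_per_good_system(TOTAL_AREA, 1, D0)
four_chiplets = silicon_per_good_system(TOTAL_AREA, 4, D0)
print(f"monolithic: {monolithic:.2f} cm^2 of silicon per good system")
print(f"4 chiplets: {four_chiplets:.2f} cm^2 of silicon per good system")
```

Under these assumptions, splitting the design into smaller dies reduces the silicon consumed per working system. A fuller cost model would also have to account for the packaging and interface overheads listed among the disadvantages.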

Despite these advantages, chiplets present the following disadvantages, which must be addressed during design:

- 1) Communication Overhead: Inter-chip latency is higher, bandwidth is more limited, and communication is less power-efficient than on-chip connections.
- 2) Packaging Complexity: Packaging costs increase due to higher I/O requirements and the need for fine-pitch wiring with multiple levels of vias.
- 3) Power Management Complexity: Power management functions take longer because of clock domain crossing, and complications arise if chiplets require different supply voltages.
- 4) Thermal Complexity: Thermal management becomes more difficult when multiple chiplets are packed closely together, requiring innovative cooling solutions.
- 5) Supply Chain Vulnerabilities: If a single chiplet is unavailable due to manufacturing issues or supply chain disruptions, the entire product can be delayed.

## C. Challenges in the Long Run

Despite its potential, chiplet technology still faces several significant challenges that could impact its long-term adoption:

- 1) Standardization Hurdles: The lack of universally adopted interconnect protocols limits interoperability between chiplets from different manufacturers, slowing the development of a truly open chiplet marketplace.
- 2) Packaging Challenges: Chiplet production relies on cutting-edge packaging processes that are expensive, labor-intensive, and prone to errors, which potentially limits widespread adoption.

- 3) *Intellectual Property Issues:* Collaborative innovation, which is often required for effective chiplet integration, can lead to IP sharing and potential disputes over patent ownership and usage rights.
- 4) *EDA Design Flow Limitations:* Current tools struggle with multi-die timing closure, hierarchical power delivery, and 3-D floor-planning; seamless analysis across chip, package, and system levels remains an unmet challenge.

#### D. Motivation of Survey

Chiplet technology has garnered significant attention from both academia and industry, leading to the publication of numerous survey papers in recent years. However, most of these surveys focus on specific technical aspects within individual fields. For example, Lau's survey [9] provides a thorough analysis of advanced packaging techniques, but it covers only packaging concepts such as 2.1-D, 2.3-D, 2.5-D, 3-D, and 3.5-D. Yu's survey [10] concentrates on chiplet-related EDA technologies, highlighting a subset of the essential EDA tools for chiplet design. Das et al.'s survey [11] centers on interconnect technologies used for intra-chip and inter-chip communication in computing systems, but it does not classify chiplet interconnects as a whole. Yang et al.'s survey [12] offers a comprehensive examination of AI workload characteristics and explores heterogeneous chiplet integration strategies to enhance performance and energy efficiency in AI applications. Furthermore, a comprehensive survey by Liu et al. formally defines the chiplet research domain as SoC architecture and interconnect [13].

However, chiplet technology extends beyond one or two fields. While some reports equate chiplet solely with advanced packaging, this perspective misunderstands its true scope, as the IEEE definition [14] also confirms. Advanced packaging certainly plays a crucial role in integrating multiple chiplets, but designing a chiplet-based system also demands strong support from specific EDA tools and interconnect technologies, as well as careful SoC architecture design. Consequently, chiplet technology is a comprehensive domain that integrates relevant aspects from numerous fields, such as SoC architecture, interconnect, EDA tools, and advanced packaging. Limiting the discussion of chiplet-related technological advancements to just one field would fail to capture the overall progress within the entire chiplet domain.

### II. DEFINITION AND TERMINOLOGY

## A. Chiplet Terminology

In this paper, we propose a glossary of key terms related to chiplets to clarify concepts and prevent misunderstandings. This glossary defines and explains important terminology commonly used in chiplet technology development and ecosystems.

1) Chiplet and Chiplets: A chiplet is defined as a small, modular integrated circuit (IC) or silicon die that implements a well-defined subset of functionality. Unlike traditional monolithic chips, which integrate all functions on a single silicon die, the chiplet method uses specific EDA tools to break the system into smaller, specialized blocks. Those blocks can be independently designed, manufactured, and tested before being interconnected using advanced packaging and high-speed interconnect. The term chiplets refers to multiple chiplet units working together within an integrated SoC system.

- 2) Monolithic IC: This term refers to a traditional integrated circuit design where all components and functions are implemented on a single silicon die. Chiplet technology offers an alternative approach that addresses scaling and yield challenges inherent in monolithic designs.
- 3) Chiplet Interconnect: The chiplet interconnect is the circuit block that links a pair of chiplets, enabling communication between them. It includes both the physical connections and the communication protocols. Various interconnect technologies exist for different applications, such as parallel and serial interconnect, and optical signaling can also serve as an interconnect technology in chiplet-based SoCs.
- 4) Chiplet-based SoC: Also called a multi-chiplet module. Typically, this is a SoC formed by integrating multiple, independently manufactured chiplets within a single package.
- 5) 2.5-D IC: A 2.5-D IC is a chiplet-based SoC where multiple chiplets are placed side-by-side on an interposer, which acts as the interconnect medium. The interposer may be made of silicon, glass, or organic materials. A substrate beneath the interposer connects the SoC to the printed circuit board (PCB). In contrast, a 2-D IC integrates dies directly on a conventional substrate without using an interposer.
- 6) Interposer: An intermediate substrate that connects and integrates multiple chiplets within a package, providing high-density electrical links. An active interposer is an interposer that not only provides electrical connections between chiplets but also incorporates active electronic circuitry, such as buffers, power management units, drivers, or network circuits.
- 7) 3-D IC: A 3-D IC is a chiplet-based SoC based on 3-D integration, where multiple chiplets are vertically stacked and connected using through-silicon vias (TSVs) or other vertical interconnect technologies. This approach maximizes integration density but presents challenges in thermal management.
- 8) Multi-physics field co-simulation: The simultaneous simulation and analysis of multiple interacting physical phenomena within a chiplet-based SoC. These phenomena include electrical, thermal, mechanical, fluid dynamics, and electromagnetic effects. This integrated approach is essential for ensuring the reliability, performance, and functionality of chiplets and their multi-chiplet integration. It forms a critical part of EDA workflows for chiplets.
- 9) Chiplet Marketplace/Library: A conceptual ecosystem where vendors offer standardized chiplets that system designers can purchase and integrate. This "Lego-like" model aims to democratize access to specialized silicon components and accelerate system development. Its operation depends on supporting tools, such as EDA tools.
- 10) Chiplet 1.0: An early developmental phase marked by large semiconductor companies focusing on improving yields and creating larger, more complex chips as process scaling slowed. During this phase, advanced packaging technologies received increased investment, and SoC vendors designed chips based on proprietary, non-standardized interconnect technologies.

11) Chiplet 2.0: A more mature phase defined by industry-wide efforts toward standardization and the emergence of a chiplet marketplace or library that enables wider adoption [15], especially among medium and smaller companies. This phase emphasizes the standardization of chiplet interconnects, EDA tools, and test IP, together with the growth of chiplet marketplaces or libraries. SoC vendors increasingly leverage third-party chiplet IP from a chiplet library to develop their products.
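The packaging-related terms in this glossary (monolithic IC, 2-D IC, 2.5-D IC, and 3-D IC) can be read as a small decision procedure. The following sketch encodes that reading. The class and field names are hypothetical, and the rules are a simplification: real products may combine styles, for example by placing a 3-D stack on an interposer.

```python
from dataclasses import dataclass
from enum import Enum


class Integration(Enum):
    MONOLITHIC = "monolithic IC"
    IC_2D = "2-D IC"
    IC_2_5D = "2.5-D IC"
    IC_3D = "3-D IC"


@dataclass(frozen=True)
class Package:
    num_dies: int
    uses_interposer: bool = False     # dies sit side by side on an interposer
    vertically_stacked: bool = False  # dies are stacked, e.g. through TSVs


def classify(pkg: Package) -> Integration:
    """Map a package description onto the glossary terms (simplified)."""
    if pkg.num_dies < 1:
        raise ValueError("a package needs at least one die")
    if pkg.num_dies == 1:
        return Integration.MONOLITHIC
    if pkg.vertically_stacked:
        return Integration.IC_3D
    if pkg.uses_interposer:
        return Integration.IC_2_5D
    return Integration.IC_2D  # dies directly on a conventional substrate


print(classify(Package(num_dies=4, uses_interposer=True)).value)
```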

## B. 2.5-D/3-D IC: A Chiplet Definition From Packaging Perspective

Notably, advanced integration such as 2.5-D and 3-D packaging has played a very important role in the evolution of chiplet technology. Key examples include TSMC's chip on wafer on substrate (CoWoS) [16] technology, which offers a wafer-level system integration platform that brings together multiple functional dies on a silicon interposer. Similarly, AMD's 3-D V-Cache [17] technology, which stacks a cache die directly atop a processor die, showcases 3-D integration based on hybrid bonding.

The implementation and adoption of 2.5-D and 3-D packaging technologies represent a pivotal step in the progression of chiplet technology, moving beyond simple disaggregation to achieve high-density integration with exceptional performance.

This step of chiplet development through advanced 2.5-D and 3-D packaging exhibits several distinctive features:

- 1) Extreme Performance Density. The advanced interposer and stacking technologies deliver unprecedented interconnect density and bandwidth between chiplets. TSMC's CoWoS platform features multiple copper layers for routing with a minimum pitch of $4\,\mu\mathrm{m}$, while AMD's 3-D V-Cache technology delivers $200\times$ greater interconnect density than traditional 2-D approaches.
- 2) Architectural Flexibility. This step introduces more flexibility in chiplet implementation. For instance, AMD's 7950X3D CPU selectively implements V-Cache on only one of two core chiplets, allowing a single processor to offer both high-cache and high-frequency optimized cores in the same package. This selective enhancement approach reflects a sophisticated understanding of workload characteristics.
- 3) Thermal and Power Complexity. These advanced packaging approaches incorporate sophisticated thermal management techniques. For example, 2.5-D implementations place dies side by side rather than stacking them, which helps reduce heat buildup compared to pure 3-D stacking. Meanwhile, AMD's implementation of V-Cache required careful design to maintain thermal integrity.
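To see why finer pitch translates into large density gains, note that the number of connections on a regular square grid scales with the inverse square of the pitch. The sketch below works through that arithmetic. The grid assumption and the two pitch values are illustrative placeholders, not figures taken from the cited works.

```python
def connections_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 on a square grid with the given pitch."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    per_mm = 1000.0 / pitch_um  # connections along one millimetre
    return per_mm ** 2


def density_gain(coarse_pitch_um: float, fine_pitch_um: float) -> float:
    """How many times denser the fine-pitch grid is than the coarse one."""
    return connections_per_mm2(fine_pitch_um) / connections_per_mm2(coarse_pitch_um)


# Hypothetical pitches: a coarse solder-bump grid vs. a fine bonded grid.
gain = density_gain(coarse_pitch_um=130.0, fine_pitch_um=9.0)
print(f"density gain: {gain:.0f}x")
```

Because the gain is quadratic, shrinking the pitch by roughly one order of magnitude yields roughly two orders of magnitude more connections per unit area.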

## C. Chiplet Technology Composition

In this paper, we propose a definition of the composition of chiplet technology based on our analysis of the literature. It includes four technology fields: SoC architecture, chiplet interconnect, specific EDA tools for chiplets, and advanced packaging. Each of these four fields is also part of a broader technology domain (SoC, interconnect, EDA, and packaging, respectively), as shown in Fig. 1. The reasons for choosing these four fields are given below.



Fig. 1. Chiplet technology composition.

Firstly, exploring SoC architecture involves addressing several key challenges when moving from traditional monolithic designs to modular chiplet-based systems. These challenges include: (1) partitioning monolithic SoCs into functional chiplets homogeneously or heterogeneously, (2) designing optimized network-on-chip (NoC) architectures tailored for inter-chiplet communication, and (3) creating domain-specific chiplets that excel at specialized tasks.

Secondly, the chiplet interconnect is another crucial element in chiplet-based SoC design, as it directly impacts performance, power efficiency, and overall system integration complexity. When choosing an interconnect solution, whether parallel, serial, or even optical, designers need to balance bandwidth demands, latency limits, power consumption, and practical implementation feasibility. Since different applications prioritize these factors differently, selecting an interconnect is highly context-specific.
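The bandwidth and power side of this trade-off reduces to simple arithmetic: aggregate bandwidth is the lane count times the per-lane data rate, and link power is bandwidth times energy per bit. The sketch below compares two candidate links on that basis. All lane counts, data rates, and energy figures are hypothetical values chosen for illustration and do not describe any specific standard or product.

```python
def link_bandwidth_gbps(lanes: int, rate_gbps_per_lane: float) -> float:
    """Aggregate raw bandwidth of a link in Gb/s."""
    return lanes * rate_gbps_per_lane


def link_power_mw(bandwidth_gbps: float, energy_pj_per_bit: float) -> float:
    """Link power in mW: (Gb/s) * (pJ/bit) = 1e9 * 1e-12 W = 1e-3 W."""
    return bandwidth_gbps * energy_pj_per_bit


# Hypothetical candidates: (lanes, Gb/s per lane, pJ per bit).
candidates = {
    "wide parallel link": (64, 16.0, 0.5),
    "narrow serial link": (8, 112.0, 2.0),
}

for name, (lanes, rate, energy) in candidates.items():
    bw = link_bandwidth_gbps(lanes, rate)
    print(f"{name}: {bw:.0f} Gb/s at {link_power_mw(bw, energy):.0f} mW")
```

Latency, reach, and the number of package wires are not captured here. In practice they often decide between a wide parallel interface and a narrow serial one.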

Thirdly, to make design more efficient, chiplet-based SoCs demand more specific EDA design flows that cover architecture design exploration, partitioning implementation, signal-integrity and power-integrity (SI/PI) problems that are worse than in monolithic designs, and multi-physics co-simulation for multi-chiplet integration. These tools are integrated into comprehensive workflows that address the unique challenges of chiplet-based SoCs, such as inter-chiplet communication, complex power distribution, more demanding thermal management, and strict mechanical constraints such as warpage.

Finally, advanced packaging has become popular largely because of chiplet applications. Several advanced packaging technologies facilitate chiplet integration, such as interposer-based 2.5-D integration, 3-D stacking, and wafer-level integration. In fact, these packaging concepts are regarded as critical enabling factors for the design of chiplet-based SoC systems.

## D. Chiplet Library Is a New Method for Chip Development

Some experts envision the creation of a LEGO-style chiplet library [18], where companies can easily source standardized functional blocks for integration, potentially democratizing access to high-performance design capabilities. Although this is ideal for SoC development, so many components are needed that it is more realistic to create a chiplet library that also acts as an SoC development platform.

Fig. 2 is an example of a chiplet library, enabled through specific EDA tools and comprehensive design workflows. The design workflows support both developing from scratch and integrating existing chiplets. This aligns with current industry trends, in which companies are transitioning from simple chiplet aggregation to more sophisticated, ground-up chiplet designs.

Fig. 2. Developing procedure around chiplet library.

The chiplet library emphasizes the critical role of EDA tools in supporting SoC development. In the mode that integrates existing chiplets, an integrated EDA design flow covers all steps, such as substrate partitioning, modeling, layout, routing, simulation, cost estimation, and final tape-out. The exception is architecture design exploration, which belongs to the developing-from-scratch mode and is a typical EDA tool for the chiplet 2.0 phase. In both modes, multi-physics coupling simulation tools are used to simulate more deeply than in monolithic design, which indicates the need for advanced simulation capabilities in chiplet design. This reflects the industry's growing recognition that traditional EDA tools require significant enhancements to handle the complexity of multi-chiplet integration.

Key technical elements in a chiplet library also include the integration of various IP components, such as interconnect IP and design for test (DfT) IP. In such a library, two critical areas need to be standardized: the chiplet interconnect, and the models and file exchange formats used by EDA tools.

Furthermore, the security of the chiplet library must be treated as a priority. Because the library collects many IPs from suppliers, privacy-preserving computing technology should be used to protect them. Moreover, to operate the chiplet library successfully, a feasible business model may need to be developed that benefits all involved parties: the library operator, IP suppliers, EDA vendors, and customers.
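A minimal sketch of how a designer might query such a library is given below. It shows why a standardized interconnect matters: chiplets can only be considered together when they expose a common interface. The entry fields, chiplet names, and the `proprietary-x` interface label are all hypothetical and do not describe any existing marketplace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    name: str
    function: str     # e.g. "compute", "io", "memory"
    interface: str    # interconnect standard exposed by the chiplet
    has_dft_ip: bool  # whether design-for-test IP ships with the chiplet


def find_candidates(library, function, interface):
    """Entries that provide `function`, speak `interface`, and include DfT IP."""
    return [
        entry for entry in library
        if entry.function == function
        and entry.interface == interface
        and entry.has_dft_ip
    ]


# Hypothetical library contents.
LIBRARY = [
    LibraryEntry("cpu-a", "compute", "UCIe", True),
    LibraryEntry("cpu-b", "compute", "proprietary-x", True),
    LibraryEntry("io-a", "io", "UCIe", True),
    LibraryEntry("io-b", "io", "UCIe", False),
]

compute_options = find_candidates(LIBRARY, "compute", "UCIe")
io_options = find_candidates(LIBRARY, "io", "UCIe")
print([e.name for e in compute_options], [e.name for e in io_options])
```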

## III. SURVEY SCOPE AND STRUCTURE

## A. Scope of Survey

We chose over 200 recent papers from academic conferences and journals focused on chiplet technology to capture the current state of this domain. This survey offers an in-depth analysis of chiplet technology across four key fields: SoC architecture, interconnects, EDA, and advanced packaging techniques. To the best of our knowledge, this is the first survey to comprehensively cover literature spanning these four fields in both industry and academia.

Due to length constraints and the goal of keeping this survey up to date, we selected papers from the past five years across the four key fields, focusing on their most significant innovations. This approach allows us to include more papers in a single survey and provide stronger evidence to support the technology trends we have identified.

Fig. 3. Evolution of AMD CPU architecture.

## B. Structure of This Survey

The remainder of this paper is organized as follows. Sections IV, V, and VI cover SoC architecture, chiplet interconnect, and advanced packaging, respectively. Although EDA plays an important role in all these areas, discussing it within each section could fragment the survey. To maintain the integrity of each field, we dedicate a separate section (Section VII) to EDA, positioned at the end of the paper.

## IV. SOC ARCHITECTURE BASED ON CHIPLET APPROACH AND DOMAIN-SPECIFIC CHIPLETS

As chiplet technology rapidly advances, SoC system architectures are evolving from homogeneous to heterogeneous integration. Meanwhile, NoC is transitioning from planar layouts to more flexible, scalable hierarchical network designs. These changes also accelerate the emergence of domain-specific chiplets.

## A. Chiplet-Based SoC Architecture Is Evolving From Homogeneous to Heterogeneous Integration

The architectural design of chiplet-based SoCs includes two implementation paths: homogeneous and heterogeneous. A homogeneous design partitions functional units uniformly, so that every chiplet provides the same set of functions. In contrast, a heterogeneous design employs differentiated function mapping, allocating different functions to distinct chiplets, each of which is relatively independent.

Several factors are driving the shift in SoC architecture from homogeneous to heterogeneous design. First, real-world application workloads are highly diverse: some are computationally intensive, while others demand heavy I/O processing [12]. A heterogeneous design allows the SoC to adapt more precisely to varying needs; for example, computationally intensive applications can be addressed by increasing the number of computing chiplets. Second, using specialized chiplets in a heterogeneous design enhances chiplet reuse opportunities and helps to reduce the total cost of ownership (TCO) for IC development [19], [20]. Third, heterogeneous integration enables each functional module to independently select the optimal process technology (for example, advanced nodes for computing chiplets and mature processes for I/O chiplets), resulting in better manufacturing cost control and improved yields [21].

Fig. 3 illustrates this trend through the development of AMD processors. The first-generation chiplet-based processor employed homogeneous integration by connecting four identical chiplets via Infinity Fabric [22]. The second-generation chiplet-based processor introduced a heterogeneous design by dividing the SoC into separate computing and I/O chiplets [23], [24]. The third-, fourth-, and fifth-generation chiplet-based processors continue this trend by heterogeneously integrating computing dies, I/O dies, and cache chiplets, each fabricated using a different process technology [25], [26]. Intel's CPU designs show a similar trend [27].

Fig. 4. (a) HiSilicon chip architecture. (b) AMD processor architecture. (Redrawn from [B. Cohen], *IEEE Hot Chips Symposium (HCS)*, 2024, and [J. Xia], *IEEE Micro*, 2021.)

Fig. 4(a) shows HiSilicon's chip architecture, which uses a chiplet approach that combines different numbers of compute chiplets with a single I/O chiplet to offer a product line optimized for various application needs [28]. In Fig. 4(b), the AMD Ryzen 9 processor [25] consists of two compute chiplets and one I/O chiplet, while the Ryzen 7 and Ryzen 5 processors each have one compute chiplet and one I/O chiplet, creating a unified architecture that supports multiple product segments.
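The reuse benefit of this arrangement can be made explicit by writing each product as a bill of chiplets. The sketch below uses the chiplet counts stated above for the Ryzen 9, Ryzen 7, and Ryzen 5 parts. The data layout, function names, and unit volumes are hypothetical and only illustrate how a whole product line maps onto two chiplet designs.

```python
from collections import Counter

# Chiplet counts per product as described for Fig. 4(b).
PRODUCT_LINE = {
    "Ryzen 9": Counter(compute=2, io=1),
    "Ryzen 7": Counter(compute=1, io=1),
    "Ryzen 5": Counter(compute=1, io=1),
}


def distinct_chiplet_designs(product_line):
    """Set of chiplet types that must be designed and taped out."""
    designs = set()
    for bill in product_line.values():
        designs.update(bill)
    return designs


def chiplets_needed(product_line, volumes):
    """Total chiplets of each type required to build the given volumes."""
    total = Counter()
    for product, units in volumes.items():
        for kind, count in product_line[product].items():
            total[kind] += count * units
    return total


volumes = {"Ryzen 9": 10, "Ryzen 7": 5, "Ryzen 5": 5}  # hypothetical units
print(sorted(distinct_chiplet_designs(PRODUCT_LINE)))
print(dict(chiplets_needed(PRODUCT_LINE, volumes)))
```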

## B. NoC Architecture Is Shifting From Flat to Hierarchical

As chiplet-based SoCs have evolved, their network-on-chip topology has shifted from the flat layouts typical of monolithic designs to hierarchical topologies suited to multi-chiplet integration. One example uses distinct networks inside each chiplet and a separate inter-chiplet network to build a hierarchical architecture [29]. As illustrated in Fig. 5, each GPU chiplet contains a full mesh network internally, with two communication nodes on its upper and lower edges interfacing with the interposer layer. The interposer then employs an inter-chiplet network that connects GPU chiplets with one another as well as with memory chiplets, forming a hierarchical structure. A novel network-on-interposer (NoI) architecture based on multiple space-filling curves delivers up to 58% latency reduction and 64% energy savings compared to conventional mesh-based NoI designs [30].

Fig. 5. A hierarchical mesh in NoC design. (Redrawn from [T. Wang], *IEEE International Symposium on High-Performance Computer Architecture (HPCA)*, 2022).

Fig. 6. Multi-ring topology. (Redrawn from [T. Wang], *IEEE International Symposium on High-Performance Computer Architecture (HPCA)*, 2022).
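The hierarchical organization described above, with a mesh inside each chiplet and an inter-chiplet network on the interposer, can be sketched as a three-leg route: source router to boundary node, across the interposer, and boundary node to destination router. The model below counts hops under strong simplifying assumptions that are not taken from [29]: a single boundary router per chiplet at local position (0, 0), one hop per vertical link, and minimal (Manhattan) routing at both levels.

```python
from typing import NamedTuple, Tuple


class Node(NamedTuple):
    chiplet: Tuple[int, int]  # chiplet position on the interposer grid
    local: Tuple[int, int]    # router position inside the chiplet mesh


GATEWAY = (0, 0)  # hypothetical: one boundary router per chiplet


def manhattan(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def hierarchical_hops(src: Node, dst: Node) -> int:
    """Hop count for a two-level route (intra-chiplet, then inter-chiplet)."""
    if src.chiplet == dst.chiplet:
        return manhattan(src.local, dst.local)  # stays on the local mesh
    to_interposer = manhattan(src.local, GATEWAY) + 1    # +1 vertical hop
    across = manhattan(src.chiplet, dst.chiplet)         # interposer network
    from_interposer = 1 + manhattan(GATEWAY, dst.local)  # +1 vertical hop
    return to_interposer + across + from_interposer


same_chiplet = hierarchical_hops(Node((0, 0), (1, 2)), Node((0, 0), (3, 0)))
cross_chiplet = hierarchical_hops(Node((0, 0), (1, 1)), Node((2, 1), (0, 3)))
print(same_chiplet, cross_chiplet)
```

Local traffic never touches the interposer in this model, which is the property that lets each chiplet keep its own routing strategy.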

Fig. 6 shows that hierarchical topologies, such as multi-ring, are widely adopted because they scale easily and aggregate bandwidth [31]. Simba implements a hierarchical mesh-on-mesh architecture in which each chiplet contains an internal mesh NoC and the chiplets are connected by a $6 \times 6$ mesh on the substrate [32]. Besides hierarchical homogeneous NoC structures, hybrid architectures have also appeared. One design combines a reconfigurable mesh network with a reconfigurable bufferless torus network [33], so that packets can dynamically choose between routing through the mesh network and fast forwarding via the ring network.

The introduction of hierarchical network topologies in chiplet-based SoC design brings challenges such as new types of deadlock. To prevent deadlocks in inter-chiplet networks, AMD employs steering limits, that is, turn restrictions, at the input/output ports of the interposer layer [29], which restrict the traffic paths entering or leaving a chiplet, as illustrated in Fig. 7. These limits eliminate the cyclic dependencies in inter-chiplet communication that cause resource contention and deadlocks, while maintaining modularity and local routing strategies within each chiplet.
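The principle behind such restrictions is that routing cannot deadlock if the channel dependency graph is acyclic, so forbidding the right turn removes the cycle. The sketch below demonstrates this on a deliberately tiny example with four channels that form a ring of dependencies. It illustrates the general idea only and is not the specific scheme of [29]; the channel names and the choice of which turn to forbid are hypothetical.

```python
def has_cycle(deps):
    """Detect a cycle in a channel dependency graph.

    `deps` maps each channel to the set of channels a packet holding it
    may wait on next. A cycle means a deadlock is possible.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}

    def visit(channel):
        color[channel] = GRAY
        for nxt in deps.get(channel, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: dependency cycle found
                return True
            if state == WHITE and visit(nxt):
                return True
        color[channel] = BLACK
        return False

    return any(color.get(ch, WHITE) == WHITE and visit(ch) for ch in list(deps))


# Four channels whose permitted turns form a ring of dependencies.
unrestricted = {"east": {"south"}, "south": {"west"},
                "west": {"north"}, "north": {"east"}}

# Forbid a single turn (north -> east) to break the ring.
restricted = {ch: set(nxt) for ch, nxt in unrestricted.items()}
restricted["north"].discard("east")

print(has_cycle(unrestricted), has_cycle(restricted))
```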

Furthermore, Wu et al. proposed a dedicated deadlock recovery framework called UPP [34]. It detects a deadlock by discovering a stalled upward packet moving from the interposer to the connected chiplet via the vertical link, and it recovers the system by transmitting that upward packet to its destination. Wang et al. resolve cross-ring communication deadlocks by reserving transmit buffers in cross-ring bridge nodes (RBRG-L2) [31], [33], as shown in Fig. 8. Chen et al. proposed a hybrid deadlock recovery algorithm that provides adaptability and fault tolerance under irregular topologies caused by failed vertical links [35]. Zhang et al. introduced an inter-chiplet priority-driven deadlock resolution scheme that detects and recovers from inter-chiplet deadlocks in boundary routers [36]; it can be used with many packaging technologies and does not depend on an active interposer.

Fig. 7. Turn restrictions anti-deadlock mechanism in a hierarchical mesh topology. (Redrawn from [T. Wang], *IEEE International Symposium on High-Performance Computer Architecture (HPCA)*, 2022).

Fig. 8. SWAP anti-deadlock mechanism. (Redrawn from [T. Wang], *IEEE International Symposium on High-Performance Computer Architecture (HPCA)*, 2022).

#### C. Specialized Chiplet Designs Are Emerging

To address the requirements of diverse applications, a variety of specialized chiplets have been developed for CPU SoCs, including compute chiplets, HBM chiplets, and I/O chiplets.

HBM can be regarded as the first specialized chiplet. HBM1 was introduced in 2013, followed by HBM2, HBM2E, HBM3, HBM3E, and HBM4, with bandwidth increasing from 128 GB/s to 2 TB/s [37]. Because HBM is expensive, the industry is increasingly exploring memory chiplet technologies based on DDR, LPDDR, GDDR, and even new memory technologies as alternatives [38], [39], [40], [41], [42].
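The two bandwidth endpoints quoted above follow from the usual peak-bandwidth formula, interface width times per-pin data rate divided by eight. The sketch below applies it. The interface widths and pin rates used (1024 bits at 1 Gb/s for the first generation, 2048 bits at 8 Gb/s for HBM4) are assumptions based on commonly published per-stack figures rather than values stated in this survey.

```python
def peak_bandwidth_gbytes(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-stack bandwidth in GB/s: width * per-pin rate / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8.0


# Assumed interface parameters (not taken from the survey text).
hbm1 = peak_bandwidth_gbytes(bus_width_bits=1024, pin_rate_gbps=1.0)
hbm4 = peak_bandwidth_gbytes(bus_width_bits=2048, pin_rate_gbps=8.0)

print(f"HBM1: {hbm1:.0f} GB/s per stack")
print(f"HBM4: {hbm4 / 1024:.0f} TB/s per stack")
```

The very wide interface is what ties HBM to advanced packaging: routing more than a thousand data signals per stack is practical on an interposer but not on a conventional board.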

For CPU SoCs developed using a chiplet-based approach, physical-level interconnect protocols such as UCIe are insufficient for integrating specialized chiplets [43]. A system-level interconnect protocol such as ARM CHI is required to facilitate high-level communication between chiplets, enabling functionalities such as cache coherence in multi-core CPUs. Typically, the compute chiplet assumes a leading role by initiating system-level activities among chiplets.

For example, ARM's CSA (Fig. 9) is an ARM system building specification co-developed with over 60 industry partners



Fig. 9. Compute 1 Chiplet Examples. (Redrawn from Arm CSA, 2025).

[44]. At the heart of CSA lies AMBA CHI C2C [45], ARM's extension of its coherent hub interface (CHI) protocol to work across multiple dies. ARM's Neoverse compute subsystems (CSS) represent the first commercial implementation of a CSA-compliant high-performance compute chiplet [46]. To make SoC development easier, Cadence developed a system chiplet based on CSA standards that includes CSS, system IP such as memory controllers, and UCIe connectivity [47]. Alphawave Semi's collaboration with ARM produced a CSA-compliant SoC for networking applications that includes CSS, UCIe, and high-speed I/O such as PCIe Gen 6.0/7.0 and 112/224G Ethernet connectivity [48].

On the other hand, the RISC-V ecosystem does not currently have a unified chiplet architecture initiative comparable to ARM's CSA [49]. While there are significant RISC-V chiplet activities across multiple fronts, from major government-funded projects to commercial implementations, the approach remains fragmented and implementation-specific rather than standardized.

In the chiplet paradigm, the active interposer often assumes the role of an I/O chiplet: a repository for high-density SerDes, memory controllers, and power-management circuits. A modular I/O die concept using an active interposer was demonstrated in [50]. It supports up to six compute chiplets interconnected via an on-interposer NoC that handles communication between chiplets. A 3-D NoC implemented in an active interposer with adaptive routing delivered 15% higher throughput [51].

Designing an active interposer as an I/O chiplet entails resolving multiple co-optimization challenges. First, one must harmonize process technology: compute chiplets benefit from leading-edge FinFET nodes, whereas interposer logic is more cost-effective in mature CMOS. A mixed-node integration at 28 nm and 65 nm was investigated, showing that voltage-level translators and clock-domain crossing circuits embedded in the interposer could ensure signal integrity across process boundaries [52]. Second, thermal and power distribution engineering within the thin interposer layer demands careful floorplanning. A comprehensive design flow for active interposer-based 2.5-D ICs was presented that reduces IR drop by co-designing power distribution and interconnect floorplanning [53].

Prototyping active interposer-based systems has matured significantly with industry-academia collaborations. The DARPA CHIPS program funded the development of a fully functional active interposer platform encompassing heterogeneous accelerators, memory, and I/O interfaces in a single package [54]. Initial silicon trials achieved 2× performance per watt improvements relative to monolithic SoCs in



Fig. 10. Four types of chiplet interconnects within a single SoC system based on the chiplet approach.

TABLE I
DIFFERENCE BETWEEN PARALLEL AND SERIAL INTERCONNECT

| Chiplet Interconnect | Parallel Interconnect | Serial Interconnect     |                         |
|----------------------|-----------------------|-------------------------|-------------------------|
| Standards            | UCIe 3.0              | CEI-112G-MCM            | CEI-112G-XSR            |
| Modulation           | NRZ (2-level)         | CNRZ-5 (5b6w encoding)  | PAM4 (4-level)          |
| Data Rate            | 48 GT/s & 64 GT/s     | 112 Gbps                | 112 Gbps (36-58 Gsym/s) |
| Reach                | 2 ~ 25mm              | ~25mm (2.5-D/3-D)       | ~50mm (organic package) |
| Application          | Chiplet-to-chiplet    | Chip-to-chip            | Chip-to-optics          |

data-center AI benchmarks. Similarly, recent work at TSMC's 3DFabric showcases integration of RF front-ends and photonic I/O within the active interposer layer for millimeter-wave communications, highlighting the versatility of this technology [55].

#### V. CHIPLET INTERCONNECT

In SoC systems based on the chiplet approach, interconnect technologies fall into four main categories: 1) parallel interconnect using electrical single-ended signals, 2) memory interconnect using electrical single-ended signals, 3) serial interconnect using electrical differential signals, and 4) optical interconnect using co-packaged optics (CPO) technology. Although memory interconnects are also based on single-ended signals, they are often treated as a distinct category because they use a different interface protocol between chiplets. Optical interconnect enables high-speed connections between chiplet-based SoCs, or between a chiplet-based network interface SoC and a switch SoC, by integrating an optical I/O chiplet with a conventional electrical chiplet.

As shown in Table I, parallel interconnects based on single-ended signaling and serial interconnects utilizing differential signaling differ fundamentally in architecture and technology [56]. Serial interconnects are more often used between heterogeneous chiplets, such as electrical and optical chiplets, or as a homogeneous extension to a monolithic SoC (Fig. 10). Notably, this survey does not cover long-reach interconnect technologies such as PCIe and Ethernet.

TABLE II
PARALLEL INTERCONNECT

| Work | Data Rate (Gb/s) | Modulation Scheme / Transmission Architecture | Technology Node | Power Efficiency (pJ/bit) | Channel Type / Length | Equalization Method | Crosstalk Cancellation | Year |
|------|------------------|-----------------------------------------------|-----------------|---------------------------|-----------------------|---------------------|------------------------|------|
| [69] | 20.83 | CNRZ-5 (5b6w) | 16nm | 1.02 | Up to 30mm | Continuous-time linear equalizer (CTLE) | Inherent in CNRZ-5 | 2020 |
| [70] | 40 | NRZ | 7nm | 1.7 | Die-to-die | Transmitter calibration | 6/7b spatial encoding | 2021 |
| [68] | 20 | NRZ | 7nm | 0.46 | 1mm InFO, 3mm CoWoS | Minimum intrinsic auto-alignment | Noise-immunity encoding | 2021 |
| [65] | 50.4 | SBD | 5nm | 0.297 | 1.2mm on-chip | SBD hybrid circuit | Built-in SBD hybrid | 2022 |
| [71] | 32 | NRZ | 4nm | 0.44 | 3mm silicon interposer | Direct DFE + RCD | Not specified | 2023 |
| [72] | 40 | GRS | - | 1.3 | 60mm pkg + 60mm PCB | Edge-boost EQ (8dB) + Rx linear EQ (6dB) | < -35dB crosstalk specification | 2023 |
| [75] | 25.2 | AC-coupled toggle (ISR-ACT) | 5nm | 0.190 | 1.2mm on-chip | AC-coupling with positive feedback | Not specified | 2024 |
| [74] | 32 | NRZ | 3nm | 0.36 (0.23 @ 25Gb/s) | 1-2mm CoWoS | Real-time per-lane CDR | Not specified | 2024 |
| [73] | 20 | Current-mode | 40nm | 0.246 | 1mm shield-less | Transimpedance amplifier (TIA) | Systematic XTC via input resistance optimization | 2024 |
| [66] | 64 | SBD | 28nm | 1.21 | 3mm on-chip | Digital FIR filter with EC/XTC | Echo + NEXT/FEXT cancellation | 2025 |
| [67] | 4-16 | NRZ | 3nm | 0.29 | 1.4mm CoWoS-S | Clock-forwarded delay-matched | Not specified | 2025 |

Fig. 10 illustrates a typical chiplet-based SoC employing four different interconnect technologies. In chiplet-based SoC architectures partitioned via system buses or NoC, parallel interconnects such as UCIe take precedence [57]. For completed SoC designs that need expansion through chiplet integration, serial interconnects are the preferred solution, with the OIF-CEI XSR/MCM standard explicitly developed for this purpose [58]. Furthermore, HBM is widely used as a memory interconnect in chiplet-based SoCs, particularly for high-performance applications such as AI and HPC [59]. OpenHBI seeks compatibility with HBM PHYs and may enable a shared physical interconnect between chiplet-to-chiplet and chiplet-to-HBM connections [60]. Moreover, industry progress on CPO is currently driven by standardization efforts, such as the OIF's CPO [58] and CCITA CPO [61], along with prototype demonstrations such as TeraPHY [62] and NVIDIA's CPO-based switch [63].

#### A. Data Rate in Parallel Interconnect Is Increasing Quickly

As shown in Table II, parallel interconnects have evolved remarkably, from simple NRZ signaling at 20 Gb/s to sophisticated simultaneous bi-directional schemes achieving up to 64 Gb/s per wire, with power efficiency improving from over 1 pJ/bit to below 0.2 pJ/bit [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76]. In particular, McCollough et al. achieved 40 Gb/s/pin using NRZ with a 6/7b encoding scheme that emulates differential signaling characteristics and mitigates ground-reference noise by approximately 8 dB through spatial balancing [70]. Two significant implementations explored the simultaneous bi-directional (SBD) transmission architecture to double bandwidth efficiency. Nishi et al. proposed a novel hybrid circuit that subtracts outbound signals from received signals to enable bidirectional communication while maintaining signal integrity, achieving 50.4 Gb/s/wire (25.2 Gb/s in each direction) using an inverter-based approach with 0.297 pJ/bit efficiency [65]. Wang et al. introduced a dynamic voltage threshold (D-VTH) circuit combined with echo and crosstalk cancellation techniques, targeting both near-end crosstalk (NEXT) and far-end crosstalk (FEXT) through digital finite-impulse-response (FIR) filters, together with standing-wave clock distribution for improved synchronization; this pushed SBD performance to 64 Gb/s/wire with 10.5 Tb/s/mm/layer density at a BER below $10^{-16}$ [66].
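The hybrid principle described above, in which each end subtracts its own outbound signal from the line to recover the inbound data, can be illustrated with an idealized numerical sketch. The model below is an assumption made purely for illustration (ideal superposition on the wire, a perfect local replica, no noise, loss, or reflections); the function names are hypothetical and it does not represent the circuits of [65] or [66].

```python
import random

def sbd_line(tx_a, tx_b):
    """Idealized SBD wire: the line level is the sum of both drivers' symbols."""
    return [a + b for a, b in zip(tx_a, tx_b)]

def hybrid_recover(line, own_tx):
    """Hybrid: subtract the locally known outbound symbols to get the inbound ones."""
    return [v - o for v, o in zip(line, own_tx)]

random.seed(0)
bits_a = [random.randint(0, 1) for _ in range(16)]  # data sent by end A
bits_b = [random.randint(0, 1) for _ in range(16)]  # data sent by end B

line = sbd_line(bits_a, bits_b)          # three-level signal: 0, 1, or 2
rx_at_a = hybrid_recover(line, bits_a)   # end A recovers B's data
rx_at_b = hybrid_recover(line, bits_b)   # end B recovers A's data

# Aggregate per-wire rate is the sum of both directions (figures from [65]).
per_direction_gbps = 25.2
aggregate_gbps = 2 * per_direction_gbps
```

In a real link the replica is imperfect and reflections add residual echo, which is why the implementations cited above add echo and crosstalk cancellation on top of the basic subtraction.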

Single-ended signaling presents unique challenges including susceptibility to common-mode noise, crosstalk, and simultaneous switching noise, requiring innovative circuit techniques and architectural solutions [77]. Seong et al. implemented reflection-cancellation drivers (RCD) and data bus inversion encoders/decoders to minimize simultaneous switching noise while employing direct decision-feedback equalizers for ISI compensation [71]. Lin et al. addressed supply voltage droop effects through matched-delay architectures and P/N ratio calibration techniques that counter systematic process skews [76]. Lee et al. achieved systematic crosstalk cancellation by balancing capacitive and inductive coupling through optimized receiver input resistance, eliminating the need for additional XTC circuitry [73]. Wei et al. utilized ground-referenced signaling (GRS) with wideband phase rotating receiver PLLs and offset compensation circuits to handle temperature-induced variations [72].
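Data bus inversion, mentioned above as a way to limit simultaneous switching noise, can be sketched in a few lines. The version below is the generic "AC" flavor of the technique (invert the word when more than half of the wires would toggle) on an assumed 8-bit bus with an assumed all-zero initial state; these choices and the function names are illustrative assumptions, not details taken from [71].

```python
def dbi_ac_encode(words, width=8):
    """Return (wire_word, dbi_flag) pairs so at most width // 2 data wires toggle."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial bus state
    encoded = []
    for w in words:
        toggles = bin((w ^ prev) & mask).count("1")
        if toggles > width // 2:
            wire, flag = (~w) & mask, 1   # send the inverted word, raise the flag
        else:
            wire, flag = w & mask, 0      # send the word unchanged
        encoded.append((wire, flag))
        prev = wire
    return encoded

def dbi_decode(encoded, width=8):
    """Undo the inversion using the per-word DBI flag."""
    mask = (1 << width) - 1
    return [((~wire) & mask) if flag else wire for wire, flag in encoded]

def max_toggles(encoded, width=8):
    """Largest number of data wires switching between consecutive transfers."""
    mask = (1 << width) - 1
    prev, worst = 0, 0
    for wire, _ in encoded:
        worst = max(worst, bin((wire ^ prev) & mask).count("1"))
        prev = wire
    return worst

words = [0x00, 0xFF, 0x0F, 0xF0, 0xAA, 0x55]
encoded = dbi_ac_encode(words)
```

The cost is one extra flag wire per group; in exchange, the worst-case number of data wires switching at once is halved, which bounds the supply current step the drivers can cause.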

#### B. HBM Keeps Improving in Data Rate and Signal Quality

High Bandwidth Memory (HBM) has emerged as a critical technology for meeting the exponential growth in memory bandwidth demands of modern high-performance computing, artificial intelligence, and data-intensive applications. It continues to improve in bandwidth, signal quality, and power efficiency.

Seong et al. demonstrated a 48 Gb/s/wire NRZ-based transceiver, representing a 4-5x improvement over conventional HBM3 speeds [78]. Park et al. demonstrated a 375-GB/s/mm PAM-4 interface achieving 68.7-fJ/b/mm energy efficiency through per-pin training sequences and charge-recycling samplers [79]. Di-code signaling offers another approach for power-efficient HBM interfaces. Park et al. presented a 0.385-pJ/bit di-code transceiver with TIA termination that eliminates static power consumption during long consecutive identical digit (CID) patterns [80]. To address the enormous memory bandwidth and capacity needs of large language models (LLMs) and generative AI, Lee et al. introduced an HBM3E design that achieves 1280 GB/s bandwidth with a cube capacity of 48 GB through all-around power TSVs and a 6-phase RDQS scheme [81].

Single-ended signaling in HBM systems must contend with unprecedented signal density, with current implementations supporting 1024-bit parallel interfaces and future HBM4 systems targeting 2048-bit interfaces. This massive parallelism creates unique signal integrity challenges that distinguish HBM from conventional memory systems. Tang et al. addressed a critical computational bottleneck in HBM signal integrity verification by developing efficient simulation methodologies for full-channel parallel data transfer analysis in HBM-AI chip interconnects [82].
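The relationship between interface width, per-pin data rate, and peak stack bandwidth behind these figures is simple arithmetic, sketched below. The inputs are taken from this survey (the 2048-bit, 8.0 Gb/s HBM4 entry in Table IV, and the 1280 GB/s HBM3E result of [81] on a 1024-bit interface); the function names are hypothetical, and protocol overheads are ignored.

```python
def peak_bandwidth_gbytes(io_width_bits, pin_rate_gbps):
    """Peak bandwidth in GB/s for a parallel interface (8 bits per byte)."""
    return io_width_bits * pin_rate_gbps / 8.0

def required_pin_rate_gbps(bandwidth_gbytes, io_width_bits):
    """Per-pin data rate in Gb/s needed to reach a target bandwidth."""
    return bandwidth_gbytes * 8.0 / io_width_bits

# HBM4 per Table IV: 2048 data pins at 8.0 Gb/s each.
hbm4_peak = peak_bandwidth_gbytes(2048, 8.0)

# HBM3E per [81]: 1280 GB/s over a 1024-bit interface.
hbm3e_pin_rate = required_pin_rate_gbps(1280, 1024)
```

Under these assumptions the HBM4 entry works out to 2048 GB/s (about 2 TB/s), and the HBM3E result corresponds to 10 Gb/s per pin, which shows why doubling the interface width lets HBM4 raise bandwidth without a proportional increase in per-pin speed.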

High-speed HBM interfaces require sophisticated equalization schemes to combat inter-symbol interference (ISI) and maintain signal integrity across varying channel conditions [83]. Recent implementations have employed both transmitter-side and receiver-side equalization techniques. Kim et al. developed energy-efficient TSV I/O designs employing pre-drivers with main drivers to enhance slew rates and mitigate process, voltage, and temperature (PVT) variations [84]. Advanced equalization schemes have also been implemented for next-generation memory interfaces. Seong et al. used on-chip feedback equalization in their 48 Gb/s/wire transceiver work, achieving 1.66 Tb/s/mm beachfront bandwidth [78].

Power consumption represents a critical challenge in HBM design, particularly as stack heights and data rates increase. The combination of high I/O count and aggressive performance targets can lead to significant thermal issues that limit system performance. Chae et al. achieved 0.29 pJ/bit energy efficiency through techniques including resistor-tuned offset calibration, supply noise adaptation algorithms, and optimized physical structures with stacked I/O configurations [85]. To enhance HBM's energy efficiency in specific use cases, Kim et al. developed a lightweight ECC method that enables significant power reduction through extended refresh cycles [86].
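To give a feel for the scale involved, I/O power can be estimated as energy per bit multiplied by bit rate. The sketch below pairs the 0.29 pJ/bit figure of [85] with the 1280 GB/s bandwidth of [81]; combining numbers from two different designs is a hypothetical pairing used only to illustrate the arithmetic, and the function name is an assumption.

```python
def io_power_watts(energy_pj_per_bit, bandwidth_gbytes_per_s):
    """I/O power in watts from energy per bit (pJ) and bandwidth (GB/s)."""
    bits_per_second = bandwidth_gbytes_per_s * 8e9
    return energy_pj_per_bit * 1e-12 * bits_per_second

# Hypothetical pairing: 0.29 pJ/bit [85] at 1280 GB/s [81].
example_power = io_power_watts(0.29, 1280)
```

With these inputs the estimate is roughly 3 W for the interface alone, per stack, before DRAM core and refresh power are counted.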

#### C. Serial Interconnect Was Standardized as XSR SerDes

Unlike parallel interconnects, serial interconnects typically run above 32 Gb/s because they were traditionally designed for long-reach applications. They have progressively evolved to accommodate various high-speed interconnection needs, from long reach (LR, < 1m) to short reach (SR, < 200mm), extra-short reach (XSR, < 50mm), and even ultra-short reach (USR, < 10mm) [87]. The Optical Internetworking Forum (OIF) CEI-112G-XSR and CEI-112G-MCM specifications represent the most significant standardization effort for high-speed serial chiplet interconnects, defining 112 Gbps electrical interfaces specifically optimized for die-to-die (D2D) and die-to-optical engine (D2OE) applications [88].

For serial interconnects, bandwidth density can be further increased by adding more channels and employing higher-order modulation schemes such as PAM-4 to boost data rates [89]. The widespread adoption of energy-efficient equalization techniques and low-power architectures has improved signal integrity while minimizing energy consumption. Several implementations around 112 Gb/s demonstrate feasibility in 7nm [90], [91], [92], 5nm [93], and 3nm [94] processes, achieving energy efficiencies from 1.7 pJ/bit down to 0.9 pJ/bit.
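The benefit of higher-order modulation can be made concrete with a small sketch: PAM-4 carries two bits per symbol, so a 112 Gb/s lane needs only half the symbol rate of NRZ. The Gray-coded level mapping below is the common textbook convention and is an assumption here, as are the function names; the cited designs may use different level assignments or precoding.

```python
import math

# Gray-coded PAM-4: adjacent amplitude levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def pam4_encode(bits):
    """Map a bit list (even length) to PAM-4 amplitude levels."""
    if len(bits) % 2:
        raise ValueError("PAM-4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_decode(levels):
    """Map ideal PAM-4 levels back to bits."""
    inverse = {level: pair for pair, level in GRAY_PAM4.items()}
    bits = []
    for level in levels:
        bits.extend(inverse[level])
    return bits

def symbol_rate_gbaud(bit_rate_gbps, num_levels):
    """Symbol rate needed for a given bit rate and number of amplitude levels."""
    return bit_rate_gbps / math.log2(num_levels)

levels = pam4_encode([0, 0, 0, 1, 1, 1, 1, 0])
pam4_baud = symbol_rate_gbaud(112, 4)   # PAM-4 at 112 Gb/s
nrz_baud = symbol_rate_gbaud(112, 2)    # NRZ at 112 Gb/s
```

Halving the symbol rate relaxes channel bandwidth requirements, at the cost of smaller eye openings between the four levels, which is one reason equalization remains central to these designs.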

Recent research demonstrates ultra-high-speed wireline transmitter design methodologies targeting 160 Gbps operation for ultra-short-reach die-to-die applications with channel losses limited to approximately 3 dB. The work introduces an automated generation framework using the Berkeley analog generator (BAG) 3++ for process-portable circuit development, enabling rapid progress from circuit design to tape-out in advanced FinFET nodes. It indicates that such ultra-high-speed implementations require full-custom analog design approaches rather than conventional RTL synthesis methodologies because of the extreme power and area constraints imposed by > 100 GBaud operation [95].

Academic research has explored advanced equalization architectures designed to overcome the feedback-induced latency bottlenecks that limit conventional decision feedback equalizer (DFE) performance at data rates exceeding 100 GBaud. One study demonstrates feedforward equalizers inspired by maximum likelihood sequence estimation (MLSE) algorithms that achieve error statistics comparable to conventional DFE implementations without feedback latency penalties. It presents a 160 Gbps NRZ receiver implementing 1-tap fully feedforward MLSE equalization, fabricated in a 16 nm FinFET process [96].
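For readers unfamiliar with MLSE, the sketch below shows the textbook form of the idea for an NRZ signal with a single post-cursor ISI tap: a two-state Viterbi search picks the symbol sequence whose predicted waveform is closest to the received one. This is the generic algorithm, not the feedforward circuit architecture of [96]; the channel model, the tap value `alpha`, the known starting symbol, and all function names are assumptions for illustration.

```python
def channel(tx, alpha, first_prev=-1):
    """Noiseless 1-tap ISI channel: y[k] = x[k] + alpha * x[k-1]."""
    out, prev = [], first_prev
    for x in tx:
        out.append(x + alpha * prev)
        prev = x
    return out

def slicer(rx):
    """Plain threshold detector with no equalization."""
    return [1 if y >= 0 else -1 for y in rx]

def mlse_1tap(rx, alpha, first_prev=-1):
    """Two-state Viterbi MLSE for NRZ symbols in {-1, +1} with one ISI tap."""
    symbols = (-1, 1)
    # State = previous symbol; only the assumed starting symbol is allowed at k = 0.
    metric = {s: (0.0 if s == first_prev else float("inf")) for s in symbols}
    paths = {s: [] for s in symbols}
    for y in rx:
        new_metric, new_paths = {}, {}
        for cur in symbols:
            # Squared error of each candidate transition prev -> cur.
            costs = {p: metric[p] + (y - (cur + alpha * p)) ** 2 for p in symbols}
            best_prev = min(costs, key=costs.get)
            new_metric[cur] = costs[best_prev]
            new_paths[cur] = paths[best_prev] + [cur]
        metric, paths = new_metric, new_paths
    return paths[min(metric, key=metric.get)]

tx = [1, -1, -1, 1, 1, -1, 1, 1]
rx = channel(tx, alpha=1.2)        # ISI larger than the main cursor
detected = mlse_1tap(rx, alpha=1.2)
sliced = slicer(rx)
```

In this deliberately harsh example the plain slicer makes errors because the ISI term exceeds the main cursor, while the sequence search still recovers the transmitted data; a hardware receiver would unroll or approximate this search to avoid any decision feedback loop.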

#### D. Co-Packaged Optics Is Still at an Early Stage of Commercialization

In recent years, CPO has emerged as a promising technology for addressing data exchange bottlenecks inside data centers. Advances in silicon photonics now enable monolithic integration of various components within optical transceivers [97], [98], but achieving good performance for both electrical and optical components in the same integration remains challenging. Optimization of wavelength division multiplexing (WDM) transceivers based on fully monolithic CMOS-SOI integration technology has further boosted the throughput of optical interconnects, with single-channel speeds reaching 64 Gb/s [99]. Moreover, WDM transceiver modules equipped with dynamic thermal tuning systems can reliably operate across eight 32 Gb/s channels, maintaining low bit error rates over a wide temperature range [100]. An optical receiver featuring a polarization-insensitive design delivers high bandwidth density and low power consumption at data rates up to 200 Gb/s by using a polarization-splitting grating coupler, dual ring filters, and bidirectional photodiodes as key components [101]. At GTC 2025, NVIDIA unveiled networking switches integrating its CPO technology, marking a significant milestone in the transition from demonstration to commercial production [63]. NVIDIA appears to have overcome a major technological hurdle: the instability of micro-ring modulators in CPO.

#### E. Chiplet Interconnect Standards Evolved Rapidly

Chiplet interconnect standards have evolved rapidly across successive generations, significantly enhancing data rates, energy efficiency, and application versatility. The Advanced Interface Bus (AIB) progressed from a basic parallel bus to a low-voltage, adaptive link that doubled lane speeds to 24 Gb/s while halving energy consumption, enabling efficient FPGA chiplets and edge AI applications [102]. UCIe boosted bandwidth from a few GT/s to 4 Tb/s/mm² while reducing energy use to under 0.05 pJ/bit, supporting diverse chiplet integrations. OpenHBI employs a point-to-point, four-layer design with 16 Tx/16 Rx lanes and reliable coding at 0.4V [103]. BoW improved from low-speed parallel PHYs to speeds of 32 Gb/s per wire, incorporating power gating and equalization to achieve under 0.25 pJ/bit and low latency, suitable for applications ranging from embedded systems to HPC [104]. The

TABLE III
PARALLEL INTERCONNECT STANDARD

| Interconnect type  | Parallel Interconnect               |              |              |                            |
|--------------------|-------------------------------------|--------------|--------------|----------------------------|
| Standards          | UCIe 3.0                            | AIB 3.0      | BoW 2.0      | CCITA Parallel             |
| Electrical PHY     | Yes                                 | Yes          | Yes          | Yes                        |
| Max Data Rate      | 64G                                 | 24G          | 32G          | 16G                        |
| Channel Pin/Lane   | 16Tx + 16Rx or 64Tx + 64Rx          | 20Tx + 20Rx  | 16Tx + 16Rx  | 16Tx + 16Rx or 64Tx + 64Rx |
| IO Swing           | 0.4V or 0.7V                        | 0.2V or 0.4V | 0.75V        | -                          |
| RX Termination     | Optional                            | No           | Optional     | Optional                   |
| IO Direction       | Bi                                  | Bi           | Bi           | Uni                        |
| Pad Cap            | 0.125-0.3pF                         | 0.1pF        | 0.125-0.8pF  | 0.25pF                     |
| Noise Reduction    | Scrambling / DFE + FFE / CTLE / FFE | DFE + FFE    | CTLE/FFE     | Scrambling                 |
| Redundant Pin/Lane | 64+4 Pins                           | Optional     | Optional     | No                         |
| Logical PHY        | Yes                                 | Yes          | No           | Yes                        |
| Training           | Yes                                 | No           | Yes          | Yes                        |
| Initialization     | Yes                                 | Yes          | Yes          | Yes                        |
| Sideband           | Yes                                 | Yes          | Yes          | Yes                        |
| Link Control       | Yes                                 | Yes          | No           | Yes                        |
| Protocol           | Yes                                 | No           | Raw/Native   | Yes                        |
| Packaging          | Standard                            | No           | Standard     | Standard                   |
|                    | Advanced                            | Advanced     | Advanced     | Advanced                   |

latest HBM generation scaled bandwidth from 128 GB/s in 3-D stacks to over 2 TB/s with 64 GB capacity, and now serves as a foundation for graphics, AI, and server workloads [105]. CEI-XSR optimized short-reach links by increasing speeds from 25 Gb/s to beyond 100 Gb/s, reducing power consumption and latency for data center environments [106], [107], [108]. Notably, the CCITA chiplet standard supports high-performance parallel and serial interconnects in one standard and defines a separate CPO standard, meeting diverse application requirements by offering multiple connection methods [109]. To address compatibility between different chiplet interconnect standards, IEEE initiated the P3468 working group to develop a physical-layer specification covering different interconnect implementations [110].

Tables III, IV, and V summarize some of the latest chiplet interconnect standards and their key parameters. The UCIe 2.0 specification has become the leading open standard for chiplet interconnect, achieving an impressive bandwidth density of 10 Tbps/mm by leveraging advanced packaging technologies such as hybrid bonding. In contrast, the BoW and AIB protocols offer single-lane speeds between 6.4 Gb/s and 16 Gb/s, with 20 to 32 lanes per interface, a balanced "medium speed × medium lane count" approach. HBM3 defines standardized 3-D-stacked memory interfaces that deliver 6.4 Gb/s per pin, while OpenHBI tries to define a chiplet interconnect compatible with HBM. CEI-XSR is used in applications such as AI chip extension [89], switch chip extension [111], and die-to-OE connections [112]. The CCITA protocol stands out for its support of both serial and parallel interfaces.

TABLE IV
MEMORY INTERCONNECT STANDARD

| Interconnect type  | Memory Interconnect |                            |
|--------------------|---------------------|----------------------------|
| Standards          | HBM4                | OpenHBI 1.0                |
| Electrical PHY     | Yes                 | Yes                        |
| Max Data Rate      | 8.0G                | 16G/32G                    |
| Channel Pin/Lane   | 2048                | 42 (bidirectional) / DWord |
| IO Swing           | 0.4V                | 0.4V                       |
| RX Termination     | Yes                 | No                         |
| IO Direction       | Bi                  | Bi                         |
| Pad Cap            | -                   | 0.35pF                     |
| Noise Reduction    | Yes                 | Yes                        |
| Redundant Pin/Lane | Yes                 | 2/DWord                    |
| Logical PHY        | Yes                 | Yes                        |
| Training           | Yes                 | Yes                        |
| Initialization     | Yes                 | Yes                        |
| Sideband           | No                  | Yes                        |
| Link Control       | Yes                 | No                         |
| Protocol           | No                  | No                         |
| Packaging          | No                  | No                         |
|                    | Advanced            | Advanced                   |

TABLE V
SERIAL INTERCONNECT AND CPO STANDARD

| Interconnect type  | Serial Interconnect |              | CPO                  |
|--------------------|---------------------|--------------|----------------------|
| Standards          | XSR 112G            | CCITA Serial | CCITA CPO            |
| Electrical PHY     | Yes                 | Yes          | Yes                  |
| Max Data Rate      | 112G                | 32G          | 106.25 Gb/s ± 200ppm |
| Channel Pin/Lane   | 16Tx + 16Rx         | 16Tx + 16Rx  | 4                    |
| IO Swing           | 0.75VPP             | -            | 0.8V                 |
| RX Termination     | Required            | Required     | No                   |
| IO Direction       | Uni                 | Uni          | -                    |
| Pad Cap            | 0.13pF              | 0.13pF       | -                    |
| Noise Reduction    | Scrambling          | Scrambling   | -                    |
| Redundant Pin/Lane | No                  | No           | -                    |
| Logical PHY        | Yes                 | Yes          | No                   |
| Training           | Yes                 | Yes          | No                   |
| Initialization     | Yes                 | Yes          | No                   |
| Sideband           | No                  | No           | No                   |
| Link Control       | No                  | Yes          | No                   |
| Protocol           | No                  | Yes          | No                   |
| Packaging          | Standard            | Standard     | Standard             |
|                    | Advanced            | Advanced     | Advanced             |

#### VI. ADVANCED PACKAGING FOR CHIPLET-BASED SYSTEMS

#### A. A Brief History of Advanced Packaging

Given limited space, this survey does not cover every concept of advanced packaging in detail. After a brief introduction to advanced packaging for chiplets, it focuses on several important trends: silicon bridges, hybrid bonding, and system-on-wafer (SoW) packaging. Packaging architectures for chiplet integration include 2-D, 2.1-D, 2.3-D, 2.5-D, 3-D, and 3.5-D designs. Fig. 11 illustrates these advanced packaging concepts, which are well explained in [9].



Fig. 11. Advanced packaging concept. Groups of advanced packaging: 2-D, 2.1-D, 2.3-D, 2.5-D, and 3-D IC integration. (Redrawn from [John H. Lau], *IEEE Transactions on Components*, 2022).

To enhance interconnect density, 2.5-D packaging using silicon interposers and TSVs was introduced [117]. Alternatively, 2.5-D packaging with redistribution layer (RDL) interposers offers a similar package structure to silicon interposers [118], [119], [120].

To address the high costs of silicon interposers and the lower density of RDL interposers, Intel developed the embedded multi-die interconnect bridge (EMIB) [121]. Furthermore, innovative approaches involving placing silicon bridges within the RDL layer beneath the dies have been reported and are now being adopted in the industry [122], [123], [124], [125].

3-D packaging integrates ICs by vertically stacking different chips or wafers into a single package. The first active interposer concept, proposed by Joungho Kim, can significantly improve signal integrity and power integrity and reduce the power consumption of 3-D ICs [126]. Coudrain et al. successfully demonstrated six interconnected 16-core MIPS chiplets stacked on an active interposer that includes a NoC for chiplet-to-chiplet communication [127].

3.5-D advanced packaging is an emerging hybrid integration approach [128]. It combines 2.5-D interposer-based lateral integration with elements of 3-D vertical stacking within the same package, allowing designers to maximize performance, flexibility, and silicon utilization. In 3.5-D packaging, hybrid bonding (HB) enables further scaling of interconnect density without the need for additional bumps [129], [130].

To overcome the reticle size limit that constrains interposer-based integration, wafer-level integration was proposed. Lie et al. described the Cerebras architecture and how it is designed, using fine-grained dataflow compute cores to accelerate unstructured sparsity [131]. Another way to build a SoW is to use InFO\_SoW technology, which achieved 2 times higher bandwidth density and improved power efficiency by 97% compared with a conventional system [132].

#### B. Silicon Bridge Is a Good Replacement for Silicon Interposer

To reduce the cost of 2.5-D packaging, silicon bridge technology has emerged as a promising solution. Intel's EMIB is a typical example of such technology. It uses thin pieces of silicon with multi-layer BEOL interconnects, embedded in organic substrates. It provides dense, fine-pitch interconnects and localized high-density wiring, so the rest of the on-package interconnect is not affected by the presence of the bridge [121].

During the development of silicon bridge techniques, various technical routes have been explored. TSMC developed its own localized bridge solutions through the CoWoS-L and InFO-L platforms [133]. The interposer of CoWoS-L combines multiple local Si interconnects (LSI) and global RDL to form a reconstituted interposer (RI) that replaces the monolithic silicon interposer of CoWoS-S. Through-insulator vias (TIVs) are introduced in the RI as vertical interconnects to provide a lower-insertion-loss path than TSVs. The structure of integrated LSI-1 and LSI-2 provides the design flexibility to support superior SoC-to-SoC and SoC-to-HBM interconnects in one package. These technologies employ LSI bridges positioned above the substrate rather than embedded within it. ASE introduced FOCoS-Bridge technology [134], which embeds silicon bridges within fan-out redistribution layers, achieving submicron line width and space. SPIL developed the Fan-Out Embedded Bridge (FO-EB), which provides prominent Cu vias through an RDL interposer with embedded localized silicon [135]. FO-EB can replace conventional heterogeneous packaging for HBM-GPU integration owing to its good signal integrity and lower manufacturing cost [136]. The S-connect package can have multiple RDL layers and silicon bridges to achieve near-single-chip, short-range BEOL connections between chips, providing die-to-die routing with line/space (L/S) smaller than 1 $\mu$m as well as vertical connections. S-connect can be manufactured on silicon, glass, and even epoxy molding compound (EMC) [137]. OSAT solutions differentiate themselves through their fan-out integration approach, embedding bridges within molding compounds rather than positioning them above substrates.

The IME Embedded Fine Interconnect (EFI) technology [138], [139] is another notable development. It employs a face-to-face solution in which ASIC and HBM2 chips are mounted directly on top of embedded fine-interconnect chips within an RDL-first fan-out wafer-level packaging platform, achieving high-density interconnect between the ASIC and the HBM2 memory module with good reliability. This technology appears to be still in development.

## C. Hybrid Bonding Is an Excellent Solution for 3-D IC

While TSVs and micro bumps currently dominate the market, hybrid bonding represents the most promising technology for future 3-D packaging implementations [140]. Hybrid bonding (HB) enables interconnect pitches of < 1 $\mu$m by combining bumpless copper-to-copper direct bonding with dielectric-to-dielectric bonding, providing ultra-high I/O densities (> $100 \text{K/mm}^2$), ultra-short interconnect lengths, extremely low power consumption, and a thinner stack profile.
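The link between pad pitch and I/O density quoted above can be checked with simple geometry. The sketch below assumes pads on a uniform square grid, which is an idealization and not a layout taken from the cited work; the function names are illustrative only.

```python
import math


def io_density_per_mm2(pitch_um: float) -> float:
    """Pads per mm^2 on a square grid with the given pad pitch (micrometers)."""
    pads_per_mm = 1000.0 / pitch_um  # 1 mm = 1000 um
    return pads_per_mm ** 2


def pitch_for_density_um(density_per_mm2: float) -> float:
    """Largest square-grid pitch (micrometers) that still reaches the density."""
    return 1000.0 / math.sqrt(density_per_mm2)


if __name__ == "__main__":
    for pitch_um in (10.0, 3.0, 1.0, 0.5):
        density = io_density_per_mm2(pitch_um)
        print(f"{pitch_um:4.1f} um pitch -> {density:,.0f} pads/mm^2")
    print(f"100K pads/mm^2 needs pitch <= {pitch_for_density_um(1e5):.2f} um")
```

Under this assumption, a density above 100K/mm² already requires a pitch of roughly 3 µm, and a 1 µm pitch corresponds to about one million pads per mm², which is why sub-micron hybrid bonding is described as ultra-high density.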

A key development is the ongoing reduction in bond pad sizes, highlighted by the achievement of hybrid bonding with a 0.5 $\mu$m pitch and high alignment accuracy. Netzband et al. reduced the solder bump to within 3 nm through chemical-mechanical polishing (CMP) and layout-effect compensation, pushed Cu-Cu hybrid bonding with 0.5 $\mu$m pitch and < 50 nm alignment accuracy to a producible level, and established a feasible process platform for next-generation 0.25 $\mu$m pitch hybrid bonding [141]. Another important trend is a symmetrical Cu/polymer scheme without CMP, which achieves void-free wafer-level interconnection with high alignment accuracy at $180^{\circ}$C. This scheme forms Cu bumps and polymers (such as BCB or polyimide) simultaneously on the bonding interfaces of the upper and lower wafers, creating a mirror-symmetric structure that avoids the uneven thickness caused by single-sided CMP in traditional processes [142]. Fine-pitch, high-density interconnections have been enabled by low-temperature Cu-Cu bonding, Cu/dielectric hybrid bonding, and Cu/polymer symmetric hybrid bonding without CMP [143].

Fig. 12. (1) Reticle stitching, (2) WSE system components.

In addition to innovations in materials and processes, optimizing the bonding process is gaining momentum, as it is crucial for achieving void-free bonds and minimizing misalignment. A DVS-BCB bonding process has been proposed that achieves a void-free, high-density pattern array at 250°C under 0.6 N/mm² compression pressure in vacuum. Controlling the pressure effectively excludes moisture and eliminates voids, while enhancing the bond density of Si-O-Si with Si-CH, giving a bonding strength of 11.5 N/cm² [144]. A "symmetric deformation" strategy has also been proposed to offset the nanoscale alignment error caused by gravity and single-sided pressure in conventional Cu-Cu bonding while simultaneously controlling the gap between the top and bottom wafers [145].

Moreover, the development of 3.5-D packaging architectures marks a new direction in hybrid bonding applications [129]. This approach not only boosts total board power (TBP) to 750 W, memory clock to 2.6 GHz, and peak memory bandwidth to 5324.8 GB/s, but also provides greater flexibility in product design and scalability.
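The memory clock and peak bandwidth figures above are mutually consistent under two assumptions that are not stated in the text: a double-data-rate interface (two transfers per clock) and an aggregate memory bus width of 8192 bits. A minimal sketch of the arithmetic, with those assumptions marked in the code:

```python
def peak_bandwidth_gb_s(mem_clock_ghz: float, bus_width_bits: int,
                        transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s = clock * transfers per clock * bus width / 8."""
    transfer_rate_gt_s = mem_clock_ghz * transfers_per_clock
    return transfer_rate_gt_s * bus_width_bits / 8  # bits -> bytes


if __name__ == "__main__":
    # ASSUMPTIONS (not from the text): DDR signaling and an 8192-bit bus.
    bandwidth = peak_bandwidth_gb_s(mem_clock_ghz=2.6, bus_width_bits=8192)
    print(f"peak bandwidth = {bandwidth:.1f} GB/s")
```

With a different bus width or signaling scheme the same clock would give a different bandwidth, so the check only shows that the quoted numbers fit one plausible configuration.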

## D. Packaging for SoW Starts to Emerge

SoW is an emerging advanced packaging paradigm that brings together multiple chiplets, memory, and I/O components on a single wafer-scale carrier, eliminating the need for traditional die-level packaging [146]. This approach maximizes performance, power efficiency, and integration density, making it ideal for applications such as AI supercomputers, next-generation HPC, and even CPU-based general-purpose computing. The carrier can be an actual wafer, in which case all processes are completed at the foundry; the Cerebras WSE system shown in Fig. 12 is an example.

Cerebras's WSE computer employs reticle stitching, a lithographic technique that allows the creation of integrated circuits larger than a single reticle field (typically $26 \times 33$ mm<sup>2</sup> for current EUV scanners). This process stitches multiple reticle exposures into one design, overcoming the physical size limitations of photolithography tools. Multiple reticle fields are precisely aligned and sequentially exposed on the wafer with better than 5 nm overlay (OVL) accuracy, enabling continuous interconnects across adjacent fields. In principle, a large layout is divided into multiple smaller blocks that the lithography tool exposes in sequence, with precise overlap and connection at the boundaries of adjacent exposure areas, much like a jigsaw puzzle [149]. An offset reticle of wiring between the 525-mm<sup>2</sup> reticles was used to stitch the standard reticles together. Cerebras used a homogeneous array of processing elements (PEs) rather than redundant gates, and employed more than 300 individual voltage regulation modules (VRMs), distributed over the wafer surface, that drive current into the wafer perpendicular to its surface. Using multiple VRMs per reticle ensures redundancy in the power distribution and gives individual control of each reticle's power domain [150].

Fig. 13. (1) Tesla Dojo computer, (2) InFO\_SoW advanced packaging.

Another approach to building a SoW system is TSMC's InFO\_SoW advanced packaging technology [132]. Unlike reticle stitching, this method integrates arrays of known-good dies with power and thermal modules, enabling compact systems that retain the advantages of wafer-scale integration. Electrical characterization shows excellent process uniformity across the entire wafer, which is crucial for maintaining consistent performance in large-scale systems. An integrated, forced liquid-cooling thermal management system was proposed: the heat generated by the chip is transferred to TIM 2, spreads laterally through the heat spreader, and is then conducted vertically to TIM 1, where it is removed by the cooling fluid in a microchannel cold plate with a complex channel design. Instead of the traditional approach of powering from the edge of the package, the design opts for "backside vertical powering". A separate, wafer-sized power board containing a sophisticated power network is flip-chip bonded face-to-face with the InFO\_SoW wafer using tens of thousands of microbumps. Rather than entering from the edge, current flows from the power board through these vertical interconnect points and is injected directly into the functional areas of the wafer that require power [151]. A typical product using this method is Tesla's Dojo system (Fig. 13) [148].

UCLA's Center for Heterogeneous Integration and Performance Scaling (CHIPS) has developed a novel wafer-level packaging technology that differs from both reticle stitching and TSMC's InFO\_SoW technology (Fig. 13): the Silicon Interconnect Fabric (Si-IF) [152]. At the heart of Si-IF is its ability to establish extremely fine-pitch interconnections between silicon dies, routinely achieving pitches as small as 2-10 micrometers, levels previously restricted to on-chip wiring. This enables inter-die spacing of 100 micrometers or less, allowing communication links between dies no longer than 500 micrometers.

## E. Other Aspects of Advanced Packaging

Advanced packaging faces significant thermal challenges due to high power densities and compact form factors. In 3-D-stacked integrated circuits (ICs), hotspots are common, and heat dissipation is further complicated by material limitations such as poor thermal conductivity and coefficient of thermal expansion (CTE) mismatches [153]. To address these issues, solutions include advanced thermal interface materials (TIMs) incorporating graphene or carbon nanotubes, substrates coated with silicon carbide (SiC) or diamond, and liquid cooling systems utilizing microfluidic channels [154]. Optimizing package design with thermal TSVs and effective power mapping in 2.5-D ICs is also essential. For example, a thermal simulation of a 2.5-D TSV package shows a temperature distribution with highest and lowest temperatures of about 62.7°C and 36.7°C, and indicates that a heat spreader size of $60 \times 60 \times 2 \text{ mm}^3$ is reasonable for that package [155]. Additionally, multi-physics simulations and AI-driven modeling play a key role in identifying and mitigating thermal problems.

Large fan-out packages with multi-chiplet integration encounter significant warpage and reliability challenges as package and die sizes grow and material complexity increases. Traces experience larger stress at high temperature (HT), caused by slight irregular deformation of the package, so the risk of trace breakage is higher at HT than at room temperature (RT) [156]. Common problems include warpage, cracking, delamination, and solder joint failures. Careful package design that considers factors such as substrate material, Cu ratio, die-to-substrate ratio, and the type and thickness of heat spreaders is essential to address and minimize these issues.

Fig. 14. Chiplet EDA design flow.

To improve heat dissipation, TIMs with higher thermal conductivity are required: polymer-based thermal interface materials (P-TIMs) reach up to 10 W/mK and carbon-based TIMs (C-TIMs) up to 20 W/mK [157], [158]. In AI processor applications, Intel's recent studies have shown that advanced P-TIMs can achieve a 20% reduction in thermal resistance compared with conventional materials [112]. For 3-D stacked packages, industry research has revealed that P-TIMs with thermal conductivity of 8-15 W/mK are increasingly used in TSMC's CoWoS and Intel's Foveros technologies [109]. C-TIMs enable the superior thermal management essential for next-generation high-power electronics and automotive systems. Liu et al. developed vertically aligned boron nitride and graphite films that reached a record-high through-plane thermal conductivity of 23.7 W/mK through innovative stacking-cutting methods [159].
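To see what the conductivity values above mean for a package, the one-dimensional conduction formula $R = t / (kA)$ gives the bulk thermal resistance of a TIM layer. The sketch below is a rough illustration only: the 50 µm bond-line thickness, 100 mm² contact area, and 100 W heat load are hypothetical values, and contact resistance at the interfaces is ignored.

```python
def tim_resistance_k_per_w(thickness_um: float, conductivity_w_mk: float,
                           area_mm2: float) -> float:
    """Bulk 1-D conduction resistance R = t / (k * A), in K/W."""
    thickness_m = thickness_um * 1e-6
    area_m2 = area_mm2 * 1e-6
    return thickness_m / (conductivity_w_mk * area_m2)


if __name__ == "__main__":
    # Hypothetical geometry and load, chosen only for illustration.
    thickness_um, area_mm2, power_w = 50.0, 100.0, 100.0
    for label, k in (("P-TIM, 10 W/mK", 10.0),
                     ("C-TIM, 20 W/mK", 20.0),
                     ("aligned BN/graphite film, 23.7 W/mK", 23.7)):
        r = tim_resistance_k_per_w(thickness_um, k, area_mm2)
        print(f"{label}: R = {r:.4f} K/W, dT = {r * power_w:.2f} K")
```

Because resistance scales inversely with conductivity, doubling $k$ from 10 to 20 W/mK halves the temperature drop across the layer for the same geometry and heat load.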

Glass interposers, known for their excellent stability and low CTE, are gaining traction in high-frequency applications despite the risk of cracking [160]. A "5.5D" chiplet packaging architecture using glass interposers has been introduced that enables both embedded and stacked chiplet configurations, which is impossible with silicon interposers [161]. Another study demonstrates that glass substrates offer higher interconnect density than organic substrates and provide an ultra-flat surface that improves lithography depth of focus [162].

#### VII. EDA TOOLS FOR CHIPLET-BASED DESIGNS

#### A. Integrated Design Flow Is Essential for Chiplet

EDA tools are essential throughout the chiplet design process, with every stage—from architectural exploration to physical implementation—demanding innovative tools and approaches to meet diverse design requirements.

The chiplet EDA toolchain (Fig. 14) is progressively evolving into a comprehensive support system that spans the entire design process. During the front-end design phase, the focus is on architectural design, encompassing functional simulation, power analysis, cost evaluation, and design space exploration (DSE). At this stage, EDA tools enable designers to explore various chiplet configurations while assessing their performance, power consumption, and cost. Existing tools primarily build on NoC simulation frameworks, for example by adding inter-simulator communication protocols [163] or by incorporating floorplanning cost into performance evaluation [164]. Power analysis at this level is particularly challenging because interactions among heterogeneous chiplets built on different technology nodes must be modeled accurately. To overcome these challenges, researchers have introduced machine learning methods, such as Kim's reinforcement learning-based optimization for chiplet power delivery networks [165].

As design complexity increases, traditional manual DSE approaches struggle to keep pace. Modern DSE methods employ diverse optimization strategies to automate the exploration of design options and optimize performance, power, and area. These approaches range from machine learning-based and metaheuristic optimization to advanced analytical frameworks. For example, Gemini combines analytical models with a simulated annealing search to perform mapping and architecture co-exploration [166], while NN-Baton employs a purely analytical framework with hierarchical modeling and exhaustive search methods for workload orchestration [167]. Both frameworks reduce inter-chiplet communication overhead and improve energy efficiency by combining workload mapping with architectural adjustments, though through different methodological approaches.
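As a rough illustration of annealing-based mapping exploration, the sketch below assigns tasks to chiplets so that the traffic crossing chiplet boundaries is minimized. It is not the Gemini algorithm or its cost model: the traffic table, the swap move, the cooling schedule, and all names are hypothetical choices made for this example.

```python
import math
import random

# Hypothetical traffic volumes between task pairs: two tightly coupled
# groups (tasks 0-3 and 4-7) joined by one light link.
TRAFFIC = {(0, 1): 10, (1, 2): 10, (2, 3): 10,
           (4, 5): 10, (5, 6): 10, (6, 7): 10,
           (3, 4): 1}


def comm_cost(mapping, traffic):
    """Total traffic volume that crosses chiplet boundaries."""
    return sum(vol for (a, b), vol in traffic.items()
               if mapping[a] != mapping[b])


def anneal(traffic, n_tasks, n_chiplets, steps=20000,
           t0=10.0, alpha=0.9995, seed=0):
    """Simulated annealing over task-to-chiplet mappings using swap moves."""
    rng = random.Random(seed)
    # Round-robin start; swapping two tasks keeps per-chiplet load fixed.
    mapping = [i % n_chiplets for i in range(n_tasks)]
    cost = comm_cost(mapping, traffic)
    best, best_cost = list(mapping), cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n_tasks), 2)
        if mapping[i] != mapping[j]:
            mapping[i], mapping[j] = mapping[j], mapping[i]
            delta = comm_cost(mapping, traffic) - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost += delta
                if cost < best_cost:
                    best, best_cost = list(mapping), cost
            else:
                mapping[i], mapping[j] = mapping[j], mapping[i]  # undo
        temp *= alpha  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    best, best_cost = anneal(TRAFFIC, n_tasks=8, n_chiplets=2)
    print("mapping:", best, "inter-chiplet traffic:", best_cost)
```

A real co-exploration flow would replace `comm_cost` with an analytical latency and energy model and would also vary architectural parameters, not only the mapping.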

In the mid-end design phase, efforts center on chiplet partitioning and integration to optimize resource allocation and interconnect topologies, thereby lowering system costs and enhancing performance. Notable solutions include KaHyPar, a hypergraph partitioning tool that addresses the fundamental and intensively studied problem of balanced hypergraph partitioning [168], and Chipletizer, a framework that guides design partitioning and supports economical, efficient repartitioning of multiple SoCs into reusable chiplets under user-specified parameters [169].
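Balanced hypergraph partitioning can be made concrete by writing out the quantities a partitioner trades off. The sketch below evaluates a given partition using two common objectives (the number of cut nets and the connectivity metric, the sum over nets of the number of blocks spanned minus one) and a simple imbalance measure. It does not call KaHyPar; the module names, nets, areas, and partition are hypothetical.

```python
# Hypothetical netlist: each net (hyperedge) connects a set of modules.
NETS = [
    {"cpu0", "cpu1", "l2"},
    {"l2", "ddr", "pcie"},
    {"npu", "ddr"},
    {"pcie", "cpu0"},
]
# Hypothetical module areas and a candidate 3-way chiplet partition.
AREA = {"cpu0": 4, "cpu1": 4, "l2": 2, "ddr": 3, "npu": 5, "pcie": 2}
PART = {"cpu0": 0, "cpu1": 0, "l2": 0, "ddr": 1, "npu": 1, "pcie": 2}


def cut_metrics(nets, part):
    """Return (cut_nets, connectivity) for a partition of the modules."""
    cut_nets = 0
    connectivity = 0
    for net in nets:
        blocks = {part[module] for module in net}
        if len(blocks) > 1:
            cut_nets += 1          # net spans more than one chiplet
        connectivity += len(blocks) - 1
    return cut_nets, connectivity


def imbalance(part, area, k):
    """Relative excess of the heaviest block over the average block area."""
    loads = [0.0] * k
    for module, block in part.items():
        loads[block] += area[module]
    return max(loads) / (sum(loads) / k) - 1.0


if __name__ == "__main__":
    cut, conn = cut_metrics(NETS, PART)
    print(f"cut nets = {cut}, connectivity = {conn}, "
          f"imbalance = {imbalance(PART, AREA, 3):.2f}")
```

A partitioner searches for the assignment that minimizes one of these cut objectives while keeping the imbalance below a user-specified tolerance; in a chiplet setting, each cut net corresponds to a signal that must cross a die-to-die interface.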

The back-end implementation phase focuses on physical design and verification, covering reliability analysis, floorplanning, signal and power integrity signoff, and multiphysics modeling. Recent progress includes reinforcement learning-based floorplanning techniques such as RLPlanner [170], and thermo-mechanical-electrical co-simulation tools such as HotSpot, which focuses on specialized thermal simulation for processors [171], and COMSOL, a general-purpose multi-physics simulation tool [172]. Reliability analysis addresses issues like thermal stress, mechanical shock, and signal/power integrity by leveraging multi-physics simulations and AI-driven optimizations [173]. For instance, thermal sensitivity algorithms combined with AI-powered thermal gradient prediction can be used to optimize heat distribution [174], while deep reinforcement learning methods have been used to improve power integrity [175]. Other work addresses a critical timing problem in 3-D IC design, namely that traditional verification occurs too late in the design cycle, and demonstrates that effective physical verification can be performed early in the 3-D IC design cycle, enabling proactive problem solving rather than reactive correction [176], [177].

Although chiplet EDA research has made significant strides in various areas, two major challenges remain. First, despite progress in developing stage-specific tools, fully integrating the entire design flow continues to be a major obstacle. While initial efforts have been explored, such as a full-flow EDA framework covering PDK definition, layout, and timing and PPA analysis for certain chip designs [178], no universally applicable solution has yet been established. Second, collaboration across teams and vendors is hindered by inconsistencies in workflows and tools. Although some researchers have proposed standardized solutions for interfaces, such as packaging standards [179], methodological differences still impede large-scale cooperation.

Future research should concentrate on developing systematic, standardized, and highly interoperable solutions. The EDA toolchain for chiplets must seamlessly integrate front-end and back-end workflows to enable cohesive design, simulation, and packaging integration. A key strategy to achieve this is the adoption of STCO [180], which breaks down barriers between design stages and tools by providing a holistic integration of design and simulation.

Moreover, overcoming collaborative EDA challenges requires standardizing data formats, interfaces, protocols, and methods for cross-team and cross-vendor chiplet design.

## B. Standardized Chiplet Models and Data Formats Are Significant for the Synergy of the EDA Ecosystem

Modern chiplet EDA modeling architectures have systematically addressed design complexity challenges by promoting open model exchange mechanisms through the establishment of chiplet design kits (CDKs), directly analogous to the successful Process Design Kits (PDKs) paradigm that revolutionized IC design in the 1990s. The Open Compute Project Foundation, in collaboration with JEDEC, has developed a comprehensive standardized framework comprising four key design kits: assembly design kits (ADKs) for geometries and interconnects, material design kits (MDKs) for thermal and mechanical properties, test design kits (TDKs) for validation processes, and CDKs for integration models [181]. This framework is implemented through the Chiplet Data Extensible Markup Language (CDXML) specification, which provides a standardized XML-based format for machine-readable chiplet models encompassing thermal, physical, mechanical, I/O, behavioral, power, and signal integrity characteristics [182]. The initiative addresses critical industry challenges by enabling multi-vendor EDA tool interoperability, supporting systematic design rule verification, and establishing an open chiplet marketplace where standardized models facilitate component discovery, evaluation, and selection. This approach mirrors the historical evolution from manual design rule documents to automated PDK-based workflows, with major industry players including Siemens EDA, NIST, and semiconductor foundries actively contributing to standards development and commercial implementation.

The OCP-ODSA-CDX chiplet standardization framework (shown in Fig. 15) [181] demonstrates a strategic approach of evolutionary enhancement rather than revolutionary replacement of established industry standards, leveraging decades of EDA tool ecosystem investment while addressing chiplet-specific challenges. The framework directly incorporates traditional formats including GDSII/OASIS (1970s/2000s) for physical layout, IBIS/IBIS-AMI for signal integrity, SPICE for circuit-level verification, Liberty (.LIB) for power modeling, SystemVerilog/Verilog-AMS for behavioral simulation, and IEEE standards (1149.1, 1687, 1801, 2416) for test and power management [183], [184]. Rather than creating entirely new formats, the framework strategically extends these proven standards for chiplet applications: adapting LEF models from ASIC place-and-route to package-level design, applying IBIS models to die-to-die interfaces instead of traditional I/O, extending the JEDEC JEP30 series for chiplet-specific mechanical and electrical properties, and enhancing SPICE netlists for SiP-level LVS verification [185]. This backward-compatible approach ensures seamless integration with existing EDA workflows, minimizes industry adoption barriers, maintains multi-vendor tool support, and provides familiar formats that design teams already understand, while systematically addressing the unique thermal, mechanical, electrical, and test challenges of heterogeneous chiplet integration. The framework's success lies in its pragmatic recognition that industry transformation requires building upon established foundations rather than disrupting proven methodologies, making it more likely to achieve widespread adoption across the semiconductor ecosystem.

Fig. 15. OCP-ODSA-CDX chiplet standardization framework.

As chiplet-based designs become increasingly complex, involving diverse components such as chiplets, interposers, substrates, and advanced packaging technologies, the industry lacks a unified, modular hierarchical language to systematically describe component connectivity and physical properties across 2.5-D and 3-D architectures. While existing frameworks like the OCP-ODSA-CDX initiative provide comprehensive model standardization for chiplet design exchange, they primarily focus on individual chiplet models rather than system-level architectural description.

A notable research effort in this area is the 3Dblox framework introduced in 2022 [186]. 3Dblox provides a standardized design language and format that models both the key physical stacking and the logical connectivity details of 3-D IC designs in a single, comprehensive way. This allows multiple electronic design automation (EDA) tools to interoperate smoothly, facilitating easier design, verification, and optimization of complex 3-D IC architectures. This initiative serves as a valuable methodological reference for standardizing data formats in 3-D ICs. In 2024, IEEE created working group P3537 to standardize this framework [187].

#### C. Multiphysics Co-Simulation: The Necessity in Chiplet Era

The manufacturing of integrated chips involves the complex coupling of multiple physical processes, including thermodynamics, fluid dynamics, and electromagnetism, which present significant multi-physics challenges.

Pu et al. comprehensively reviewed the challenges and specific problems encountered in the manufacturing process of integrated circuits [188]. As the integration and power densities of advanced integrated circuits continue to rise, heat accumulation within chips has become increasingly severe, causing substantial temperature increases that critically affect device performance, reliability, and signal integrity [189]. Moreover, ICs integrate a wide range of materials, each characterized by thermal, electrical, and mechanical constitutive parameters that are typically influenced by environmental factors such as mechanical stress, electric fields, temperature, humidity, and radiation.

Accurately modeling how these material parameters vary with changing environmental conditions remains a significant challenge in computational simulations. For hot-spot modeling, Kwon et al. established the first NS-FET electrothermal model for multi-finger, multi-stack, and hybrid layouts, which can be embedded directly in the BSIM-CMG standard SPICE model [190]. Cai et al., using dynamic kinetic Monte Carlo (KMC) simulation coupled with the thermal-stress-electromigration mechanism, enabled the NS-FET electrothermal model to support precise multi-structure modeling for the first time and to be embedded in SPICE, providing guidance for material selection in the 1 nm node power network [191]. Pande et al. studied the bit error rate (BER) of signals and constructed the first on-chip all-digital BER degradation model, from which the frequency degradation trend of the entire 5 interconnect path can be extrapolated [192]. In packaging manufacturing, one study proposed a model to predict mold filling capacity and chip offset [193]. Bourjot et al. established a self-assembly alignment model to optimize the exfoliation chemistry, step height, and Cu surface integrity, ensuring zero voids and high bonding energy at the hybrid bonding interface [194].

Multi-physics coupling simulation serves multiple critical analysis targets in chiplet-based 2.5-D and 3-D IC architectures. Timing analysis in chiplet systems requires multi-physics coupling to address thermal-electrical interactions, voltage-temperature dependencies, and cross-chiplet timing synchronization challenges. Recent research demonstrates that MFIT multi-fidelity thermal modeling for 2.5-D and 3-D chiplet architectures achieves mean absolute errors of only 1.41-1.63°C while reducing execution times from days to seconds, with 93.4-100% accuracy in temperature violation detection across different system configurations [195]. Reliability and aging analysis in chiplet systems focuses on electromigration, bias temperature instability, and mechanical stress-induced aging effects; research shows that physics-informed LSTM networks for solder joint reliability prediction significantly advance lifetime estimation by capturing complex stress and thermal interactions [196]. Thermal management and mechanical stress are also critical, with methods like STAMP-2.5D enabling integrated optimization that reduces overall stress and improves thermal performance by managing temperature gradients and warpage [197]. In manufacturing process optimization, multi-physics simulations predict assembly-related failures and optimize parameters; studies show that coupling thermal, mechanical, and electrical fields can accurately model process-induced warpage and defect formation, accelerating yield improvements [198]. These recent efforts collectively demonstrate that multi-physics coupling is indispensable for advancing the reliability, performance, and manufacturability of chiplet systems.
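The two figures of merit quoted above for fast thermal models, mean absolute error and temperature violation detection accuracy, can be computed as in the sketch below. The temperatures and the 85°C limit are made-up sample values, and the accuracy definition (fraction of points where the fast model and the reference agree on whether the limit is exceeded) is an assumption that may differ from the one used in the cited work.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute temperature difference, in the input units."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def violation_detection_accuracy(predicted, reference, limit_c):
    """Fraction of points where both models agree on exceeding the limit."""
    agree = sum((p > limit_c) == (r > limit_c)
                for p, r in zip(predicted, reference))
    return agree / len(reference)


if __name__ == "__main__":
    # Hypothetical per-chiplet peak temperatures in degrees Celsius.
    reference = [71.0, 84.5, 92.0, 78.3, 88.0]   # e.g., detailed FEM run
    predicted = [72.2, 83.0, 93.5, 77.0, 84.0]   # e.g., fast reduced model
    mae = mean_absolute_error(predicted, reference)
    acc = violation_detection_accuracy(predicted, reference, limit_c=85.0)
    print(f"MAE = {mae:.2f} C, violation detection accuracy = {acc:.0%}")
```

In this sample the fast model misses one violation (88.0°C in the reference, 84.0°C predicted), which shows why a low average error alone does not guarantee that every limit violation is caught.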

In the design and optimization of integrated chips, achieving efficient and reliable multi-physics simulations remains a significant challenge.

#### 1) Balancing Accuracy and Efficiency

On modern computational platforms, simulation accuracy depends heavily on the robustness of physical models and the precision of numerical methods. In one reported optimization study, the results deviate from an HFSS/COMSOL recalculation by less than 7%, with a relative target error below 6%, indicating that the model balances accuracy and efficiency [199].

#### 2) Verification and Calibration for Multi-physics

Results from multi-physics simulations must undergo rigorous verification and calibration to ensure they align consistently with experimental data. Specifically, they must pass a four-step closed loop of conceptual model validity, program verification, operational validity, and data validity before they can be considered consistent with the experimental data [200].

Multi-physics analysis and simulation tools play a crucial role in both scientific research and engineering. Industry leaders such as Synopsys, Cadence, ANSYS, and COMSOL are at the forefront of advancing these technologies. As system integration and complexity continue to grow, multi-physics coupling techniques have attracted significant attention from academia and industry.

#### D. Chiplet Oriented Design Verification and Testing

Chiplet design verification and testing are essential to ensuring the reliability and performance of heterogeneous integrated systems. As chiplet technology proliferates, integrating multiple dies through 2.5-D or 3-D stacking introduces increasingly complex verification and testing challenges.

To tackle these issues, both industry and academia have developed a variety of testing standards and methodologies. Key standards such as IEEE 1149.1, IEEE 1500, IEEE 1687, and IEEE 1838 (Fig. 16) have evolved to provide flexible and efficient testing solutions that cover board-level testing, embedded core testing, and validation of 2.5-D/3-D stacked chips. For instance, Chuang et al. proposed the E2I-TEST method, which extends traditional interconnect automatic test pattern generation (I-ATPG) by adding detection of weak opens, weak shorts, and coupling defects, and by introducing transition patterns (such as rising- and falling-edge tests) and layout-aware optimization [201]. This approach achieves full coverage of complex and subtle defects while reducing the number of test patterns and improving efficiency through adjacency-aware analysis. Moreover, the emerging P3405 standard targets testing and repair of chiplet interconnects. By defining a standardized test and repair description language, P3405 aims to automate test pattern generation and repair processes, ensuring interoperability across heterogeneous chiplet designs [202].
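For context on what traditional interconnect test pattern generation looks like, the sketch below builds a classic modified counting sequence, in which every net is driven with a unique binary code that is neither all zeros nor all ones, and then checks observed responses against expected ones. This is the textbook baseline, not the E2I-TEST method; the wired-AND short model and the stuck-at-1 open model are simplifying assumptions.

```python
def counting_sequence(n_nets):
    """Unique per-net codes 1..n_nets, LSB first; all-0 and all-1 are unused."""
    width = (n_nets + 1).bit_length()  # smallest width with 2**width >= n_nets + 2
    return [[((net + 1) >> bit) & 1 for bit in range(width)]
            for net in range(n_nets)]


def apply_faults(codes, shorts=(), opens=()):
    """Return the responses observed at the receivers under injected faults."""
    observed = [list(code) for code in codes]
    for a, b in shorts:
        # ASSUMPTION: shorted nets behave as a wired-AND of their drivers.
        merged = [x & y for x, y in zip(codes[a], codes[b])]
        observed[a], observed[b] = list(merged), list(merged)
    for net in opens:
        # ASSUMPTION: an open net floats high (stuck-at-1 at the receiver).
        observed[net] = [1] * len(codes[net])
    return observed


def failing_nets(codes, observed):
    """Nets whose observed response differs from the driven code."""
    return [net for net, (exp, obs) in enumerate(zip(codes, observed))
            if exp != obs]


if __name__ == "__main__":
    codes = counting_sequence(6)
    print("patterns needed:", len(codes[0]))
    observed = apply_faults(codes, shorts=[(1, 2)], opens=[4])
    print("failing nets:", failing_nets(codes, observed))
```

In this example the short between nets 1 and 2 is detected only on net 2, because the wired-AND result happens to equal the code driven on net 1. Such aliasing, together with the inability of static patterns to expose weak or timing-dependent defects, is part of the motivation for richer pattern sets like those discussed above.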

Traditionally, individual ICs are verified at the chip level using foundry-certified DRC, LVS, and PEX EDA tools. However, in 3-D IC designs, stacking chiplets fabricated with different technology nodes introduces complexities that can impact performance, reliability, and yield. Conventional verification methods often struggle to address this heterogeneity. To tackle these challenges, the Calibre 3DSTACK tool was developed to enable fast and accurate assembly-level verification of 3-D IC designs [176]. Meanwhile, the ChiPICA framework verifies chiplet physical structures by comparing scanning electron microscope (SEM) images with design layouts [203].

Fig. 16. Cross-section of a 3-D stack with three chiplets. IEEE Std 1838 introduced design-for-test elements in the various chiplets to form "elevators" that allow the test stimuli and their responses to reach every chiplet in the stack. Figure: courtesy of Erik Jan Marinissen, imec.

TSV and stacking technology testability are crucial for advancing high-performance integrated circuits. Integrating efficient online testing, dynamic test scheduling, standardized test architectures, and system-level fault-tolerant mechanisms can greatly improve the reliability and test coverage of TSVs and stacked systems. Future research will focus on balancing real-time performance with test coverage while seamlessly incorporating these techniques into existing design and manufacturing processes.

## VIII. CONCLUSION

This survey paper uses technical trend analysis, supported by a thorough review of the literature, to examine recent advances across all areas related to chiplets. We concentrated on four key fields: SoC architecture, interconnects, EDA, and advanced packaging. Over 200 papers were selected for detailed analysis.

Based on this literature review, we believe that chiplet technology is essential for the integrated circuit industry. Even as advanced process technologies continue to evolve, integrating chiplets can effectively address yield challenges and improve system architecture flexibility, while steadily reducing the system's TCO. However, for chiplet technology to develop sustainably and robustly over the long term, critical issues, such as the standardization and integration of interconnect protocols and the integrity of EDA tools, must be resolved, as these factors will significantly influence the future growth of chiplet technology.

To tackle the challenges and leverage the opportunities identified in the survey, we propose the following recommendations to ensure the sustainable and healthy long-term growth of chiplet technology:

- 1) *Standardization efforts*: Focus on developing and adopting unified interconnect standards for chiplets to enhance interoperability and minimize ecosystem fragmentation.
- 2) *EDA tools enhancement*: Invest in building comprehensive EDA design flows specifically designed for chiplet-based systems, lowering the barrier to entry and accelerating innovation.
- 3) *Wide heterogeneous integration*: Continue exploring the potential of integrating diverse chiplets (e.g., CPUs, GPUs, AI accelerators) to create highly optimized systems tailored to specific applications.
- 4) *AI/ML method adoption*: Explore how AI and machine learning can be used in chiplet architecture design and optimization, given their growing importance in the field.
- 5) *Cost optimization*: Develop strategies to reduce the manufacturing and integration costs of chiplet-based systems to make them more competitive with traditional monolithic designs.
- 6) *Industry collaboration*: Promote partnerships among semiconductor companies, research institutions, and standards organizations to accelerate the development and adoption of chiplet technology.
- 7) *Application-specific adoption*: Customize chiplet solutions for particular industries, such as automotive, data center, and IoT, where modular approaches can offer significant benefits in flexibility and time-to-market.
- 8) *Chiplet marketplace/library promotion*: Develop and adopt common interfaces, protocols, and design methodologies across the industry through a chiplet marketplace/library, so that chiplet providers and system designers gain confidence in cross-vendor compatibility, dramatically reducing integration risks and accelerating chiplet adoption.

## REFERENCES

- [1] Wikipedia. (2024). *Chiplet*. [Online]. Available: https://en.wikipedia.org/wiki/Chiplet
- [2] Computer History Museum (CHM). (2025). 'Moore's Law' Predicts the Future of Integrated Circuits. [Online]. Available: https://www.computerhistory.org/siliconengine/moores-law-predicts-the-future-of-integrated-circuits/
- [3] S. Raman, T.-H. Chang, C. L. Dohrman, and M. J. Rosker, "The DARPA COSMOS program: The convergence of InP and silicon CMOS technologies for high-performance mixed-signal," in *Proc. 22nd Int. Conf. Indium Phosph. Rel. Mater. (IPRM)*, May 2010, pp. 1–5, doi: 10.1109/ICIPRM.2010.5516241.
- [4] Semiconductor Engineering. (2017). DARPA CHIPS Program Pushes For Chiplets. [Online]. Available: https://semiengineering.com/darpachips-program-pushes-for-chiplets/
- [5] Keysight. (2025). Chord Signaling for High-Speed Chip-to-Chip Applications. [Online]. Available: https://www.keysight.com/us/en/assets/7119-1126/case-studies/5992-4282.pdf
- [6] Universal Chiplet Interconnect Express. (2025). Universal Chiplet Interconnect Express (UCIe)™: Building an Open Chiplet Ecosystem. [Online]. Available: https://www.uciexpress.org/
- [7] Open Compute Project. (2025). Open Chiplet Economy. [Online]. Available: https://www.opencompute.org/projects/open-chiplet-economy

- [8] CCITA. (Apr. 2023). Technical Requirements for Chiplet Interface Bus. [Online]. Available: https://www.ccita.center/standard/213.html
- [9] J. H. Lau, "Recent advances and trends in advanced packaging," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 12, no. 2, pp. 228–252, Feb. 2022.
- [10] S. Chen, H. Zhang, Z. Ling, J. Zhai, and B. Yu, "The survey of 2.5D integrated architecture: An EDA perspective," in *Proc. 30th Asia South Pacific Design Autom. Conf.*, Jan. 2025, pp. 285–293.
- [11] A. Das, M. Palesi, J. Kim, and P. Pratim Pande, "Chip and package-scale interconnects for general-purpose, domain-specific, and quantum computing systems—Overview, challenges, and opportunities," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 14, no. 3, pp. 354–370, Sep. 2024, doi: 10.1109/JETCAS.2024.3445829.
- [12] Z. Yang et al., "Challenges and opportunities to enable large-scale computing via heterogeneous chiplets," in *Proc. 29th Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Jan. 2024, pp. 765–770, doi: 10.1109/asp-dac58780.2024.10473961.
- [13] Y. Liu, X. Li, and S. Yin, "Review of chiplet-based design: System architecture and design methodology," Sci. China Inf. Sci., vol. 67, no. 10, 2024, Art. no. 200401. [Online]. Available: https://www.sciengine.com/doi/10.1007/s11432-023-3926-8
- [14] P. Garrou. (Jan. 2023). IFTLE 545: Chiplet Definition and Standardization. [Online]. Available: https://www.3dincites.com/2023/01/iftle-545-chiplet-definition-and-standardization/
- [15] H. Jin, J. Yang, Y. Liu, B. Lyu, K. Zhang, and N. Bleier, "Mozart: A chiplet ecosystem-accelerator codesign framework for composable bespoke application specific integrated circuits," 2025, arXiv:2510.08873.
- [16] TSMC. (2024). CoWoS. [Online]. Available: https://3dfabric.tsmc.com/english/dedicatedFoundry/technology/cowos.htm
- [17] AMD. (2025). AMD 3D V-CacheTM Technology. [Online]. Available: https://www.amd.com/en/products/processors/technologies/3d-v-cache.html
- [18] Y. Feng and K. Ma, "Chiplet actuary: A quantitative cost model and multi-chiplet architecture exploration," in *Proc. 59th ACM/IEEE Design Autom. Conf.*, San Francisco, CA, USA, Jul. 2022, pp. 121–126, doi: 10.1145/3489517.3530428.
- [19] G. Mone, "The chiplet revolution," Commun. ACM, vol. 67, no. 11, pp. 14–16, Nov. 2024, doi: 10.1145/3686310.
- [20] M. Wang, Y. Wang, C. Liu, and L. Zhang, "Network-on-interposer design for agile neural-network processor chip customization," in *Proc. 58th ACM/IEEE Design Autom. Conf. (DAC)*, San Francisco, CA, USA, Dec. 2021, pp. 49–54, doi: 10.1109/DAC18074.2021.9586261.
- [21] B. Vinnakota et al., "The open domain-specific architecture," IEEE Micro, vol. 41, no. 1, pp. 30–36, Jan. 2021, doi: 10.1109/MM.2020.3042383.
- [22] N. Beck, S. White, M. Paraschou, and S. Naffziger, "'Zeppelin': An SoC for multichip architectures," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, San Francisco, CA, USA, Feb. 2018, pp. 40–42, doi: 10.1109/ISSCC.2018.8310173.
- [23] S. Naffziger, K. Lepak, M. Paraschou, and M. Subramony, "2.2 AMD chiplet architecture for high-performance server and desktop products," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2020, pp. 44–45, doi: 10.1109/ISSCC19947.2020.9063103.
- [24] T. Singh et al., "2.1 Zen 2: The AMD 7 nm energy-efficient high-performance x86-64 microprocessor core," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2020, pp. 42–44.
- [25] B. Cohen, M. Subramony, and M. Clark, "Next generation 'Zen 5' Core," in *Proc. IEEE Hot Chips Symp.*, Aug. 2024, pp. 1–27, doi: 10.1109/HCS61935.2024.10665102.
- [26] A. Smith et al., "11.1 AMD Instinct™ MI300 series modular chiplet package–HPC and AI accelerator for exa-class systems," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2024, pp. 490–492.
- [27] P. Mosur, "Built for the edge: The Intel Xeon 6 SoC," in *Proc. IEEE Hot Chips Symp.*, Stanford, CA, USA, Aug. 2024, pp. 1–28, doi: 10.1109/HCS61935.2024.10665220.
- [28] J. Xia, C. Cheng, X. Zhou, Y. Hu, and P. Chun, "Kunpeng 920: The first 7-nm chiplet-based 64-core ARM SoC for cloud services," *IEEE Micro*, vol. 41, no. 5, pp. 67–75, Sep. 2021, doi: 10.1109/MM.2021.3085578.
- [29] J. Yin et al., "Modular routing design for chiplet-based systems," in *Proc. ACM/IEEE 45th Annu. Int. Symp. Comput. Archit. (ISCA)*, Los Angeles, CA, USA, Jun. 2018, pp. 726–738, doi: 10.1109/ISCA.2018.00066.

- [30] H. Sharma et al., "Florets for chiplets: Data flow-aware high-performance and energy-efficient network-on-interposer for CNN inference tasks," ACM Trans. Embedded Comput. Syst., vol. 22, no. 5s, pp. 1–21, Oct. 2023, doi: 10.1145/3608098.
- [31] T. Wang, F. Feng, S. Xiang, Q. Li, and J. Xia, "Application defined on-chip networks for heterogeneous chiplets: An implementation perspective," in *Proc. IEEE Int. Symp. High-Performance Comput. Archit. (HPCA)*, Apr. 2022, pp. 1198–1210, doi: 10.1109/HPCA53966.2022.00091.
- [32] Y. S. Shao et al., "Simba: Scaling deep-learning inference with multi-chip-module-based architecture," in *Proc. 52nd Annu. IEEE/ACM Int. Symp. Microarchitecture*, Oct. 2019, pp. 14–27, doi: 10.1145/3352460.3358302.
- [33] D. Xu, W. Zhou, Z. Huang, H. Liang, and X. Wen, "RHT\_NoC: A reconfigurable hybrid topology architecture for chiplet-based multicore system," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 33, no. 8, pp. 2104–2117, Aug. 2025.
- [34] Y. Wu et al., "Upward packet popup for deadlock freedom in modular chiplet-based systems," in *Proc. IEEE Int. Symp. High-Performance Comput. Archit. (HPCA)*, Apr. 2022, pp. 986–1000, doi: 10.1109/HPCA53966.2022.00076.
- [35] Z. Chen, Y. Wang, and H. Zhou, "Hybrid deadlock recovery algorithm for irregular NoC in multi-chiplet systems," in *Proc. IEEE Int. Symp. Parallel Distrib. Process. Appl. (ISPA)*, Kaifeng, China, Oct. 2024, pp. 268–277.
- [36] J. Zhang et al., "IPDR: An inter-chiplet priority-driven deadlock resolution for 2-D/2.5-D multichiplet systems," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 33, no. 9, pp. 2424–2437, Sep. 2025, doi: 10.1109/TVLSI.2025.3583289.
- [37] JEDEC. (2025). Standards & Documents Search. [Online]. Available: https://www.jedec.org/standards-documents
- [38] AMD. (2022). AMD Unveils World's Most Advanced Gaming Graphics Cards, Built on Groundbreaking AMD RDNA 3 Architecture With Chiplet Design. [Online]. Available: https://ir.amd.com/news-events/press-releases/detail/1099/amd-unveils-worlds-most-advanced-gaming-graphics-cards-built-ongroundbreaking-amd-rdna-3-architecture-with-chiplet-design
- [39] Alphawave. (2025). Leading the World in High-Speed Connectivity Solutions. [Online]. Available: https://www.kisacoresearch.com/sites/default/files/documents/alphawave semi.pdf
- [40] Numem Develops MRAM-based Chiplets, Carolyn Mathas, Los Angeles, CA, USA, Jan. 2025.
- [41] C. Silvano et al., "A survey on deep learning hardware accelerators for heterogeneous HPC platforms," ACM Comput. Surveys, vol. 57, no. 11, pp. 1–39, Jun. 2025.
- [42] A. Olgun et al., "Read disturbance in high bandwidth memory: A detailed experimental study on HBM2 DRAM chips," in *Proc. 54th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw. (DSN)*, Jun. 2024, pp. 75–89, doi: 10.1109/DSN58291.2024.00022.
- [43] D. Das Sharma, G. Pasdast, S. Tiagaraj, and K. Aygün, "High-performance, power-efficient three-dimensional system-in-package designs with universal chiplet interconnect express," *Nature Electron.*, vol. 7, no. 3, pp. 244–254, Feb. 2024. [Online]. Available: https://www.nature.com/articles/s41928-024-01126-y
- [44] Arm Developer. (2025). *Arm Chiplet System Architecture*. [Online]. Available: https://developer.arm.com/documentation/den0145/latest
- [45] Reuse Inc. (2024). Ecosystem Collaboration Drives New AMBA Specification for Chiplets. [Online]. Available: https://www.design-reuse.com/industryexpertblogs/55699/amba-specification-for-chiplets.html
- [46] ARM. (2025). Neoverse Compute Subsystems V3 (CSS V3). [Online]. Available: https://www.arm.com/products/neoverse-compute-subsystems/css-v3
- [47] (2024). Cadence Unveils Arm-Based System Chiplet. [Online]. Available: https://community.cadence.com/cadence\_blogs\_8/b/corporatenews/posts/cadence-unveils-arm-based-system-chiplet
- [48] Arm Newsroom. (2025). Alphawave Semi and Arm Accelerate Scalable Computing With CSA Chiplets. [Online]. Available: https://newsroom.arm.com/blog/alphawave-arm-scalable-compute-csachiplets
- [49] L. Chen, X. Zhang, Y. Wang, and H. Li, "The survey of chiplet-based integrated architecture: An EDA perspective," 2025, arXiv:2411.04410.
- [50] P. Vivet et al., "IntAct: A 96-core processor with six chiplets 3D-stacked on an active interposer with distributed interconnects and integrated power management," *IEEE J. Solid-State Circuits*, vol. 56, no. 1, pp. 79–97, Jan. 2021, doi: 10.1109/JSSC.2020.3036341.
- [51] V. Pano, R. Kuttappa, and B. Taskin, "3D NoCs with active interposer for multi-die systems," in *Proc. 13th IEEE/ACM Int. Symp. Netw.-Chip*, New York, NY, USA, Oct. 2019, pp. 1–8, doi: 10.1145/3313231.3352380.
- [52] M. Lee et al., "Automated I/O library generation for interposer-based system-in-package integration of multiple heterogeneous dies," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 10, no. 1, pp. 111–123, Jan. 2020, doi: 10.1109/TCPMT.2019.2953659.
- [53] H. Park et al., "Design flow for active interposer-based 2.5-D ICs and study of RISC-V architecture with secure NoC," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 10, no. 12, pp. 2047–2062, Dec. 2020.
- [54] DARPA CHIPS Program Overview, Defense Advanced Research Projects Agency, Arlington, VA, USA, 2023.
- [55] TSMC 3DFabric White Paper, TSMC, Hsinchu, Taiwan, 2024.
- [56] D. Jarrett-Amor and T. C. Carusone, "A comparison of single-ended, NRZ unidirectional signaling and single-ended, NRZ simultaneous-bidirectional signaling for die-to-die links," *IEEE Micro*, vol. 45, no. 1, pp. 45–56, Jan./Feb. 2025, doi: 10.1109/MM.2024.3436008.
- [57] (2023). A Sneak Peek at Chiplet Standards. [Online]. Available: https://www.edn.com/a-sneak-peek-at-chiplet-standards/
- [58] Implementation Agreement for a 3.2 Tb/s Co-Packaged (CPO) Module, Optical Internetworking Forum, Fremont, CA, USA, Mar. 2023.
- [59] S. Li, M.-S. Lin, W.-C. Chen, and C.-C. Tsai, "High-bandwidth chiplet interconnects for advanced packaging technologies in AI/ML applications: Challenges and solutions," *IEEE Open J. Solid-State Circuits Soc.*, vol. 4, pp. 351–364, 2024.
- [60] (Sep. 2021). OpenHBI Specification Version 1.0. [Online]. Available: https://www.opencompute.org/documents/odsa-openhbi-v1-0-spec-rc-final-1-pdf
- [61] TSMC, CCITA. (Apr. 2023). CPO Technology? Applications, Challenges, and Standard Progress. [Online]. Available: https://ccita.center/standard/221.html
- [62] AyarLabs. (Mar. 2023). In-Package Optical I/O Versus Co-Packaged Optics-Let's Get Technical!. [Online]. Available: https://ayarlabs.com/blog/in-package-optical-i-o-versus-co-packaged-optics-lets-get-technical/
- [63] NVIDIA GTC. (2025). What's Next in AI Starts Here. [Online]. Available: https://www.nvidia.com/gtc/
- [64] J. W. Poulton et al., "A 1.17-pJ/b, 25-Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication using a process- and temperature-adaptive voltage regulator," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, pp. 43–54, Jan. 2019.
- [65] Y. Nishi et al., "A 0.297-pJ/bit 50.4-Gb/s/wire inverter-based short-reach simultaneous bidirectional transceiver for die-to-die interface in 5 nm CMOS," in *Proc. IEEE Symp. VLSI Technol. Circuits*, Jun. 2022, pp. 154–155.
- [66] Z. Wang et al., "A 64Gb/s/wire 10.5Tb/s/mm/Layer single-ended simultaneous bi-directional transceiver with echo and crosstalk cancellation for a die-to-die interface in 28 nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2025, pp. 588–590.
- [67] D. T. Melek et al., "A 0.29pJ/b 5.27Tb/s/mm UCIe advanced package link in 3nm FinFET with 2.5D CoWoS packaging," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2025, pp. 590–592.
- [68] Y.-Y. Hsu, P.-C. Kuo, C.-L. Chuang, P.-H. Chang, H.-H. Shen, and C.-F. Chiang, "A 7 nm 0.46pJ/bit 20Gbps with BER 1E-25 Die-to-Die link using minimum intrinsic auto alignment and noise-immunity encode," in *Proc. Symp. VLSI Circuits*, Jun. 2021, pp. 1–2.
- [69] A. Tajalli et al., "A 1.02-pJ/b 20.83-Gb/s/wire USR transceiver using CNRZ-5 in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 55, no. 4, pp. 1108–1123, Apr. 2020.
- [70] K. McCollough, S. D. Huss, J. Vandersand, R. Smith, C. Moscone, and Q. O. Farooq, "11.3 A 480Gb/s/mm 1.7pJ/b short-reach wireline transceiver using single-ended NRZ for die-to-die applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64, Feb. 2021, pp. 1–3.
- [71] K. Seong et al., "A 4 nm 32Gb/s 8Tb/s/mm die-to-die chiplet using NRZ single-ended transceiver with equalization schemes and training techniques," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2023, pp. 114–116.
- [72] Y. Wei et al., "9.3 NVLink-C2C: A coherent off package chip-to-chip interconnect with 40Gbps/pin single-ended signaling," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2023, pp. 160–162.
- [73] J. Lee, K. Lee, J.-Y. Sim, and S.-K. Lee, "A 246-fJ/b 13.3-Tb/s/mm single-ended current-mode transceiver with crosstalk cancellation for shield-less short-reach interconnect," in *Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits)*, Jun. 2024, pp. 1–2.

- [74] J. Gu et al., "A 32Gb/s 0.36pJ/bit 3nm chiplet IO using 2.5D CoWoS package with real-time and per-lane CDR and bathtub monitoring," in *Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits)*, Jun. 2024, pp. 1–2.
- [75] Y. Nishi et al., "A 0.190-pJ/bit 25.2-Gb/s/wire inverter-based AC-coupled transceiver for short-reach die-to-die interfaces in 5-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 59, no. 4, pp. 1146–1157, Apr. 2024.
- [76] M.-S. Lin et al., "36.1 A 32Gb/s 10.5Tb/s/mm 0.6pJ/b UCIe-compliant low-latency interface in 3nm featuring matched-delay for dynamic clock gating," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2025, pp. 586–588.
- [77] Q. Liu, L. Du, and Y. Du, "A 0.90-Tb/s/in 1.29-pJ/b wireline transceiver with single-ended crosstalk cancellation coding scheme for high-density interconnects," *IEEE J. Solid-State Circuits*, vol. 58, no. 8, pp. 2326–2336, Aug. 2023.
- [78] K. Seong et al., "13.10 A 4 nm 48Gb/s/wire single-ended NRZ parallel transceiver with offset-calibration and equalization schemes for next-generation memory interfaces and chiplets," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2024, pp. 250–252.
- [79] J.-H. Park et al., "A 68.7-fJ/b/mm 375-GB/s/mm single-ended PAM-4 interface with per-pin training sequence for the next-generation HBM controller," in *Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits)*, Jun. 2022, pp. 150–151.
- [80] H. Park et al., "A 0.385-pJ/bit 10-Gb/s TIA-terminated Di-code transceiver with edge-delayed equalization, ECC, and mismatch calibration for HBM interfaces," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 65, Feb. 2022, pp. 1–3.
- [81] J. Lee et al., "13.4 A 48GB 16-high 1280GB/s HBM3E DRAM with all-around power TSV and a 6-phase RDQS scheme for TSV area optimization," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2024, pp. 238–240.
- [82] W.-X. Tang et al., "HBM package interconnection pseudo all-channel signal integrity simulation and implementation method of the synchronous current load research," *Micromachines*, vol. 16, no. 8, p. 896, Jul. 2025, doi: 10.3390/mi16080896.
- [83] D. U. Lee et al., "25.2 A 1.2 V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29 nm process and TSV," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2014, pp. 432–433.
- [84] J.-Y. Kim et al., "An energy-efficient design of TSV I/O for HBM with a data rate up to 10 Gb/s," *IEEE J. Solid-State Circuits*, vol. 58, no. 11, pp. 3242–3252, Nov. 2023.
- [85] K. Chae et al., "A 4-nm 1.15 TB/s HBM3 interface with resistor-tuned offset calibration and in situ margin detection," *IEEE J. Solid-State Circuits*, vol. 59, no. 1, pp. 231–242, Jan. 2024.
- [86] H. Kim, S. H. Choi, J. Kong, Y.-H. Gong, and S. W. Chung, "Sparrow ECC: A lightweight ECC approach for HBM refresh reduction towards energy-efficient DNN inference," in *Proc. 29th ACM/IEEE Int. Symp. Low Power Electron. Design*, New York, NY, USA, Aug. 2024, pp. 1–6.
- [87] Optical Internetworking Forum. (2024). IA Title: Common Electrical I/O (CEI). [Online]. Available: https://www.oiforum.com/wp-content/uploads/OIF-CEI-05.2.pdf
- [88] Optical Internetworking Forum. (2024). *OIF Hot Topic Fact Sheet–Common Electrical I/O (CEI)-112*. [Online]. Available: https://www.oiforum.com/wp-content/uploads/OIF-Hot-Topic-Fact-Shee\_CEI\_FINAL.pdf
- [89] G. Gangasani et al., "A 1.6Tb/s chiplet over XSR-MCM channels using 113Gb/s PAM-4 transceiver with dynamic receiver-driven adaptation of TX-FFE and programmable roaming taps in 5 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 65, Feb. 2022, pp. 122–124.
- [90] R. Shivnaraine et al., "11.2 A 26.5625-to-106.25Gb/s XSR SerDes with 1.55pJ/b efficiency in 7 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64, Feb. 2021, pp. 181–183.
- [91] R. Yousry et al., "11.1 A 1.7pJ/b 112Gb/s XSR transceiver for intrapackage communication in 7 nm FinFET technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64, Feb. 2021, pp. 180–182.
- [92] C. F. Poon et al., "A 1.24-pJ/b 112-Gb/s (870 Gb/s/mm) transceiver for in-package links in 7-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 57, no. 4, pp. 1199–1210, Apr. 2022.
- [93] G. Gangasani et al., "A 1.1pJ/b/lane, 1.8Tb/s chiplet over XSR-MCM channels using 113Gb/s PAM-4 transceiver with signal equalization and envelope adaptation using TX-FFE in 5 nm CMOS," in *Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits)*, Jun. 2024, pp. 1–2.
- [94] A. Chowdhury et al., "A 0.9pJ/b 9.8–113Gb/s XSR SerDes with 6-tap TX FFE and AC coupling RX in 3nm FinFET technology," in *Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits)*, Jun. 2024, pp. 1–2.
- [95] A. Biswas, "Design methodologies and automated generation of ultra high speed wireline serdes transmitters," Ph.D. dissertation, Dept. of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA, 2023.
- [96] P. Kwon, "Feedforward MLSE equalization for high speed serial links," Ph.D. dissertation, Dept. of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA, 2023.
- [97] M. Wade et al., "TeraPHY: A chiplet technology for low-power, high-bandwidth in-package optical I/O," *IEEE Micro*, vol. 40, no. 2, pp. 63–71, Mar. 2020.
- [98] S. Chen et al., "A 50Gb/s CMOS optical receiver with si-photonics PD for high-speed low-latency chiplet I/O," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 70, no. 11, pp. 4271–4282, Nov. 2023.
- [99] Q. Ma et al., "A 200Gb/s, 3.5pJ/bit monolithically integrated WDM SiPhotonic transceiver for chiplet optical I/O," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, Nov. 2024, pp. 1–3.
- [100] P. Bhargava et al., "A 256Gbps microring-based WDM transceiver with error-free wide temperature operation for co-packaged optical I/O chiplets," in *Proc. IEEE Symp. VLSI Technol. Circuits*, Jun. 2024, pp. 1–2.
- [101] J. Xue et al., "A 4×112 Gb/s ultra-compact polarization-insensitive silicon photonics WDM receiver with CMOS TIA for co-packaged optics and optical I/O," *J. Lightw. Technol.*, vol. 42, no. 17, pp. 6028–6035, Sep. 15, 2024.
- [102] Accelerating Innovation Through A Standard Chiplet Interface: The Advanced Interface Bus (AIB), David Kehlet, Kohler, WI, USA, Oct. 2023.
- [103] (2019). ODSA OpenHBI Workstream Proposal. [Online]. Available: https://146a55aca6f00848c565-a7635525d40ac1c70300198708936b4e.ssl.cf1.rackcdn.com/images/dee03aa0c4abdc0d5c470e58121855d46694fa38.pdf
- [104] M. Parasar, N. E. Jerger, P. V. Gratz, J. S. Miguel, and T. Krishna, "SWAP: Synchronized weaving of adjacent packets for network deadlock resolution," in *Proc. 52nd Annu. IEEE/ACM Int. Symp. Microarchitecture*, New York, NY, USA, Oct. 2019, pp. 873–885, doi: 10.1145/3352460.3358255.
- [105] High Bandwidth Memory (HBM4) DRAM, document JESD270-4, JEDEC Solid State Technology Association, Apr. 2025.
- [106] (2019). Accelerating Chiplets With 112G XSR SerDes PHYs. [Online]. Available: https://semiengineering.com/accelerating-chiplets-with-112g-xsr-serdes-phys/
- [107] (2018). OIF Launches CEI-112G-XSR Project Enabling Intra-Package Interconnects. [Online]. Available: https://www.lightwaveonline.com/optical-tech/electronics/article/16675889/oif-launches-cei-112g-xsrproject-enabling-intra-package-interconnects
- [108] (2024). OIF Unveils CEI-112G-XSR+\_PAM4 Extended Extra Short Reach Implementation Agreement, Paving the Way for Advanced Interconnectivity. [Online]. Available: https://www.oiforum.com/oif-unveils-cei-112g-xsr-pam4-extended-extra-short-reach-implementation-agreement-paving-the-way-for-advanced-interconnectivity/
- [109] (Aug. 2025). TIM for 3D-Stacked Packages: Vertical Heat Paths and Interface Considerations. [Online]. Available: https://eureka.patsnap.com/report-tim-for-3d-stacked-packages-vertical-heat-paths-and-interface-considerations
- [110] Standard for Chiplet Interface Circuit, IEEE Standard P3468, IEEE Standards Association, 2024. [Online]. Available: https://standards.ieee.org/ieee/3468/11580/0
- [111] C. Wu, "Juniper's express 5: A 28.8Tbps network routing ASIC and variations," in *Proc. Hot Chips Symp.*, Stanford, CA, USA, Aug. 2022, pp. 1–16. [Online]. Available: https://hc34.hotchips.org/assets/program/conference/day2/Network%20and%20Switches/HC2022.Juniper.ChangHong Wu.v03.pdf
- [112] M. Katari, J. Jeyaraman, I. A. Mohamed, and K. Thirunavukkarasu, "Addressing power and thermal challenges in advanced packaging for AI CPUs/GPUs: Insights into multi-die stacking technology," *Int. J. Multidisciplinary Res.*, vol. 5, no. 6, pp. 2582–2160, Nov. 2023.
- [113] K. Zhou. (Dec. 2023). High-Speed Chiplet Interface IP Developments and Challenges. AkroStar Inc., Shanghai, China. [Online]. Available: https://ccita.center/uploads/allimg/20240104/1-2401041J60T07.pdf

- [114] Technical Requirements for Chiplet Interface Bus, CESA, Beijing, China, 2023.
- [115] Implementation Agreement for the High Bandwidth Coherent Driver Modulator (HB-CDM), Opt. Internetworking Forum (OIF), Fremont, CA, USA, 2021.
- [116] S. Ardalan et al., "Bunch of wires: An open die-to-die interface," in *Proc. IEEE Symp. High-Perform. Interconnects (HOTI)*, Piscataway, NJ, USA, Aug. 2020, pp. 9–16.
- [117] B. Banijamali, S. Ramalingam, K. Nagarajan, and R. Chaware, "Advanced reliability study of TSV interposers and interconnects for the 28 nm technology FPGA," in *Proc. IEEE 61st Electron. Compon. Technol. Conf. (ECTC)*, Lake Buena Vista, FL, USA, May 2011, pp. 285–290.
- [118] C.-H. Lee et al., "Next generation large size high interconnect density CoWoS-R package," in *Proc. IEEE 74th Electron. Compon. Technol. Conf. (ECTC)*, Denver, CO, USA, May 2024, pp. 259–263.
- [119] S.-P. Jeng and M. Liu, "Heterogeneous and chiplet integration using organic interposer (CoWoS-R)," in *IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 2022, pp. 1–4.
- [120] M. Mizutani et al., "Study for realization of the next generation high density RDL packaging for 2.5D large silicon interposer," in *Proc. IEEE 74th Electron. Compon. Technol. Conf. (ECTC)*, Denver, CO, USA, May 2024, pp. 293–298.
- [121] R. Mahajan et al., "Embedded multi-die interconnect bridge (EMIB)—A high density, high bandwidth packaging interconnect," in Proc. IEEE 66th Electron. Compon. Technol. Conf. (ECTC), Las Vegas, NV, USA, May 2016, pp. 557–565.
- [122] S. Y. Hou, C. H. Lee, T.-D. Wang, H. C. Hou, and H.-P. Hu, "Supercarrier redistribution layers to realize ultra large 2.5D wafer scale packaging by CoWoS," in *Proc. IEEE 73rd Electron. Compon. Technol. Conf. (ECTC)*, May 2023, pp. 510–514.
- [123] R. Swaminathan et al., "AMD Instinct™ MI250X accelerator enabled by elevated fanout bridge advanced packaging architecture," in Proc. IEEE Symp. VLSI Technol. Circuits, Kyoto, Japan, Jun. 2023, pp. 1–2.
- [124] K. Sikka et al., "Direct bonded heterogeneous integration (DBHi) Si bridge," in *Proc. IEEE Electron. Compon. Technol. Conf.*, San Diego, CA, USA, Jun. 2021, pp. 136–147.
- [125] W. Wudjud et al., "Advanced thermocompression bonding on high density fan-out embedded bridge technology for HPC/AI/ML applications," in *Proc. IEEE 74th Electron. Compon. Technol. Conf. (ECTC)*, Denver, CO, USA, May 2024, pp. 929–935.
- [126] J. Kim, "Active Si interposer for 3D IC integrations," in *Proc. Int. 3D Syst. Integr. Conf. (3DIC)*, Sendai, Japan, Aug. 2015, pp. 1–3.
- [127] P. Coudrain et al., "Active interposer technology for chiplet-based advanced 3D system architectures," in *Proc. IEEE 69th Electron. Compon. Technol. Conf. (ECTC)*, Las Vegas, NV, USA, May 2019, pp. 569–578.
- [128] Semiconductor Engineering. (2024). 3.5D: The Great Compromise. [Online]. Available: https://semiengineering.com/3-5d-the-great-compromise/
- [129] C. S. Mandalapu et al., "3.5D advanced packaging enabling heterogenous integration of HPC and AI accelerators," in *Proc. IEEE 74th Electron. Compon. Technol. Conf. (ECTC)*, Denver, CO, USA, May 2024, pp. 798–802.
- [130] I. Lee et al., "Extremely large 3.5D heterogeneous integration for the next-generation packaging technology," in *Proc. IEEE 73rd Electron. Compon. Technol. Conf. (ECTC)*, Orlando, FL, USA, May 2023, pp. 893–898.
- [131] S. Lie, "Cerebras architecture deep dive: First look inside the hardware/software co-design for deep learning," *IEEE Micro*, vol. 43, no. 3, pp. 18–30, May 2023.
- [132] S.-R. Chun et al., "InFO\_SoW (System-on-wafer) for high performance computing," in *Proc. IEEE 70th Electron. Compon. Technol. Conf. (ECTC)*, Orlando, FL, USA, Jun. 2020, pp. 1–6.
- [133] Y.-C. Hu et al., "CoWoS architecture evolution for next generation HPC on 2.5D system in package," in *Proc. IEEE 73rd Electron. Compon. Technol. Conf. (ECTC)*, Orlando, FL, USA, May 2023, pp. 1022–1026.
- [134] L. Cao, "Advanced fanout embedded bridge packaging technology for chiplets integration," in *Proc. 18th Int. Conf. Device Packaging*, Fountain Hills, AZ, USA, Mar. 2022, p. 00718.
- [135] J. You et al., "Electrical performances of fan-out embedded bridge," in *Proc. IEEE 71st Electron. Compon. Technol. Conf. (ECTC)*, San Diego, CA, USA, Jun. 2021, pp. 2030–2034.
- [136] J. Lin et al., "Scalable chiplet package using fan-out embedded bridge," in *Proc. IEEE 70th Electron. Compon. Technol. Conf. (ECTC)*, Orlando, FL, USA, Jun. 2020, pp. 14–18.

- [137] J. Lee et al., "S-connect fan-out interposer for next gen heterogeneous integration," in *Proc. IEEE 71st Electron. Compon. Technol. Conf. (ECTC)*, San Diego, CA, USA, Jun. 2021, pp. 96–100.
- [138] C. T. Chong et al., "Heterogeneous integration with embedded fine interconnect," in *Proc. IEEE 71st Electron. Compon. Technol. Conf. (ECTC)*, San Diego, CA, USA, Jun. 2021, pp. 2216–2221.
- [139] C.-H. Lai, W.-J. Yin, W.-H. Lai, C.-L. Kao, C.-C. Wang, and C. Hung, "Fine-line RDL structure analysis of fan-out chip-on-substrate platform," in *Proc. IEEE 26th Electron. Packag. Technol. Conf. (EPTC)*, Singapore, Dec. 2024, pp. 898–902, doi: 10.1109/EPTC62800.2024.10909677.
- [140] J. H. Lau, "Recent advances and trends in Cu-Cu hybrid bonding," IEEE Trans. Compon., Packag., Manuf. Technol., vol. 13, no. 3, pp. 399–425, Mar. 2023.
- [141] C. Netzband et al., "0.5 μm pitch next generation hybrid bonding with high alignment accuracy for 3D integration," in *Proc. IEEE 73rd Electron. Compon. Technol. Conf. (ECTC)*, Orlando, FL, USA, May 2023, pp. 1100–1104.
- [142] Y.-G. Lee, M. McInerney, Y.-C. Joo, I.-S. Choi, and S. E. Kim, "Copper bonding technology in heterogeneous integration," *Electron. Mater. Lett.*, vol. 20, no. 1, pp. 1–25, Jan. 2024.
- [143] Y.-C. Huang, Y.-X. Lin, C.-K. Hsiung, T.-H. Hung, and K.-N. Chen, "Cu-based thermocompression bonding and Cu/dielectric hybrid bonding for three-dimensional integrated circuits (3D ICs) application," Nanomaterials, vol. 13, no. 17, p. 2490, Sep. 2023.
- [144] N. W. Kim, H. Choe, M. A. Shah, D.-G. Lee, and S. Hur, "High-density patterned array bonding through void-free divinyl siloxane bis-benzocyclobutene bonding process," *Polymers*, vol. 13, no. 21, p. 3633, Oct. 2021.
- [145] K. Lim et al., "Design and simulation of symmetric wafer-to-wafer bonding compensating a gravity effect," in *Proc. IEEE 70th Electron. Compon. Technol. Conf. (ECTC)*, Orlando, FL, USA, Jun. 2020, pp. 1480–1485, doi: 10.1109/ECTC32862.2020.00234.
- [146] Y. Hu et al., "Wafer-scale computing: Advancements, challenges, and future perspectives [Feature]," *IEEE Circuits Syst. Mag.*, vol. 24, no. 1, pp. 52–81, Jan. 2024.
- [147] H. Ren, K. Sahoo, T. Xiang, G. Ouyang, and S. S. Iyer, "Demonstration of a power-efficient and cost-effective power delivery architecture for heterogeneously integrated wafer-scale systems," in *Proc. IEEE 73rd Electron. Compon. Technol. Conf. (ECTC)*, Orlando, FL, USA, May 2023, pp. 1614–1621.
- [148] E. Talpes, D. Williams, and D. D. Sarma, "DOJO: The microarchitecture of Tesla's exa-scale computer," in *Proc. IEEE Hot Chips Symp. (HCS)*, Cupertino, CA, USA, Aug. 2022, pp. 1–28, doi: 10.1109/HCS55958.2022.9895534.
- [149] P. J. Restle, A. Feldman, A. Sadasivam, and W. J. Dally, "Microprocessor at 50: The path to successful wafer-scale integration," *IEEE Micro*, vol. 41, no. 6, pp. 6–13, Dec. 2021, doi: 10.1109/MM.2021.3114259.
- [150] G. Lauterbach, "The path to successful wafer-scale integration: The cerebras story," *IEEE Micro*, vol. 41, no. 6, pp. 52–57, Nov. 2021.
- [151] J. Zhang, S. He, Y. Liu, X. Wang, and R. Xie, "Review performance, efficiency, and cost analysis of wafer-scale AI accelerators," *Comput. Electr. Eng.*, vol. 112, Jun. 2025, Art. no. 107147.
- [152] J. SivaChandra, "Heterogeneous integration on silicon-interconnect fabric using fine-pitch interconnects (<10 μm)," Ph.D. thesis, UCLA Electron., Los Angeles, CA, USA, 2020.
- [153] S.-H. Lee, S.-J. Kim, J.-S. Lee, and S.-H. Rhi, "Thermal issues related to hybrid bonding of 3D-stacked high bandwidth memory: A comprehensive review," *Electronics*, vol. 14, no. 13, p. 2682, Jul. 2025.
- [154] Z. Wang, R. Dong, R. Ye, S. S. K. Singh, S. Wu, and C. Chen, "A review of thermal performance of 3D stacked chips," *Int. J. Heat Mass Transf.*, vol. 235, Dec. 2024, Art. no. 126212.
- [155] F. Hou et al., "Optimization design of 2.5D TSV package using thermo-electrical co-simulation method," in *Proc. IEEE 66th Electron. Compon. Technol. Conf. (ECTC)*, Las Vegas, NV, USA, May 2016, pp. 1964–1969, doi: 10.1109/ECTC.2016.165.
- [156] J.-H. Wong et al., "Warpage and RDL stress analysis in large fan-out package with multi-chiplet integration," in *Proc. IEEE 72nd Electron. Compon. Technol. Conf. (ECTC)*, San Diego, CA, USA, May 2022, pp. 1074–1079.
- [157] W. Xing, Y. Xu, C. Song, and T. Deng, "Recent advances in thermal interface materials for thermal management of high-power electronics," *Nanomaterials*, vol. 12, no. 19, p. 3365, Sep. 2022.
- [158] D. W. Lee et al., "Optimizing reflowed solder TIM (sTIMs) processes for emerging heterogeneous integrated packages," in *Proc. IEEE 72nd Electron. Compon. Technol. Conf. (ECTC)*, San Diego, CA, USA, May 2022, pp. 1228–1237.

- [159] A. Bashir et al., "A novel thermal interface material composed of vertically aligned boron nitride and graphite films for ultrahigh through-plane thermal conductivity," *Small Methods*, vol. 8, no. 12, 2024, Art. no. e2301788.
- [160] P. Nimbalkar, P. Bhaskar, M. Kathaperumal, M. Swaminathan, and R. R. Tummala, "A review of polymer dielectrics for redistribution layers in interposers and package substrates," *Polymers*, vol. 15, no. 19, p. 3895, Sep. 2023.
- [161] P. Vanna-Iampikul et al., "Glass interposer integration of logic and memory chiplets: PPA and power/signal integrity benefits," in *Proc. 60th ACM/IEEE Design Autom. Conf. (DAC)*, San Francisco, CA, USA, Jul. 2023, pp. 1–6.
- [162] A. Victor and M. S. Bakir, "Enabling high-density heterogeneous integration using wafer-scale chiplet reconstitution technology," in *Proc. IMAPS Symp.*, vol. 2024, Boston, MA, USA, 2025, pp. 243–250.
- [163] H. Zhi et al., "A methodology for simulating multi-chiplet systems using open-source simulators," in *Proc. 8th Annu. ACM Int. Conf. Nanosc. Comput. Commun.*, Sep. 2021, pp. 1–6.
- [164] S. Chen et al., "Floorplet: Performance-aware floorplan framework for chiplet integration," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 43, no. 6, pp. 1638–1649, Jun. 2024, doi: 10.1109/TCAD.2023.3347302.
- [165] J. Kim et al., "Chiplet/interposer co-design for power delivery network optimization in heterogeneous 2.5-D ICs," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 11, no. 12, pp. 2148–2157, Dec. 2021.
- [166] J. Cai et al., "Gemini: Mapping and architecture co-exploration for large-scale DNN chiplet accelerators," in *Proc. HPCA*, 2024, pp. 156–171.
- [167] Z. Tan, H. Cai, R. Dong, and K. Ma, "NN-baton: DNN workload orchestration and chiplet granularity exploration for multichip accelerators," in *Proc. ISCA*, Jan. 2021, pp. 1013–1026.
- [168] S. Schlag, T. Heuer, L. Gottesbüren, Y. Akhremtsev, C. Schulz, and P. Sanders, "High-quality hypergraph partitioning," *ACM J. Experim. Algorithmics*, vol. 27, pp. 1–39, Dec. 2022.
- [169] F. Li et al., "Chipletizer: Repartitioning SoCs for cost-effective chiplet integration," in *Proc. ASPDAC*, 2024, pp. 58–64.
- [170] Y. Duan et al., "RLPlanner: Reinforcement learning based floorplanning for chiplets with fast thermal analysis," 2023, arXiv:2312.16895.
- [171] C. Zhang, Y. Liu, and Q. Chen, "Neural network surrogate model for junction temperature and hotspot position in 3D multi-layer high bandwidth memory (HBM) chiplets under varying thermal conditions," 2025, arXiv:2503.04049.
- [172] COMSOL News 2025: Multiphysics Simulation Magazine-Commercializing Fusion Power and Advanced Engineering Applications, COMSOL AB., Burlington, MA, USA, 2025.
- [173] C. Wang et al., "A multiscale anisotropic thermal model of chiplet heterogeneous integration system," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 32, no. 1, pp. 178–189, Jan. 2024.
- [174] J.-Z. Peng, N. Aubry, Y.-B. Li, M. Mei, Z.-H. Chen, and W.-T. Wu, "Physics-informed graph convolutional neural network for modeling geometry-adaptive steady-state natural convection," *Int. J. Heat Mass Transf.*, vol. 216, Dec. 2023, Art. no. 124593, doi: 10.1016/j.ijheatmasstransfer.2023.124593.
- [175] W. Miao, Z. Xie, C. S. Tan, and M. Rotaru, "Deep reinforcement learning-based power distribution network design optimization for multi-chiplet system," in *Proc. ECTC*, May 2024, pp. 1716–1723.
- [176] N. Hossam and J. Ferguson, "Fast, accurate assembly-level physical verification of 3DIC packages," in *Proc. IEEE Int. 3D Syst. Integr. Conf. (3DIC)*, Cork, Cork, Ireland, May 2023, pp. 1–4, doi: 10.1109/3DIC57175.2023.10155000.
- [177] TOL. (2023). Reduce 3DIC Design Complexity With Early Package Assembly Verification. [Online]. Available: https://www.techonline.com/tech-papers/reduce-3dic-design-complexity/
- [178] J. Kim et al., "Architecture, chip, and package codesign flow for interposer-based 2.5-D chiplet integration enabling heterogeneous IP reuse," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 28, no. 11, pp. 2424–2437, Nov. 2020, doi: 10.1109/TVLSI.2020.3015494.
- [179] L. Cao, C.-C. Wang, C.-Y. Huang, and H.-C. Kou, "Advanced packaging design platform for chiplets and heterogeneous integration," in *Proc. IEEE 73rd Electron. Compon. Technol. Conf. (ECTC)*, Orlando, FL, USA, May 2023, pp. 1032–1037, doi: 10.1109/ECTC51909.2023.00176.
- [180] A. B. Kelleher, "Celebrating 75 years of the transistor a look at the evolution of Moore's law innovation," in *IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 2022, pp. 1–5, doi: 10.1109/IEDM45625.2022.10019538.

- [181] Open Compute Project Foundation & JEDEC. (2025). Open Compute Project Foundation and JEDEC Drive Open Silicon Innovation. [Online]. Available: https://www.prnewswire.com/apac/news-releases/open-compute-project-foundation-and-jedec-drive-opensilicon-innovation-302387537.html
- [182] Open Compute Project. (2023). *Using a Markup Language in Chiplet-Based Design*. [Online]. Available: https://146a55aca6f00848c565-a7635525d40ac1c70300198708936b4e.ssl.cf1.rackcdn.com/images/3056c979358c9325da179636672f0011d07902de.pdf
- [183] CALMA. (Feb. 1987). GDSII Stream Format Manual. [Online]. Available: http://bitsavers.informatik.uni-stuttgart.de/pdf/calma/GDS_II_Stream_Format_Manual_6.0_Feb87.pdf
- [184] Microchip. (2017). Application Note IBIS/IBIS-AMI Models: Background and Usage. [Online]. Available: https://ww1.microchip.com/downloads/aemDocuments/documents/FPGA/ProductDocuments/ibis-model/Microsemi_SmartFusion2_IGLOO2_IBIS_IBISAMI_Models_Background_Usage_Application_Note_AC292
- [185] Si2. (2025). Library Exchange Format and Design Exchange Format (LEF/DEF). [Online]. Available: https://si2.org/lef-def-downloads/
- [186] S. Agarwal, A. Phanishayee, and S. Venkataraman, "Blox: A modular toolkit for deep learning schedulers," in *Proc. 19th Eur. Conf. Comput. Syst.*, New York, NY, USA, Apr. 2024, pp. 1093–1109.
- [187] Standard for 3Dblox-Chiplet Connectivity and Physical Properties Description Language, PAR Approved, IEEE Standard P3537, IEEE Standards Association, 2024. [Online]. Available: https://standards.ieee.org/ieee/3537/11871/
- [188] B. Pu et al., "Hotspots and trends in collaborative research of numerical simulation and multiphysics in 2024: A review," *Sci. Technol. Rev.*, vol. 43, no. 1, pp. 118–131, 2025, doi: 10.3981/j.issn.1000-7857.2024.12.01753.
- [189] B. Pu, "Design of 2.5D interposer in high bandwidth memory and through silicon via for high speed signal," *IEEE TechRxiv*, 2020, doi: 10.36227/techrxiv.12950261.
- [190] W. Kwon, C. Yoo, and J. Jeon, "Electrothermal modeling of multinanosheet FETs with various layouts," *IEEE Trans. Electron Devices*, vol. 71, no. 4, pp. 2592–2597, Apr. 2024.
- [191] L. Cai, Y. Chen, H. Zhang, J. Lin, and W. Chen, "Insight into electromigration reliability of buried power rail with alternative metal material," *IEEE Trans. Electron Devices*, vol. 71, no. 1, pp. 418–424, Jan. 2024.
- [192] N. Pande et al., "A 16nm all-digital hardware monitor for evaluating electromigration effects in signal interconnects through bit-error-rate tracking," *IEEE Trans. Device Mater. Rel.*, vol. 22, no. 2, pp. 194–204, Jun. 2022.
- [193] B. Julien et al., "Development of compression molding process for fan-out wafer level packaging," in *Proc. IEEE 70th Electron. Compon. Technol. Conf. (ECTC)*, Jun. 2020, pp. 1965–1972.
- [194] E. Bourjot et al., "Integration and process challenges of self assembly applied to die-to-wafer hybrid bonding," in *Proc. IEEE 73rd Electron. Compon. Technol. Conf. (ECTC)*, May 2023, pp. 1397–1402.
- [195] L. Pfromm et al., "MFIT: Multi-fidelity thermal modeling for 2.5D and 3D multi-chiplet architectures," 2024, arXiv:2410.09188.
- [196] S. D. M. de Jong, A. G. Ghezeljehmeidan, and W. D. van Driel, "Solder joint reliability predictions using physics-informed machine learning," *Microelectron. Rel.*, vol. 172, Sep. 2025, Art. no. 115797.
- [197] V. Darshana Parekh, Z. Wyatt Hazenstab, S. Rangachar Srinivasa, K. Chakrabarty, K. Ni, and V. Narayanan, "STAMP-2.5D: Structural and thermal aware methodology for placement in 2.5D integration," 2025, arXiv:2504.21140.
- [198] W. Li et al., "Finite element analysis of 2.5D packaging processes based on multi-physics field coupling for predicting the reliability of IC components," *Microelectron. Rel.*, vol. 163, Jan. 2024, Art. no. 115530.
- [199] X. Wang, J. Su, D. Chen, D. Li, G. Li, and Y. Yang, "Efficient thermal-stress coupling design of chiplet-based system with coaxial through silicon via array," *Micromachines*, vol. 14, no. 8, pp. 1–15, Aug. 2023.
- [200] R. G. Sargent, "Verification and validation of simulation models: An advanced tutorial," in *Proc. Winter Simul. Conf. (WSC)*, Dec. 2020, pp. 16–29.
- [201] P.-Y. Chuang, F. Lorenzelli, C.-W. Wu, and E. Jan Marinissen, "Generating test patterns for chiplet interconnects with optimized effectiveness and efficiency," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 44, no. 3, pp. 1155–1168, Mar. 2025.
- [202] E. J. Marinissen, A. Evans, P.-Y. Chuang, M. Keim, and A. Chandra, "New standard-under-development for chiplet interconnect test and repair: IEEE std P3405," in *Proc. IEEE Eur. Test Symp. (ETS)*, The Hague, The Netherlands, May 2024, pp. 1–10.

- [203] M. B. Monjil, J. Zhou, N. Varshney, N. A. Zanjani, F. Farahmandi, and M. Tehranipoor, "ChiPICA: Chiplet physical inspection certification authority for trust verification in heterogeneous integration," in *Proc. IEEE Phys. Assurance Inspection Electron.*, Farah, AL, USA, Nov. 2024, pp. 1–7.



Hongwei Liu was born in Jilin, China, in May 1984. He received the Ph.D. degree in engineering (computer architecture) from the University of Chinese Academy of Sciences, Beijing, China, in 2014. He is currently a Senior Engineer with the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include high-performance processor design, heterogeneous computing, hardware security and encryption chips, and software-defined chips.



Guojun Yuan received the B.S. and M.S. degrees in microelectronics from Xidian University, China, and the Ph.D. degree in computer science from the University of Chinese Academy of Sciences (UCAS). He is currently a Senior Engineer with the Institute of Computing Technology, Chinese Academy of Sciences, China. His research interests include high-performance interconnection networks and high-speed I/O circuit design.



Yuhang Liu received the Ph.D. degree in computer science from Beihang University, Beijing, in 2013. He is currently an Associate Professor with the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS). He has been a Post-Doctoral Researcher with the Computer Science Department, Illinois Institute of Technology (IIT), Chicago. His research interests include computer architecture and high-performance computing. He is a member of ACM.



Yuan Du (Senior Member, IEEE) received the B.S. degree from Southeast University (SEU), Nanjing, China, in 2009, and the M.S. and Ph.D. degrees from the Electrical Engineering Department, University of California, Los Angeles (UCLA), in 2012 and 2016, respectively. He was at Kneron Inc., San Diego, CA, USA, from 2016 to 2019, as a Leading Hardware Architect. Since 2019, he has been with Nanjing University, Nanjing, as an Associate Professor. His current research interests include designs of high-speed inter-chip/intra-chip interconnects and memory-centric AI hardware accelerators. He was a recipient of the Microsoft Research Asia Young Fellow in 2008, the Southeast University Chancellor's Award in 2009, the Broadcom Young Fellow in 2015, the 2023 IEEE 15th International Conference on ASIC (ASICON 2023) Best Paper Award, and the IEEE Circuits and Systems Society Darlington Best Paper Award in 2021.



**Linji Zheng** is currently an Engineer with Wuxi Institute of Interconnect Technology, China, where he is mainly engaged in high-speed interface circuit design.



**Bo Pu** (Senior Member, IEEE) received the B.S. degree in electrical engineering from Harbin Institute of Technology, China, in 2009, and the Ph.D. degree in electronic and electrical engineering from Sungkyunkwan University, South Korea, in 2015.

From 2015 to 2020, he was a Staff Engineer with the Semiconductor HQ of Samsung Electronics, Hwaseong, South Korea. From 2020 to 2021, he was a Visiting Assistant Research Professor at the National Science Foundation I/UCRC for Electromagnetic Compatibility, Missouri University of Science and Technology, MO, USA. In 2021, he joined the DeTooLIC Technology Company Ltd., Zhejiang, China, as the Vice President. His current research interests include the design methodology of the electronic design automation for chip-package-PCB systems. He holds more than ten patents on high-speed links and 2.5-D/3-D ICs. He also focuses on the research of high-speed integrated circuit systems up to 224 Gbps, 2.5-D Si-interposer for high bandwidth memory (HBM), and through silicon via (TSV) for 3-D ICs. He serves as the TPC Chair for IEEE WAI and ACES; a TPC Member and Session Chair for the IEEE EMCS, APEMC 2018/2022, and ISEMC; and an Associate Editor for IEEE ACCESS. He was a recipient of the Best Student Paper Award (2011 IEEE APEMC) and the Best SIPI Symposium Paper Award (Joint 2021 IEEE EMC+SIPI and EMC Europe) as the first author, the Young Scientists Awards from the International Union of Radio Science (URSI) in 2014 and IEEE APEMC 2022, the 2020 and 2021 Distinguished Reviewer of IEEE TRANSACTIONS ON ELECTROMAGNETIC COMPATIBILITY, and the 2023 Outstanding Associate Editor from IEEE ACCESS. He received the IEEE EMC Society Technical Achievement Award in August 2022.



Pengchao Wang has been engaged in the integrated circuit industry for a long time. He joined as a Digital IC Design Engineer at Wuxi Institute of Interconnect Technology in 2022. He has participated in a variety of chip design works. He has several digital chip design invention patents. His research interests include NoC and physical layer controllers.



An Yang received the B.S. degree in electronic information science and technology from the Applied Technology College, Soochow University, Suzhou, China, in 2017, and the M.S. degree in integrated circuit engineering from Jiangnan University, Wuxi, China, in 2023.

He is currently an Engineer with Wuxi Institute of Interconnect Technology. His current research interests include integrated circuit design and memory design.



Yu Li is currently pursuing the master's degree in integrated circuit engineering with Lanzhou University. Since August 2024, he has been a Research Assistant with Wuxi Institute of Interconnect Technology, Wuxi, Jiangsu, China. His research interests include transmitter design in interconnect interfaces.



**He Sun** received the B.E. degree from the School of Microelectronics, Tianjin University, Tianjin, China, in 2022. He is currently pursuing the Ph.D. degree with the Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China. His research interests include high-speed wireline transceivers.



Chengming Yu received the B.S. degree in microelectronics science and engineering from Xidian University, Xi'an, Shaanxi, China, in 2023. He is currently pursuing the M.S. degree with Lanzhou University, Lanzhou, Gansu, China.

Since August 2024, he has been a Research Assistant with Wuxi Institute of Interconnect Technology, Wuxi, Jiangsu, China. His research interests include both analog and digital approaches to digital-controlled delay lines and eye-opening monitors.



**Yongfu Li** (Senior Member, IEEE) received the B.Eng. and Ph.D. degrees from the Department of Electrical and Computing Engineering, National University of Singapore (NUS), Singapore, in 2009 and 2014, respectively.

He was a Research Engineer with NUS from 2013 to 2014. He was a Senior Engineer (2014–2016), the Principal Engineer (2016–2018), and a member of Technical Staff (2018–2019) with GLOBALFOUNDRIES for Design-to-Manufacturing (DFM) Computer-Aided Design (CAD) Activities. He is currently an Associate Professor (tenured) with the Department of Micro and Nano Electronics Engineering, Shanghai Jiao Tong University, China. His research interests include analog/mixed signal circuits, biomedical signal processing, and circuit automation.



Fei Guo was born in 1995. He received the master's degree in control engineering from Harbin Engineering University in 2020. From July 2020 to March 2022, he was a Digital IC Test Engineer at the CETC Key Laboratory, Integrated Circuit Company Ltd. Since March 2022, he has been a Digital IC Design Engineer at Wuxi Institute of Interconnect Technology (IICIT). His research interests include low-power design, circuit design, and circuit testability.



Shaolin Xiang received the B.S. and M.S. degrees in microelectronics from Tsinghua University, Beijing, China, in 2016 and 2019, respectively, where he is currently pursuing the Ph.D. degree in microelectronics with the School of Integrated Circuits.

He has published papers in architecture and circuit-related journals and conferences. His research interests include machine learning, neural processing unit, architecture, micro-architecture of processors, and digital circuits and systems.



Xiaoteng Zhao (Member, IEEE) received the B.Eng. degree in integrated circuit design and integration systems from Xidian University, Xi'an, Shaanxi, China, in 2014, the M.Eng. degree in integrated circuit engineering from the University of Chinese Academy of Sciences, Beijing, China, in 2017, and the Ph.D. degree in electrical and computer engineering from the University of Macau, Macau, China, in 2021.

He was a Post-Doctoral Fellow at the State Key Laboratory of Analog and Mixed-Signal VLSI (AMSV), University of Macau, from 2021 to 2022. He is currently a Professor with the School of Integrated Circuits, Xidian University. His research interests include integrated circuits design of wireline communication, SerDes, die-to-die interconnection, clock generation and recovery, and analog and mixed signal circuits.



**Xuqiang Zheng** (Member, IEEE) received the B.S. and M.S. degrees from the School of Physics and Electronics, Central South University, Hunan, China, in 2006 and 2009, respectively, and the Ph.D. degree from the University of Lincoln, Lincoln, U.K., in 2018.

From 2010 to 2015, he was a Mixed Signal Engineer at the Institute of Microelectronics, Tsinghua University, Beijing, China. Since 2018, he has been with the Institute of Microelectronics, Chinese Academy of Sciences, Beijing, where he is currently a Professor.



**Qinfen Hao** (Senior Member, IEEE) received the Ph.D. degree in system architecture from the Institute of Computing Technology, Chinese Academy of Sciences, in 2001.

He is currently a Professor with the University of the Chinese Academy of Sciences (CAS) and the Director of the Interconnect Technology Laboratory, Institute of Computing Technology, CAS. He has pioneered the first teraflops high-performance computers, the first 32-WAY high-end SMP servers, the first cache coherence interconnect chips in SMP servers, and the first ARM processor in China. He has published more than 50 research articles with more than 100 granted patents. He won the Second Prize of Chinese National Science and Technology Progress Award twice. He served as the General Chair for China Interconnect Technology and Industry Conference and China Chiplet Developers Conference. He is leading the Chiplet Interface Circuit Standard Development in China and also chairing the IEEE P3468 Chiplet Interface Circuit Standard Working Group.