Heterogeneous integration for AI applications: status
and future needs (part 1)
By Madhavan Swaminathan, Siddharth Ravichandran [Georgia Institute of Technology]
The semiconductor industry has been driven by Moore's Law [1] for over five decades. With the number of transistors on a chip doubling every two years, this has led to an almost exponential performance increase for microprocessors while making these chips affordable. The performance increase from one microprocessor generation to the next was supported by the metal-oxide semiconductor field-effect transistor (MOSFET) scaling proposed by Robert Dennard [2], which enabled area reduction while maintaining constant power densities. This trend continued until the mid-2000s, when transistor leakage became a major problem because of the thinning of the gate oxide to a few atomic layers. The result has been reduced frequency and reduced single-thread performance scaling for microprocessors since the mid-2000s.
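To see why constant power density mattered, it helps to restate the classic constant-field scaling rules. The short sketch below is a minimal illustration of those rules with an assumed shrink factor, not a model of any particular process.

```python
# Classic constant-field (Dennard) scaling: shrink all dimensions and the
# supply voltage by 1/k, and power density stays constant even as
# transistors become faster and smaller.
def dennard_scale(k):
    """Return the ideal scaling factors for a linear shrink of k (k > 1)."""
    return {
        "gate length / oxide thickness": 1 / k,   # dimensions shrink
        "supply voltage": 1 / k,                  # keeps the field constant
        "area per transistor": 1 / k**2,
        "switching frequency": k,                 # transistors get faster
        "power per transistor": 1 / k**2,         # P ~ C * V^2 * f
        "power density": 1.0,                     # the key payoff: constant
    }

# One full node shrink is roughly a 0.7x linear scale, i.e., k of about 1.4.
for quantity, factor in dennard_scale(1.4).items():
    print(f"{quantity}: x{factor:.2f}")
```

Once gate-oxide thinning stopped at a few atomic layers and leakage grew, the "power per transistor" row above stopped tracking the area row, and power density began to rise instead of holding constant.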
Several innovations over the last fifteen years related to materials, transistor structure, and architecture have enabled continued area scaling in support of Moore's Law. One such innovation is the use of multiple cores supported by software parallelism to increase performance. Today, microprocessors for data center applications contain more than a hundred cores with ten billion transistors, and this trend of increasing core count appears to be continuing. Unfortunately, due to leakage, the power densities of microprocessors have increased since the mid-2000s. A combination of prohibitive costs for chip fabrication in advanced nodes, reduced area scaling, increased power densities, and the general feeling that Moore's Law is slowing down is causing the semiconductor industry to pursue non-traditional approaches to transistor scaling and computing. John Shalf [3] posits that the path forward is along three fronts, namely: 1) relying on more efficient architectures supported by advanced packaging; 2) developing new materials and devices that enable non-traditional transistors; and 3) innovating using new models for computation such as quantum computing. In this article, we focus on heterogeneity using advanced packaging, in the context of artificial intelligence (AI), as one path forward to continue Moore's Law.
AI is gaining momentum in data science as a means for solving difficult problems that are otherwise unsolvable. The AI algorithms being developed by the computer science community to support such solutions rely on neural network architectures for training and deriving inferences. Over the last several years, architectures based on feedforward neural networks (FFNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and others have emerged that rely on several layers of neurons interconnected through dense connectivity to address complex problems arising in science, computer vision, finance, robotics, and other fields. These computational architectures need to be mapped to computer hardware so that the neural networks can be suitably trained to derive inferences from data.
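As a concrete picture of "several layers of neurons interconnected through dense connectivity," the sketch below implements a bare-bones FFNN forward pass in NumPy. The layer sizes, random weights, and ReLU activation are illustrative assumptions, not details taken from this article.

```python
import numpy as np

# A minimal feedforward neural network (FFNN): each layer is a dense weight
# matrix, so every neuron connects to every neuron in the next layer.
# Layer sizes below are arbitrary, for illustration only.
rng = np.random.default_rng(0)
layer_sizes = [784, 256, 128, 10]      # input -> two hidden layers -> output
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """One inference pass: a chain of dense matrix-vector products."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)     # ReLU activation on hidden layers
    return x @ weights[-1]             # raw output scores

scores = forward(rng.standard_normal(784))
print(scores.shape)                    # (10,)
```

The hardware mapping problem described above amounts to executing chains of such matrix operations, layer after layer, at high throughput.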
As the complexity of the data to be learned increases, the number of hidden neurons in a neural network increases, converting neural networks into deep neural networks (DNN) and making the hardware required for the computations even more complex. Unlike workloads on traditional microprocessor-based computing platforms, AI algorithms have significantly greater computation and storage needs, thereby limiting the performance gained from general-purpose central processing units (CPUs). An alternative is the use of graphics processing unit (GPU)-based platforms, which provide better performance than CPUs for neural network-based computations. However, GPUs consume high power and are not very energy efficient, thereby limiting their capability and applicability for many AI applications. Because neural network processing is highly parallelizable, implementations of such computing architectures must exploit both data- and thread-level parallelism, but at low energy and power levels compared to multi-core CPUs. In addition, the performance of neural network computations is limited by insufficient memory bandwidth and latency.
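A quick roofline-style estimate shows why memory bandwidth, rather than raw compute, often caps neural network performance. The peak-compute and peak-bandwidth figures below are assumed round numbers, not specifications of any actual device.

```python
# Back-of-the-envelope roofline check for why memory bandwidth limits neural
# network computation. Hardware numbers are illustrative assumptions.
PEAK_FLOPS = 100e12    # 100 TFLOP/s of compute (assumed)
PEAK_BW    = 1e12      # 1 TB/s of memory bandwidth (assumed)

def attainable_flops(flops, bytes_moved):
    """Roofline model: performance is capped by compute or by bandwidth."""
    intensity = flops / bytes_moved          # FLOPs per byte moved
    return min(PEAK_FLOPS, intensity * PEAK_BW)

# A batch-1 dense layer (1024x1024, FP16) does 2*M*N FLOPs, but every
# weight byte is read once, so arithmetic intensity is roughly 1 FLOP/byte
# and bandwidth, not compute, sets the ceiling.
m = n = 1024
flops = 2 * m * n
bytes_moved = 2 * (m * n + m + n)            # FP16 weights + activations
print(f"{attainable_flops(flops, bytes_moved)/1e12:.2f} TFLOP/s attainable")
```

With these assumed numbers the layer attains on the order of 1 TFLOP/s against a 100 TFLOP/s peak, which is exactly the bandwidth wall that motivates placing memory close to logic.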
Rather than integrating all logic functions using a single process through monolithic (or homogeneous) integration, as in a system on chip (SoC), there is a trend towards polylithic (or heterogeneous) integration for microprocessors. This is being driven by the exponential costs associated with large dies implemented in advanced process nodes, the reduced time to market made possible by smaller dies from optimized technology nodes, and the move towards heterogeneous semiconductor systems that connect dies from different process nodes. Such connectivity is being enabled by two fundamental technologies, namely, 2D interposers and 3D stacking. The interposer connects dies together laterally using high-density wiring and resides between the dies and a package substrate, which can then be mounted onto a printed wiring board (PWB). Compared to interposers, 3D stacking supports much higher wiring density and shorter wires, but with drawbacks related to power delivery and heat removal. Because AI applications are memory intensive, highly parallelizable, and require memory to be placed near the logic to reduce latency, there is a natural fit for using both interposers and 3D stacking to maximize performance. As AI applications evolve, we expect the resulting system architectures to require extreme heterogeneity, further justifying the need for a heterogeneous integration platform enabled by advanced packaging.
In this paper, we provide a survey and comparison of the various 2D and 3D technologies available and in development, based on the present and future needs posed by AI. This comparison is based on a set of metrics derived from data speed, energy efficiency, and latency that have a direct impact on system performance.
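To make the flavor of such metrics concrete, the sketch below computes two commonly used figures of merit for die-to-die interconnects, die-edge bandwidth density and energy per bit, from assumed parameter values. The numbers are placeholders for illustration, not measurements from any of the technologies surveyed here.

```python
# Sketch of the kinds of metrics used in the comparison: linear bandwidth
# density (data crossing each mm of die edge) and energy per bit.
# All parameter values are assumptions chosen only to show the arithmetic.
def bandwidth_density_gbps_per_mm(wires_per_mm, gbps_per_wire):
    """Die-edge bandwidth density = escape wire density x per-wire data rate."""
    return wires_per_mm * gbps_per_wire

def energy_per_bit_pj(power_mw, bandwidth_gbps):
    """pJ/bit = mW / (Gb/s); the units cancel directly."""
    return power_mw / bandwidth_gbps

# Example: 250 wires/mm at 4 Gb/s per wire, burning 500 mW (assumed values).
bw = bandwidth_density_gbps_per_mm(250, 4)   # 1000 Gb/s per mm of die edge
print(f"bandwidth density: {bw} Gb/s/mm")
print(f"energy per bit: {energy_per_bit_pj(500, bw):.2f} pJ/bit")
```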