Heterogeneous integration for AI applications: status and future needs (part 1)

By Madhavan Swaminathan, Siddharth Ravichandran [Georgia Institute of Technology]
The semiconductor industry has been driven by Moore's Law [1] for over five decades. With the number of transistors on a chip doubling every two years, microprocessor performance has increased almost exponentially while the chips have remained affordable. The performance increase from one microprocessor generation to the next was supported through metal-oxide semiconductor field-effect transistor (MOSFET) scaling proposed by Robert Dennard [2], which enabled area reduction while maintaining constant power densities. This trend continued until the mid-2000s, when transistor leakage became a major problem because the gate oxide had been thinned to a few atomic layers. The result has been reduced frequency and reduced single-thread performance scaling for microprocessors since the mid-2000s.

Several innovations over the last fifteen years related to materials, transistor structure, and architecture have enabled continued area scaling and kept Moore's Law on track. One such innovation is the use of multiple cores supported by software parallelism to increase performance. Today, microprocessors in data center applications contain more than a hundred cores with ten billion transistors, and this trend of increasing core count appears to be continuing. Unfortunately, due to leakage, the power densities of microprocessors have increased since the mid-2000s. A combination of prohibitive costs associated with chip fabrication in advanced nodes, reduced area scaling, increased power densities, and the general feeling that Moore's Law is slowing down is causing the semiconductor industry to pursue non-traditional approaches to transistor scaling and computing. John Shalf [3] posits that the path forward is along three fronts, namely: 1) relying on more efficient architectures supported by advanced packaging; 2) developing new materials and devices that enable non-traditional transistors; and 3) innovating with new models for computation such as quantum computing. In this article we focus on heterogeneity using advanced packaging, in the context of artificial intelligence (AI), as one path forward to continue Moore's Law.

AI is gaining momentum in data science as a means for solving difficult problems that are otherwise unsolvable. The AI algorithms being developed by the computer science community to support such solutions rely on neural network architectures for training and deriving inferences. Over the last several years, architectures based on feedforward neural networks (FFNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and others have emerged that rely on several layers of neurons interconnected through dense connectivity to address complex problems arising in science, computer vision, finance, robotics, and other fields. These computational architectures need to be mapped to computer hardware so that the neural networks can be suitably trained to derive inferences from data.
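To make the idea of layers of neurons interconnected through dense connectivity concrete, the short sketch below builds a small feedforward network in NumPy; the layer widths, activation function, and random weights are illustrative choices made for this example, not details drawn from the article. Stacking more hidden layers of the same kind is what yields the deep networks discussed next.

```python
# Minimal sketch of a dense feedforward network (illustrative sizes, not from the article).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)

# Each layer is a dense (fully-connected) weight matrix: every neuron in one
# layer connects to every neuron in the next -- the "dense connectivity"
# described above. Widths are arbitrary, chosen only for illustration.
layer_widths = [784, 256, 256, 10]          # input, two hidden layers, output
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_widths[:-1], layer_widths[1:])]
biases = [np.zeros(n) for n in layer_widths[1:]]

def forward(x):
    """One inference pass: alternating dense matrix multiplies and nonlinearities."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]      # final layer left linear (logits)

y = forward(rng.standard_normal(784))        # a single dummy input vector
print(y.shape)                               # -> (10,)
```

Every added hidden layer contributes another large matrix multiply and another block of weights to store, which is why deeper networks map directly to more computation and more memory traffic in hardware.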
As the complexity of the data to be learned increases, the number of hidden neurons in a neural network increases, converting neural networks into deep neural networks (DNN) and making the hardware required for computation even more complex. Unlike traditional microprocessor-based computing platforms, AI algorithms require significantly more computation and storage, thereby limiting the performance gained from general-purpose central processing units (CPUs). An alternative is the use of graphics processing unit (GPU)-based platforms, which provide better performance than CPUs for neural network-based computations. However, GPUs consume high power and are not very energy efficient, limiting their capability and applicability for many AI applications. Because neural network processing is highly parallelizable, the implementation of such computing architectures must exploit both data- and thread-level parallelism, but at low energy and power levels compared to multi-core CPUs. In addition, the performance of neural network computations is limited by insufficient memory bandwidth and latency.
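A rough, roofline-style estimate illustrates why memory bandwidth, rather than raw compute, often sets the ceiling for neural-network workloads; all of the numbers below are assumptions chosen for this example, not figures from the article.

```python
# Roofline-style estimate: is a dense layer compute-bound or memory-bound?
# All numbers below are illustrative assumptions, not data from the article.

peak_flops = 100e12          # accelerator peak: 100 TFLOP/s (assumed)
mem_bandwidth = 900e9        # off-package memory bandwidth: 900 GB/s (assumed)

# One dense layer applied to a single input vector (batch size 1):
m, n = 4096, 4096            # weight matrix dimensions (assumed)
flops = 2 * m * n            # one multiply + one add per weight
bytes_moved = m * n * 2      # weights streamed from memory at 2 bytes (FP16) each

compute_time = flops / peak_flops
memory_time = bytes_moved / mem_bandwidth

print(f"compute-limited time: {compute_time * 1e6:.2f} us")
print(f"memory-limited time : {memory_time * 1e6:.2f} us")
# The memory-limited time dominates by roughly two orders of magnitude, so the
# compute units sit idle waiting on memory -- one motivation for placing memory
# close to logic using interposers and 3D stacking.
```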
Rather than integrating all logic functions using a single process through monolithic (or homogeneous) integration, as in a system on chip (SoC), there is a trend towards polylithic (or heterogeneous) integration for microprocessors. This is being driven by the exponential costs associated with large dies implemented using advanced process nodes, the reduced time to market possible using smaller dies from optimized technology nodes, and the move towards heterogeneous semiconductor systems with dies drawn from different process nodes. Such connectivity is enabled by two fundamental technologies, namely 2D interposers and 3D stacking. The interposer connects dies together laterally using high-density wiring and resides between the dies and a package substrate, which can then be mounted onto a printed wiring board (PWB). Compared to interposers, 3D stacking supports much higher wiring density and shorter wires, but with drawbacks related to power delivery and heat removal. Because AI applications are memory intensive, highly parallelizable, and require memory to be placed near the logic to reduce latency, there is a natural fit for using both interposers and 3D stacking to maximize performance. As AI applications evolve, we expect the resulting system architectures to require extreme heterogeneity, further justifying the need for a heterogeneous integration platform enabled by advanced packaging.

In this paper we provide a survey and comparison of the various 2D and 3D technologies available and in development, based on the present and future needs posed by AI. This comparison is based on a set of metrics derived from data speed, energy efficiency, and latency that have a direct impact on system performance.
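As a preview of how interconnect-level figures of merit of this kind can be framed, the sketch below computes two commonly used quantities, bandwidth density per millimeter of die edge and energy per bit, for a hypothetical die-to-die link; every parameter value is an assumption made for illustration and does not come from this survey.

```python
# Illustrative interconnect figures of merit for a hypothetical die-to-die link.
# Every parameter value here is an assumption chosen for the example.

wire_pitch_um = 2.0           # escape-routing wire pitch (assumed)
signal_rate_gbps = 4.0        # per-wire data rate (assumed)
power_mw_per_wire = 1.0       # I/O power per wire (assumed)

wires_per_mm = 1000.0 / wire_pitch_um                   # wires crossing 1 mm of die edge
bandwidth_density_gbps = wires_per_mm * signal_rate_gbps  # Gb/s per mm of edge
energy_per_bit_j = (power_mw_per_wire * 1e-3) / (signal_rate_gbps * 1e9)

print(f"bandwidth density: {bandwidth_density_gbps / 8:.0f} GB/s per mm of die edge")
print(f"energy efficiency: {energy_per_bit_j * 1e12:.2f} pJ/bit")
```

Finer wiring pitch raises the bandwidth density and shorter wires lower the energy per bit, which is how the 2D and 3D options compared in this survey translate into the data speed, energy efficiency, and latency metrics named above.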

