
different nodes for cost and technology optimization; and 2) die disaggregation: allowing higher-yielding and lower-cost smaller dies that can be flexibly combined. The most common example of heterogeneous integration in AI today is the use of the Si interposer for connectivity between a GPU or AI accelerator and 3D-stacked HBM. A noteworthy research example of die disaggregation for an AI application is NVIDIA's inference engine consisting of 36 small chiplets [14].

In the short term, several alternatives to Si interposers, including bridges, FOWLP solutions, and high-density organic substrates, are emerging, offering better scalability and lower cost. As bandwidth demands grow, however, these two-dimensional solutions will be supplemented with more and more 3D solutions. Although 3D solutions are state of the art for low-power memory stacking, thanks to easily managed thermal concerns, 3D stacking of accelerators and memories has been hindered by thermal and power delivery challenges. We view this challenge, however, as a vital next step to achieve order-of-magnitude improvements in bandwidth and energy/bit (Table 1).

Table 1: Comparison of 2D and 3D stacking solutions with respect to bandwidth and energy/bit.

An important consideration when stacking an accelerator and a memory die is the stacking order. The top die will be a full-thickness die, while the bottom die will be thinned and will contain TSVs to supply power and signals to the top die. Both accelerator-on-memory and memory-on-accelerator 3D stacking present significant but different challenges, as illustrated in Figure 8. For accelerator-on-memory, power delivery to the accelerator using TSVs in the memory can be challenging, although cooling of the high-performance accelerator is made easier by its proximity to the top heat sink. For memory-on-accelerator, power delivery to the accelerator is much easier, but the heat from the accelerator must escape upward through the memory, posing a significant thermal challenge.

Figure 8: Possible configurations for 3D stacking of accelerator and memory.

In the long term, heterogeneous integration can help facilitate two other possibilities related to analog-based elements. In the first of these, the analog elements are used as high-density memory elements, enabling much higher capacity than for a dynamic random access memory (DRAM) [15]. This is particularly valuable for inference applications as model sizes increase, because off-chip DRAM accesses of model weights are highly energy inefficient. Heterogeneous integration can provide high-bandwidth connections of these emerging memories to AI accelerators through the 2D and 3D high-density interconnect solutions above.

In the second possibility, analog-based elements can also be used for "in-memory computing" by storing the weight values of a DNN in a crosspoint array of resistors [16]. The major advantage of such a configuration is that the very costly matrix multiplication operation, which dominates DNN computation, can be done in a single step exactly where the weights are stored, without having to move them from memory to the processor. Such accelerators, with different process requirements because of the analog elements, are likely to be fabricated in a technology different from other digital-based components. As shown in Figure 9, they will require high-bandwidth connections to digital accelerators that can offload operations to them [17].

Figure 9: Analog-based computational memory connected to digital processors and memory (from [17]).

Summary

We summarize the requirements of the heterogeneous integration platform needed to enable an upward trajectory for AI system performance in Figure 10. Tiling of multi-core chiplets provides the high compute density needed for the strenuous demands of AI computing. Specialized components and accelerators, including those with 3D stacking, are used for further
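As a rough numerical sketch of the crosspoint in-memory computing idea discussed in this article (an illustration only, not any vendor's implementation): weights are programmed as conductances, input activations are applied as row voltages, and, by Ohm's and Kirchhoff's laws, the column read-out currents are the matrix-vector product, computed in one step where the weights reside. The array size, conductance scale, and variable names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical DNN layer: 4 inputs, 3 outputs.
weights = rng.standard_normal((4, 3))

# Program weights into the crosspoint array as conductances (siemens).
# (Assumed scale; real devices span a limited conductance range.)
g_scale = 1e-6
G = weights * g_scale            # one conductance per crosspoint

# Encode input activations as row voltages (volts).
v = rng.standard_normal(4) * 0.1

# Column currents: the entire matrix-vector product in a single
# analog step -- no weight ever moves to a processor.
I = G.T @ v                      # amperes, shape (3,)

# A digital processor would instead fetch every weight from memory:
digital = weights.T @ v
assert np.allclose(I / g_scale, digital)   # same result, computed in place
```

Note that real crosspoint arrays hold only non-negative conductances, so signed weights are typically encoded with differential pairs of devices; the sketch above ignores such device-level details.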

        16   Chip Scale Review   November  •  December  •  2020   [ChipScaleReview.com]