Page 48 - Chip Scale Review_January February_2022-digital

data size is smaller than the weight parameters, the data communication
latency and energy can be reduced. In [7], it has been shown that for
AlexNet, a CNN implemented using a FeFET-based PIM accelerator, the
multi-chip design leads to lower energy efficiency and latency as
compared to a single-chip design. In addition, the multi-chip design
provides higher throughput because of pipelined execution.
In both NMP and PIM accelerator architectures, the DRAM, logic, and
other dies come from different process nodes and are connected using 2D
(interposer and package substrate) or 3D (stacking) approaches,
resulting in heterogeneous integration. As AI architectures become more
prevalent, there will be a need to integrate additional dies from other
domains (analog, radio frequency (RF), and photonics), as well as from
different process nodes, leading to extreme heterogeneity as shown in
Figure 2d.

Comparison metrics

With 2D and 3D solutions available for connecting dies together,
metrics are required to compare these technologies. In this section we
describe five important metrics related to interconnect density, energy
per bit, data rate, power delivery, and thermal design power (TDP) for
comparing the various options.

Interconnection density. As shown in Figure 2, the implementation of
NMP and PIM DNN architectures requires connectivity between adjacent
dies. When dies on a package substrate or interposer are connected, the
number of die input/output (I/O or IO) terminals that can escape along
the die edge to connect to an adjacent die becomes an important metric.
Because the number of interconnects that can be routed depends on the
length of the die edge and the number of redistribution or wiring
layers (RDL), a better metric is the interconnect density with units of
IO/mm/layer [8].

Two dies connected together are shown in Figure 3, where the circular
pads of diameter D represent the positions where the dies are assembled
using solder or other means. The center-to-center pad pitch is P, with
the pads staggered as shown in the figure, like the pad arrangements in
HBM. The number of interconnects that can be wired between the two dies
over a distance P_y is given by:

    Eq. 1

where L and S are the linewidth and spacing, respectively. This leads
to an interconnect density per layer:

    Eq. 2

(In part 2 of this article we will present Table 2; the data in that
table will show that the 10X higher interconnect density for silicon
(250 IO/mm/layer) as compared to an organic interposer (25 IO/mm/layer)
results in the use of fewer layers in the former. For example, 7,500
wires can be routed along a die edge of length 10mm using just three
wiring layers with a silicon interposer, as compared to many more
layers in an organic package. Because vias add parasitic losses and
latency, a higher layer count will reduce performance. Layer count can
be further reduced by decreasing the assembly pitch. For 3D stacking,
the dies are connected vertically, and therefore, W = (1/P)². For a
non-staggered pitch of P=10μm, this translates to an interconnection
density of 10,000 IO/mm².)

Figure 3: Interconnect density and wiring length.

Interconnect length. The length of the wire connecting adjacent dies
determines the total resistance and capacitance of the
interconnections, and therefore, represents an important design
parameter to consider. From Figure 3, not all wire lengths are equal,
and therefore, an average wirelength, AWL, can be calculated as:

    Eq. 3

where DS is the die-to-die spacing, and K is the keep-out zone (KOZ).
Most advanced packages have DS=100μm and K=50μm, and therefore, n and P
are the main parameters that affect AWL. As more columns in each die
are routed, AWL will increase, but the increase can be compensated for
by using a smaller pad pitch, P. For 3D stacking, the wirelength is the
physical length that connects the dies together, which includes the
length of the through-silicon via (TSV). All wires have the same length
in 3D integration. (In part 2 of this article, we will present Tables 2
and 3, in which some of the wirelengths are estimated based on Eq. 3
and others are based on published data.)

Data rate. Dies communicate with each other using a driver-receiver
pair through the interconnection. With short wires that are a few mm
long, simple driver and receiver circuitry can be used, as compared to
long interconnections where equalization and error correction schemes
are required, as shown in Figure 4a [9]. In

Figure 4: a) Driver-interconnect-receiver communication, and b) Eye diagram at the far end of the center line.
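
The interconnect-density numbers quoted in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical and not from the article, and it assumes that routable wires scale linearly with die-edge length and layer count, as the IO/mm/layer metric implies. The 3D case uses the relation W = (1/P)² stated in the text for a non-staggered pad array.

```python
import math


def wires_routable(density_io_per_mm_layer, edge_mm, layers):
    """Wires that can escape a die edge, assuming linear scaling
    with edge length and number of wiring layers (an assumption)."""
    return density_io_per_mm_layer * edge_mm * layers


def layers_needed(wires, density_io_per_mm_layer, edge_mm):
    """Smallest whole number of wiring layers that routes `wires`."""
    return math.ceil(wires / (density_io_per_mm_layer * edge_mm))


def areal_density_3d(pitch_um):
    """IO per mm^2 for a non-staggered pad array, W = (1/P)^2,
    with the pitch P converted from micrometers to millimeters."""
    pads_per_mm = 1000.0 / pitch_um
    return pads_per_mm ** 2


if __name__ == "__main__":
    # Silicon interposer: 250 IO/mm/layer, 10 mm edge, 3 layers
    print(wires_routable(250, 10, 3))    # 7500
    # Organic interposer at 25 IO/mm/layer for the same 7,500 wires
    print(layers_needed(7500, 25, 10))   # 30
    # 3D stacking at a 10 um non-staggered pitch
    print(areal_density_3d(10))          # 10000.0
```

Under these assumptions, the organic interposer would need about 30 layers to match what the silicon interposer routes in three. That figure is only a consequence of the two quoted densities, not a number taken from the article.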
