
acceleration. Central to this concept are highly efficient interfaces to enable connectivity between components. These interfaces should have high bandwidth (Gbps/mm), low power (pJ/bit), high area efficiency (Gbps/mm²), and be based on an open standard to allow connectivity between a wide variety of components. Through such a platform, facilitated by advancements in the heterogeneous integration technologies described earlier, we believe that the challenges of the AI revolution in computing can be successfully overcome.

Figure 10: Heterogeneous integration platform.

Acknowledgments
  The authors wish to thank J. Burns, R. Divakaruni, D. McHerron, and S. Sikorski for their guidance in the drafting of this manuscript, and M. Le Gallo for permission to use a figure.

References
  1.  https://eps.ieee.org/technology/heterogeneous-integration-roadmap.html
  2.  V. Sze, Y. Chen, T. Yang, J. S. Emer, "Efficient processing of deep neural networks: a tutorial and survey," Proc. of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017, doi: 10.1109/JPROC.2017.2761740.
  3.  B. Fleischer, et al., "A scalable multi-TeraOPS deep learning processor core for AI training and inference," 2018 IEEE Symp. on VLSI Circuits, 2018.
  4.  X. Sun, J. Choi, C-Y. Chen, et al., "Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks," NeurIPS 2019.
  5.  J. Choi, S. Venkataramani, V. Srinivasan, et al., "Accurate and efficient 2-bit quantized neural networks," Proc. of Machine Learning and Systems 1, 2019.
  6.  H. Jun, et al., "HBM (high-bandwidth memory) DRAM technology and architecture," 2017 IEEE Inter. Memory Workshop (IMW), 2017.
  7.  Wen-mei Hwu, et al., "Rebooting the data access hierarchy of computing systems," 2017 IEEE Inter. Conf. on Rebooting Computing (ICRC), 2017.
  8.  A. Kumar, et al., "System performance: from enterprise to AI," 2018 IEEE Inter. Electron Devices Meeting (IEDM).
  9.  A. Li, et al., "Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect," IEEE Trans. on Parallel and Distributed Sys., 31.1 (2019): 94-110.
  10. K. Oi, et al., "Development of new 2.5D package with novel integrated organic interposer substrate with ultra-fine wiring and high density bumps," 2014 IEEE 64th Elec. Comp. and Tech. Conf. (ECTC), Orlando, FL, pp. 348-353.
  11. J. H. Lau, "Redistribution-layers (RDLs) for fan-out panel-level packaging," 2018 Inter. Wafer Level Packaging Conf. (IWLPC), 2018.
  12. M. G. Farooq, et al., "3D copper TSV integration, testing and reliability," 2011 Inter. Electron Devices Meeting (IEDM), 2011.
  13. M. G. Farooq, et al., "Impact of 3D copper TSV integration on 32SOI FEOL and BEOL reliability," 2015 IEEE Inter. Rel. Phys. Symp., 2015.
  14. R. Venkatesan, et al., "A 0.11 pJ/Op, 0.32-128 TOPS, scalable multi-chip-module-based deep neural network accelerator designed with a high-productivity VLSI methodology," 2019 IEEE Hot Chips 31 Symp. (HCS).
  15. M. Donato, et al., "On-chip deep neural network storage with multi-level eNVM," Proc. of the 55th Annual Design Automation Conf., 2018.
  16. G. W. Burr, et al., "Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: comparative performance analysis (accuracy, speed, and power)," 2015 IEEE Inter. Electron Devices Meeting (IEDM).
  17. M. Le Gallo, et al., "Mixed-precision in-memory computing," Nature Electronics, 1.4 (2018): 246-253.
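As a rough illustration of how the interface figures of merit named in the concluding paragraph (Gbps/mm, pJ/bit, Gbps/mm²) combine into a link budget, the sketch below converts them into total edge bandwidth and link power. The function name and all numeric values are hypothetical placeholders, not figures from this article.

```python
# Hypothetical die-to-die interface budget built from the two linear
# figures of merit discussed above: bandwidth density (Gbps per mm of
# die edge) and energy per bit (pJ/bit). All inputs are placeholders.

def link_budget(bandwidth_gbps_per_mm, energy_pj_per_bit, edge_mm):
    """Return (total bandwidth in Gbps, link power in W) for one die edge."""
    total_gbps = bandwidth_gbps_per_mm * edge_mm
    # 1 Gbps = 1e9 bit/s and 1 pJ = 1e-12 J, so watts = Gbps * pJ/bit * 1e-3
    power_w = total_gbps * 1e9 * energy_pj_per_bit * 1e-12
    return total_gbps, power_w

gbps, watts = link_budget(bandwidth_gbps_per_mm=500,
                          energy_pj_per_bit=0.5,
                          edge_mm=10)
print(f"{gbps:.0f} Gbps across the edge, {watts:.2f} W of link power")
```

The same arithmetic shows why pJ/bit matters: at fixed edge length, halving the energy per bit halves interface power for the same bandwidth.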

                       Biographies
  Arvind Kumar is a manager of AI Hardware Technologies at IBM Research, Yorktown Heights, NY. He holds SB, SM, and PhD degrees in Electrical Engineering and Computer Science, all from MIT. Email: arvkumar@us.ibm.com
  Mukta Farooq is a Distinguished Research Staff Member at IBM Research, Yorktown Heights, NY, and an IEEE Fellow. She received her BS from IIT-Bombay, MS from Northwestern U., and PhD from Rensselaer Polytechnic Institute.


Chip Scale Review   November • December • 2020   [ChipScaleReview.com]