acceleration. Central to this concept are highly efficient interfaces to enable connectivity between components. These interfaces should have high bandwidth (Gbps/mm), low power (pJ/bit), high area efficiency (Gbps/mm²), and be based on an open standard to allow connectivity between a wide variety of components. Through such a platform, facilitated by advancements in the heterogeneous integration technologies described earlier, we believe that the challenges of the AI revolution in computing can be successfully overcome.

Figure 10: Heterogeneous integration platform.

Acknowledgments
The authors wish to thank J. Burns, R. Divakaruni, D. McHerron, and S. Sikorski for their guidance in the drafting of this manuscript, and M. Le Gallo for permission to use a figure.

References
1. https://eps.ieee.org/technology/heterogeneous-integration-roadmap.html
2. V. Sze, Y. Chen, T. Yang, J. S. Emer, "Efficient processing of deep neural networks: a tutorial and survey," Proc. of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017, doi: 10.1109/JPROC.2017.2761740.
3. B. Fleischer, et al., "A scalable multi-TeraOPS deep learning processor core for AI training and inference," 2018 IEEE Symp. on VLSI Circuits.
4. X. Sun, J. Choi, C-Y. Chen, et al., "Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks," NeurIPS 2019.
5. J. Choi, S. Venkataramani, V. Srinivasan, et al., "Accurate and efficient 2-bit quantized neural networks," Proc. of Machine Learning and Systems 1, 2019.
6. H. Jun, et al., "HBM (high-bandwidth memory) DRAM technology and architecture," 2017 IEEE Inter. Memory Workshop (IMW).
7. Wen-mei Hwu, et al., "Rebooting the data access hierarchy of computing systems," 2017 IEEE Inter. Conf. on Rebooting Computing (ICRC).
8. A. Kumar, et al., "System performance: from enterprise to AI," 2018 IEEE Inter. Electron Devices Meeting (IEDM).
9. A. Li, et al., "Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect," IEEE Trans. on Parallel and Distributed Sys., 31.1 (2019): 94-110.
10. K. Oi, et al., "Development of new 2.5D package with novel integrated organic interposer substrate with ultra-fine wiring and high density bumps," 2014 IEEE 64th Elec. Comp. and Tech. Conf. (ECTC), Orlando, FL, pp. 348-353.
11. J. H. Lau, "Redistribution-layers (RDLs) for fan-out panel-level packaging," 2018 Inter. Wafer Level Packaging Conf. (IWLPC).
12. M. G. Farooq, et al., "3D copper TSV integration, testing and reliability," 2011 IEEE Inter. Electron Devices Meeting (IEDM).
13. M. G. Farooq, et al., "Impact of 3D copper TSV integration on 32SOI FEOL and BEOL reliability," 2015 IEEE Inter. Rel. Phys. Symp.
14. R. Venkatesan, et al., "A 0.11 pJ/op, 0.32-128 TOPS, scalable multi-chip-module-based deep neural network accelerator designed with a high-productivity VLSI methodology," 2019 IEEE Hot Chips 31 Symp. (HCS).
15. M. Donato, et al., "On-chip deep neural network storage with multi-level eNVM," Proc. of the 55th Annual Design Automation Conf., 2018.
16. G. W. Burr, et al., "Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: Comparative performance analysis (accuracy, speed, and power)," 2015 IEEE Inter. Electron Devices Meeting (IEDM).
17. M. Le Gallo, et al., "Mixed-precision in-memory computing," Nature Electronics, 1.4 (2018): 246-253.
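The interface figures of merit discussed in the closing paragraph (bandwidth density in Gbps/mm of die edge, energy in pJ/bit, and areal efficiency in Gbps/mm²) follow from simple ratios. The sketch below illustrates the arithmetic; the link parameters (lane count, per-lane rate, edge length, PHY area, power) are hypothetical assumptions, not values from the article.

```python
# Back-of-envelope figures of merit for a hypothetical die-to-die interface.
# All link parameters below are illustrative assumptions, not from the article.

def interface_metrics(lanes, gbps_per_lane, edge_mm, area_mm2, power_mw):
    """Return (Gbps/mm of die edge, pJ/bit, Gbps/mm^2) for a parallel link."""
    total_gbps = lanes * gbps_per_lane
    bw_density = total_gbps / edge_mm   # shoreline bandwidth density
    pj_per_bit = power_mw / total_gbps  # mW / Gbps reduces to pJ/bit
    areal_eff = total_gbps / area_mm2   # bandwidth per unit PHY area
    return bw_density, pj_per_bit, areal_eff

# Hypothetical example: 64 lanes at 4 Gbps along 1 mm of edge,
# 0.5 mm^2 of PHY area, 128 mW total link power.
bw, energy, areal = interface_metrics(64, 4.0, 1.0, 0.5, 128.0)
print(f"{bw:.0f} Gbps/mm, {energy:.2f} pJ/bit, {areal:.0f} Gbps/mm^2")
```

Note the unit shortcut in the energy term: 1 mW divided by 1 Gbps is 10⁻³ J/s over 10⁹ b/s, i.e. exactly 1 pJ/bit.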
Biographies
Arvind Kumar is a manager of AI Hardware Technologies at IBM Research, Yorktown Heights, NY.
He holds SB, SM, and PhD degrees in Electrical Engineering and Computer Science, all from MIT;
email arvkumar@us.ibm.com
Mukta Farooq is a Distinguished Research Staff Member at IBM Research, Yorktown Heights, NY, and an
IEEE Fellow. She received her BS from IIT-Bombay, MS from Northwestern U., and PhD from Rensselaer
Polytechnic Institute.
Chip Scale Review November • December • 2020 [ChipScaleReview.com]