
Enabling AI with heterogeneous integration
By Arvind Kumar, Mukta Farooq  [IBM Research]

Artificial intelligence (AI) applications have become a pervasive segment of the computing landscape and are poised to continue explosive growth for many years. Delivering the very large demands of compute, memory, and bandwidth required by AI has become a leading challenge in computing system design and provided a major incentive for the deployment of specialized components to accelerate these workloads. Heterogeneous integration [1] has risen to the forefront of technology focus because of the need to enable high interconnectivity between these diverse components, coupled with the need for a new technology paradigm to counter the diminishing returns of scaling. In this article, we address three fundamental questions framing the challenges in this area: 1) What are the compute, memory, and connectivity requirements for AI workloads? 2) What novel heterogeneous integration technologies are being developed to deliver continued gains in AI system performance? 3) What is the path forward to deploy these heterogeneous integration technologies to address these challenges? We conclude by describing the requirements of the heterogeneous integration platform needed to enable an upward trajectory for AI system performance.

Background
To understand the compute, memory, and connectivity requirements for AI workloads, we start with some history. The revolution in AI computing has been the product of three factors. First, voluminous amounts of data are being collected across many domains (social media and data from sensors are two example sources), and AI algorithms are adept at analyzing this unstructured data. Second, following many decades of little progress, there have been many recent advances in AI algorithms to gain insights into this data, attaining human-level accuracy at tasks such as speech recognition. Finally, there has been spectacular growth in computing capability fueled by scaling, which has led to the widespread availability of computing resources facilitated by the cloud. This third factor is the one on which we will focus, because its continued success is critical to sustaining the unrelenting growth of AI, making clear the need for new paradigms to address the diminishing benefits of scaling.

Before discussing the potential benefits of heterogeneous integration, we first summarize some fundamentals of AI [2]. The engine of AI computing is the deep neural network (DNN). As shown in Figure 1, a DNN consists of a series of layers that transform an input (e.g., the pixels of an image) to an output (e.g., the classification of what is in the image) by discovering the most important features (e.g., the distinct characteristics distinguishing a cat from a dog). A large DNN may consist of tens or even hundreds of layers (hence the name “deep”). Each layer consists of a matrix of parameters called weights that transforms the layer’s input into an output, which feeds into the next layer. Typical DNNs have tens of millions of weight parameters, but the number of parameters in very large models (e.g., for language translation) is approaching the trillion level. DNNs have found widespread application in many domains, with speech, language, and vision comprising three of the most common.

Figure 1: Example of a deep neural network (DNN).
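
To make this concrete, the sketch below shows a DNN of this form as a list of weight matrices, with an input propagated layer by layer. It is our illustration rather than code from the article: the layer sizes, the ReLU nonlinearity, and the random weights are all assumptions chosen for brevity.

```python
import numpy as np

# A DNN as a stack of layers: each layer is a weight matrix that
# transforms its input vector into the input of the next layer.
# Layer sizes here are illustrative, not taken from the article.
rng = np.random.default_rng(0)
layer_sizes = [784, 256, 128, 10]       # e.g., image pixels in, class scores out
weights = [rng.standard_normal((n_in, n_out)) * 0.01
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(x, weights):
    """Propagate one input through every layer of the network."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)      # matrix multiply + ReLU nonlinearity
    return x @ weights[-1]              # final layer produces the output scores

x = rng.standard_normal(784)            # one input example (e.g., an image)
print(forward(x, weights).shape)        # (10,) -- one score per output class
print(sum(w.size for w in weights))     # 234752 weight parameters in this toy net
```

Note that the work in each layer is a matrix multiplication, which, as discussed below, is why that operation dominates DNN computation.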

Training and inference phases of a DNN
Before a DNN can be used, it must first be trained, typically with a very large number of examples, in order to find the weight values. Each training example is sent forward through the network, and the weights of the network are adjusted based on the difference, or error, between the calculated output and the correct output during a subsequent backward pass through the network. Propagation from one layer to the next involves a set of computationally intensive steps, of which matrix multiplication of the layer’s input by the weights of that layer dominates the computation time [3]. This procedure, called backpropagation, has to be repeated many times over the full set of training examples, until the accuracy on a separate verification set saturates, representing the achievable accuracy of the network.
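
A minimal sketch of this backpropagation cycle follows, using a tiny two-layer network, a squared-error measure, and synthetic data as assumed stand-ins; a real training loop would iterate over the full training set and stop when accuracy on the verification set saturates.

```python
import numpy as np

# One backpropagation cycle, repeated: forward pass, error at the
# output, backward pass, weight update. Network size, learning rate,
# and synthetic data are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))     # 64 training examples, 8 features each
Y = rng.standard_normal((64, 1))     # the "correct outputs" for those examples
W1 = rng.standard_normal((8, 16)) * 0.1
W2 = rng.standard_normal((16, 1)) * 0.1
lr = 0.01                            # learning rate (assumed)

for step in range(1000):             # many passes over the training examples
    # Forward pass: matrix multiplications dominate the cost.
    H = np.maximum(X @ W1, 0.0)      # hidden layer with ReLU
    out = H @ W2                     # calculated output
    err = out - Y                    # difference vs. the correct output
    # Backward pass: propagate the error from the output toward the input.
    grad_W2 = H.T @ err / len(X)
    grad_H = err @ W2.T
    grad_H[H <= 0.0] = 0.0           # error does not flow through inactive ReLUs
    grad_W1 = X.T @ grad_H / len(X)
    # Update pass: adjust the weights to shrink the error.
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

print(float(np.mean(err ** 2)))      # training error after the updates
```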

Once the model is trained, it can be deployed in a phase called inference. Unlike the training phase, which involves forward, backward, and update passes through the network for each training example, inference involves only the forward pass. Moreover, it is often possible to downsize the model after training while preserving accuracy, reducing the computational burden even further.
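
The article does not prescribe a particular downsizing technique; pruning and reduced-precision quantization are two common choices. As one hypothetical illustration, the sketch below quantizes trained 32-bit weights to 8-bit integers, cutting their memory and bandwidth footprint by 4x in exchange for a small approximation error.

```python
import numpy as np

# Post-training downsizing via symmetric int8 quantization of the
# weights (an illustrative scheme, not one named in the article).
rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((256, 256)).astype(np.float32)  # "trained" weights

scale = np.abs(w_fp32).max() / 127.0             # map the weight range onto int8
w_int8 = np.round(w_fp32 / scale).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale   # dequantize for comparison

print(w_fp32.nbytes, "->", w_int8.nbytes)        # 262144 -> 65536 bytes (4x smaller)
print(float(np.abs(w_fp32 - w_restored).max()))  # worst-case quantization error
```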

Computational requirements
We now discuss the compute, memory, and connectivity requirements for training and inference. Several key drivers have emerged to accelerate AI workload computation [3]. First, specialized accelerators have architectures designed to speed up matrix multiplication, the dominant operation in DNN computation. Second, unlike many other workloads, the memory access patterns and order of instructions are completely deterministic (set by the DNN), so that specialized accelerators with dataflow architectures can achieve very high compute utilization. Rather
