AI’s impact on 3D packaging: heterogeneous integration
By Santosh Kumar [Yole Développement, Korea]
Artificial intelligence (AI) has been in development for more than fifty years, but recently it has emerged as one of the key drivers of semiconductor growth, fueled by smartphones, personal assistants, social media and smart automotive. AI requires various computing hardware and high-end memories; and because of its requirements for high bandwidth, low latency and low power consumption, AI has created opportunities for the advanced packaging business.

AI technology trends

AI is now widespread and has become an integral part of the technology industry. Whenever a machine mimics human cognitive function, we can say it is AI. In the AI field, some people have begun to distinguish between the types of machine learning. Machine learning is the subset of AI that includes abstruse statistical techniques that enable machines to improve task performance with experience. The first goal of machine learning is to give the machine the ability to learn without being explicitly programmed. The next goal allows the machine to assess the data collected and make predictions. Besides academic research and military programs, there are machine learning flagship applications aimed at consumers. The most important applications are voice identification and language processing used for intelligent personal assistants (e.g., Siri, Cortana, Alexa, etc.) and image recognition for autonomous driving.

There are several algorithmic approaches that enable enhancement and acceleration of machine learning—deep learning is one of them and it is gaining more and more interest. Deep learning is the subset of machine learning comprising algorithms that allow software to train itself to perform tasks, like speech and image recognition, by exposing multilayered neural networks to vast amounts of data. These new ways of processing heavy data, like video and photo, were made possible by the availability of efficient data computing hardware, such as new high-bandwidth memories, graphics processing units (GPUs), central processing units (CPUs), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs).

Deep learning is made up of two phases: training and inference. What is called training in AI is teaching a virtual machine to recognize objects and sounds. The training phase requires huge computing power and can be extremely long (hours, days, months) depending on the required precision. Currently, most of the training is done in the cloud, where the computing capabilities are in line with such operations. Nevertheless, some training can still be done at the edge. An example would be face detection systems on phones, where a one-off training of a couple of seconds is required to complete the neural network model that recognizes the face of the phone's owner.

Inference can't happen without training. The act of using the trained neural network with new data on a device or server to identify something is known as inference. Inference can occur at the edge, where it gives similar prediction accuracy but is simplified, compressed and optimized for runtime performance; inference can also occur in the cloud. Systems on chip (SoCs) with GPUs and a CPU inside are used to do this computation at the edge (on a phone, for example). Inference requires less computational capability than training, as the training was already performed in the cloud.
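To make the two phases concrete, here is a minimal sketch (illustrative only, not from the article) of a training loop followed by inference, assuming the open-source PyTorch library and random stand-in data:

```python
# Minimal sketch of the two deep-learning phases: training, then inference.
# Illustrative only; model size, data, and hyperparameters are stand-ins.
import torch
import torch.nn as nn

# A tiny multilayered ("deep") network for a 10-class task.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# --- Training phase: compute-intensive, iterates over labeled data ---
inputs = torch.randn(256, 64)             # stand-in training samples
labels = torch.randint(0, 10, (256,))     # stand-in ground-truth labels
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()                       # backpropagation: the costly step
    optimizer.step()

# --- Inference phase: a single cheap forward pass on new data ---
model.eval()
with torch.no_grad():                     # no gradients needed at inference
    new_sample = torch.randn(1, 64)
    prediction = model(new_sample).argmax(dim=1)
print(prediction.item())
```

The training loop dominates the compute and memory-bandwidth budget; inference reuses the already-trained weights for a single forward pass.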
Hardware for AI

Training and inference have two different missions, and that makes their hardware requirements different. Training requires intensive calculations and, consequently, large bandwidth, using CPUs, GPUs, FPGAs and dynamic random access memories (DRAMs), and it is the first and main user of 3D interconnected devices. The inference workload looks like the processing of digital signal processing (DSP) algorithms. Inference can take place in two places: either in a datacenter, or locally (embedded inference), such as in a car. The requirements for inference include low latency, lower cost and much lower power consumption, especially when inference is embedded. Inference products could integrate an accelerator onto an SoC. Inference is typically conducted at the application or client endpoint (i.e., the edge), rather than on the server or in the cloud. It requires fewer hardware resources and, depending on the application, can be performed using CPUs, FPGAs, ASICs, DSPs, etc. Inference is expected to shift locally to mobile devices. Here, precision can be sacrificed in favor of greater speed or lower power consumption.
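One common way to sacrifice precision for speed and power at the edge is post-training quantization, which stores weights as 8-bit integers instead of 32-bit floats. A minimal sketch, assuming PyTorch's dynamic-quantization API and a toy model:

```python
# Sketch: trading numerical precision for a smaller, faster edge model.
# Illustrative only; assumes PyTorch post-training dynamic quantization.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
model.eval()

# Convert Linear-layer weights from 32-bit floats to 8-bit integers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model runs the same forward pass with less memory traffic;
# predictions may differ slightly because precision was sacrificed.
x = torch.randn(1, 64)
print(model(x).argmax(dim=1).item(), quantized(x).argmax(dim=1).item())
```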
As mentioned before, the key computing hardware for training and inference of AI includes CPUs, GPUs, FPGAs and ASICs. CPUs offer a great degree of programmability; however, they tend to provide less performance than optimized, dedicated hardware chips. FPGAs are extremely flexible and deliver excellent performance, making them ideal for specialized applications that need a small volume of reprogrammable microchips. That said, FPGAs are quite difficult to design and expensive as well, not to mention that they still fall short in power and performance when compared to the likes of GPUs and ASICs. GPUs are ideal for graphics, as well as for the underlying matrix operations and scientific algorithms, as they are very fast and flexible. With an ASIC, you get the best of all worlds, as it is basically a customizable chip that can be designed to accomplish a very specific task with high efficiency and performance. ASICs are now increasingly being developed for the purpose of supporting AI and associated technologies. Google's tensor processing units (TPUs) are a series of ASICs designed for machine learning and optimized to run open-source machine learning software. Baidu developed dedicated ASICs for its "Kunlun" AI accelerator for data centers.
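The matrix multiplications mentioned above are the core workload that GPUs (and matrix-oriented ASICs such as TPUs) accelerate. A minimal sketch, again assuming PyTorch, which falls back to the CPU when no GPU is present:

```python
# Sketch: one large matrix multiply, the building block of neural networks.
# Illustrative only; runs on a CUDA GPU if available, otherwise on the CPU.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# ~69 billion multiply-accumulates; a GPU spreads them across thousands
# of parallel cores, which is why it vastly outpaces a CPU here.
c = a @ b
print(c.shape, device)
```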
High-bandwidth memory (HBM) is an ideal memory solution for AI training hardware. HBM2E is the latest version of HBM—its specification was announced by JEDEC in 2018 to support increased bandwidth and capacity.
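As a back-of-the-envelope illustration of what that bandwidth means (my calculation, not from the article; it assumes the standard 1024-bit HBM interface and a 3.2 Gb/s-per-pin HBM2E data rate):

$$\text{peak bandwidth per stack} = \frac{1024\ \text{pins} \times 3.2\ \text{Gb/s/pin}}{8\ \text{bits/byte}} \approx 410\ \text{GB/s}$$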
Samsung announced the industry's first HBM2E memory, "Flashbolt," in March 2019, which