
AI’s impact on 3D packaging: heterogeneous integration




By Santosh Kumar [Yole Développement, Korea]
Artificial intelligence (AI) has been in development for more than fifty years, but recently it has emerged as one of the key drivers of semiconductor growth, fueled by smartphones, personal assistants, social media, and smart automotive. AI requires various computing hardware and high-end memories; and because of its requirements for high bandwidth, low latency, and low power consumption, AI has created opportunities for the advanced packaging business.

AI technology trends

AI is now widespread and has become an integral part of the technology industry. Whenever a machine mimics human cognitive function, we can say it is AI. Within the AI field, some people have begun to distinguish between types of machine learning. Machine learning is the subset of AI that includes abstruse statistical techniques enabling machines to improve task performance with experience. The first goal of machine learning is to give the machine the ability to learn without being explicitly programmed; the next is to allow the machine to assess the data it collects and make predictions. Besides academic research and military programs, there are machine learning flagship applications aimed at consumers. The most important applications are voice identification and language processing, used in intelligent personal assistants (e.g., Siri, Cortana, Alexa), and image recognition for autonomous driving.

There are several algorithmic approaches that enable the enhancement and acceleration of machine learning; deep learning is one of them and is gaining more and more interest. Deep learning is the subset of machine learning comprising algorithms that allow software to train itself to perform tasks, like speech and image recognition, by exposing multilayered neural networks to vast amounts of data. These new ways of processing heavy data, like video and photos, were made possible by the availability of efficient data computing hardware, such as new large-bandwidth memories, graphics processing units (GPUs), central processing units (CPUs), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs).

Deep learning is made up of two phases: training and inference. What is called training in AI is teaching a machine to recognize objects and sounds. The training phase requires huge computing power and can be extremely long (hours, days, months) depending on the required precision. Currently, most training is done in the cloud, where the computing capabilities are in line with such operations. Nevertheless, some training can still be done at the edge. An example is the face detection system on a phone, where a one-off training of a couple of seconds is needed to complete the neural network model that recognizes the face of the phone's owner. Inference cannot happen without training: the act of using the trained neural network with new data on a device or server to identify something is known as inference. Inference can occur in the cloud, or at the edge, where the model gives similar prediction accuracy but is simplified, compressed, and optimized for runtime performance. Systems-on-chip (SoCs) containing GPUs and a CPU are used to do this computation at the edge (on a phone, for example). Inference requires less computational capability than training, as the training was already performed in the cloud.
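To make the two phases concrete, the following is a minimal sketch of the training/inference split, assuming PyTorch; the toy model, layer sizes, and random data are hypothetical stand-ins for a real network and dataset.

    import torch
    import torch.nn as nn

    # Hypothetical toy classifier standing in for a real network.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Training phase: many compute-intensive passes over labeled data,
    # typically run in the cloud on GPUs or other accelerators.
    inputs = torch.randn(256, 16)          # placeholder training data
    labels = torch.randint(0, 4, (256,))   # placeholder labels
    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()                    # gradient computation: the expensive part
        optimizer.step()

    # Inference phase: a single cheap forward pass on new data,
    # light enough to run at the edge (e.g., on a phone's SoC).
    model.eval()
    with torch.no_grad():
        prediction = model(torch.randn(1, 16)).argmax(dim=1)

The asymmetry described above is visible even in this toy example: training loops over the data repeatedly to compute gradients, while inference is a single forward pass.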
Hardware for AI

Training and inference have two different missions, and that makes their hardware requirements different. Training requires intensive calculation, and consequently large bandwidth, using CPUs, GPUs, FPGAs, and dynamic random access memories (DRAMs), and it is the first and main user of 3D interconnected devices. The inference workload, by contrast, looks like the processing of digital signal processing (DSP) algorithms. Inference can take place in two places: either in a datacenter, or locally (embedded inference), such as in a car. The requirements for inference are low latency, lower cost, and much lower power consumption, especially when the inference is embedded. Inference products could integrate an accelerator onto an SoC. Inference is typically conducted at the application or client endpoint (i.e., the edge), rather than on the server or in the cloud. It requires fewer hardware resources and, depending on the application, can be performed using CPUs, FPGAs, ASICs, DSPs, etc. Inference is expected to shift locally to mobile devices. Here, precision can be sacrificed in favor of greater speed or lower power consumption, as the sketch below illustrates.
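One common way to make that precision-for-power trade is post-training quantization, which converts 32-bit floating-point weights to 8-bit integers. Below is a minimal sketch, again assuming PyTorch and a hypothetical toy model; dynamic quantization stands in here for the many vendor-specific edge-deployment flows.

    import torch
    import torch.nn as nn

    # Hypothetical trained float32 model (weights would come from training).
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    model.eval()

    # Post-training dynamic quantization: Linear weights become int8,
    # shrinking the weights roughly 4x and speeding up CPU inference,
    # at some (usually small) cost in prediction accuracy.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.no_grad():
        output = quantized(torch.randn(1, 16))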
As mentioned before, the key computing hardware for AI training and inference includes CPUs, GPUs, FPGAs, and ASICs. CPUs offer a great degree of programmability; however, they tend to deliver less performance than optimized, dedicated hardware chips. FPGAs are extremely flexible and have excellent performance, making them ideal for specialized applications that need a small volume of reprogrammable chips. That said, FPGAs are quite difficult to design and expensive as well, and they still fall short in power and performance when compared to the likes of GPUs and ASICs. GPUs are ideal for graphics, as well as for the underlying matrix operations and scientific algorithms, because they are very fast and flexible. With an ASIC, you get the best of all worlds: it is essentially a customizable chip that can be designed to accomplish a very specific task with high power efficiency and performance. ASICs are now increasingly being developed to support AI and associated technologies. Google's tensor processing units (TPUs) are a series of ASICs designed for machine learning and optimized to run open-source machine learning software. Baidu developed dedicated ASICs for its "Kunlun" AI accelerator for data centers.

High-bandwidth memory (HBM) is an ideal memory solution for AI training hardware. HBM2E is the latest version of HBM; its specification was announced by JEDEC in 2018 to support increased bandwidth and capacity. Samsung announced the industry's first HBM2E memory, "Flashbolt," in March 2019, which

