Designing AI systems: Fundamentals of AI software and hardware

Posted on: Aug 05 - 2022

Artificial intelligence is already solving problems in all aspects of our lives, from animation filmmaking and space exploration to fast-food recommendation systems that improve ordering efficiency. These real-world examples of AI systems are just the beginning of what is possible in an AI Everywhere future, and they are already testing the limits of compute power. Tomorrow’s AI solutions will require optimization up and down the stack, from hardware to software, including the tools and frameworks used to implement end-to-end AI and data science pipelines.

AI math operations need powerful hardware

A simple example can help illustrate the root of the challenge. In Architecture All Access: Artificial Intelligence Part 1 – Fundamentals, Andres Rodriguez, Intel Fellow and AI Architect, shows how a simple deep neural network (DNN) that identifies handwritten digits requires over 100,000 weight parameters for just the first layer of multiplications. And that is for a simple DNN that processes only 28×28 black-and-white images.
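The arithmetic behind that figure can be sketched in a few lines. This is a minimal illustration, not the network from the video: the article does not state the width of the first layer, so the 128 hidden units below are an assumption chosen because they land just above 100,000 weights.

```python
def dense_layer_weights(num_inputs: int, num_outputs: int) -> int:
    """Number of weights in a fully connected layer (biases not counted)."""
    return num_inputs * num_outputs


# A 28x28 black-and-white image flattened to one input value per pixel.
num_pixels = 28 * 28

# Assumption: 128 hidden units. The article does not give the layer width.
hidden_units = 128

first_layer_weights = dense_layer_weights(num_pixels, hidden_units)
print(f"{num_pixels} inputs x {hidden_units} units = {first_layer_weights} weights")
```

Every one of those weights takes part in a multiplication for each image the network processes, which is why even this small example adds up quickly.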

Today’s AI solutions process image frames of 1280×720 and higher, with separate channels for red, green and blue. The neural networks are also more complex, as they must identify and track multiple objects from frame to frame, or extract meaning from text in which a different arrangement of the words may or may not change that meaning. We are already seeing models with trillions of parameters that take multiple weeks to train. Tomorrow’s AI solutions will be even more complex, combining multiple types of models and data.

AI application development is an iterative process, so speeding up compute-intensive tasks can increase a developer’s ability to explore more options or simply get the job done more quickly. As Andres explains in the Part 1 video, matrix multiplications often make up the bulk of the compute load during the training process.

In Architecture All Access: Artificial Intelligence Part 2 – Hardware, Andres compares the different capabilities of CPUs, GPUs and various specialized architectures. The AI-specific devices, and many new GPUs, have systolic arrays of multiply-accumulate (MAC) units that can parallelize the matrix multiplications inherent to the training process.
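To make the link between matrix multiplication and MACs concrete, here is a minimal pure-Python sketch that multiplies two small matrices and counts every multiply-accumulate it performs. It is only an illustration of the arithmetic, not a model of how a systolic array is built; the function name and the sample matrices are made up for the example.

```python
def matmul_with_mac_count(a, b):
    """Multiply a (m x k) by b (k x n) and count multiply-accumulate steps."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"

    out = [[0.0] * n for _ in range(m)]
    macs = 0
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate (MAC)
                macs += 1
            out[i][j] = acc
    return out, macs


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2
product, macs = matmul_with_mac_count(a, b)
print(product, macs)
```

The total is m × k × n MACs, and each output element depends only on one row of `a` and one column of `b`. That independence is what lets hardware with many MAC units compute many output elements at the same time.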

Size and complexity of AI systems require hardware heterogeneity

As neural networks become more complex and dynamic, for instance those with a directed acyclic graph (DAG) structure, their computations become harder to parallelize. Their irregular memory access patterns also call for low-latency memory. CPUs can be a good fit for these requirements due to their flexibility and higher operating frequencies.
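One way to see how graph structure bounds parallelism is to group the operations of a DAG into stages, where every operation in a stage has all of its dependencies already finished. The sketch below uses Python's standard `graphlib.TopologicalSorter`; the two graphs and their node names are hypothetical and stand in for the operations of a network.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Group DAG nodes into stages whose members can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes with no unmet dependencies
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical graphs: each key maps an operation to the ones it depends on.
chain = {"b": {"a"}, "c": {"b"}, "d": {"c"}}
branchy = {"b": {"a"}, "c": {"a"}, "d": {"a"}, "e": {"b", "c", "d"}}

chain_stages = parallel_stages(chain)
branchy_stages = parallel_stages(branchy)
print(chain_stages)
print(branchy_stages)
```

In the chain, every stage holds a single operation, so extra parallel hardware sits idle. In the branching graph, three operations can run side by side in the middle stage. Real networks fall somewhere in between, and the width of each stage can change from one input to the next when the graph is dynamic.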

With increasing network size, even larger amounts of data need to be moved between compute and memory. Given the growth of MACs available in hardware devices, memory bandwidth and the bandwidth between nodes within a server and across servers are becoming the limiting factors for performance.
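A common back-of-the-envelope way to reason about this limit is the roofline model, which is not mentioned in the article but captures the same idea: attainable throughput is capped either by peak compute or by how fast data can be delivered. The sketch below assumes a hypothetical device with 100 TFLOP/s of peak compute and 1 TB/s of memory bandwidth; neither number describes a real product.

```python
def attainable_flops(peak_flops, mem_bandwidth_bytes, flops_per_byte):
    """Roofline estimate: the lower of the compute and memory ceilings."""
    return min(peak_flops, mem_bandwidth_bytes * flops_per_byte)


# Hypothetical device, chosen only to make the arithmetic easy to follow.
peak = 100e12       # peak compute, in FLOP/s
bandwidth = 1e12    # memory bandwidth, in bytes/s

# Workload that performs 10 operations per byte moved: memory-bound.
low_intensity = attainable_flops(peak, bandwidth, 10.0)

# Workload that performs 500 operations per byte moved: compute-bound.
high_intensity = attainable_flops(peak, bandwidth, 500.0)

# Operations per byte needed before the compute units become the limit.
ridge_point = peak / bandwidth

print(low_intensity, high_intensity, ridge_point)
```

Under these assumed numbers, a workload needs about 100 operations per byte moved before the MAC units are kept busy. Adding more MACs raises that threshold, which is why bandwidth becomes the bottleneck as compute grows.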