Recent advancements in artificial intelligence have fueled considerable excitement around what many call “AI Chips”, specialized hardware tailored to optimize and accelerate AI workloads. Amidst the noise, many companies appear to be branding nearly every processor enhancement or hardware upgrade as an AI chip, blurring the lines between genuine AI advancements and general performance improvements.
To separate genuine innovation from hype, it is worth asking two questions:
- Are these really ‘AI’ chips?
- How essential are AI chips for most AI applications?
The term “AI chips” broadly encompasses various types of specialized hardware designed for complex AI tasks, such as AI-specific ASICs (Application-Specific Integrated Circuits) and FPGAs (Field-Programmable Gate Arrays), GPUs (Graphics Processing Units), and TPUs (Tensor Processing Units).
Can GPUs be considered AI chips?
Originally developed for graphics rendering, GPUs (Graphics Processing Units) have been around for several decades and were long used mostly in gaming systems. (To learn more about the internals of GPUs, read this previous article of mine.) Their application to AI is relatively recent, taking off only after the advent of deep learning. Conventional GPUs, then, can hardly be called AI chips.
In recent years, the GPU architecture has evolved significantly to accommodate the needs of modern AI workloads. Here are a few examples:
- Memory-bandwidth innovations, such as High Bandwidth Memory (HBM) and Graphics Double Data Rate 6 (GDDR6), enable faster data transfer between memory and the GPU's compute units, which also drives efficiency in multi-GPU systems.
- Mixed-precision computing (e.g., FP16 and INT8) allows for faster computation while maintaining accuracy, which is particularly useful for training large models.
- NVIDIA’s tensor cores accelerate matrix multiplications, enabling standard GPUs to perform the high-throughput calculations needed for deep learning (see the sketch after this list).
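To make the last two points concrete, here is a minimal, hedged sketch using PyTorch (my choice for illustration; it assumes the torch package and a CUDA-capable GPU with tensor-core support). A matrix multiplication is run under mixed precision, which is exactly the kind of operation tensor cores are built to accelerate:

```python
# Minimal sketch: mixed-precision matmul in PyTorch
# (assumes torch and a CUDA GPU with tensor-core support).
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Inside the autocast region, eligible operations run in FP16, letting the
# GPU schedule the multiplication on tensor cores where available.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = torch.matmul(a, b)

print(c.dtype)  # torch.float16 -- reduced precision, higher throughput
```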
Apart from hardware-focused innovations, the software ecosystem has also evolved. Perhaps the most prominent example is NVIDIA’s CUDA ecosystem, including cuDNN (the CUDA Deep Neural Network library). CUDA is a parallel computing platform that lets developers leverage GPUs for non-graphics tasks, including AI workloads.
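To give a flavour of what “leveraging GPUs for non-graphics tasks” looks like in practice, here is a hedged sketch of a custom GPU kernel written from Python with Numba’s CUDA support (Numba is simply one convenient entry point into CUDA, chosen here for illustration):

```python
# Sketch: a custom CUDA kernel written in Python via Numba
# (assumes the numba package and an NVIDIA GPU with the CUDA driver).
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)          # absolute index of this GPU thread
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
# Numba transfers the NumPy arrays to and from the GPU around the launch.
vector_add[blocks, threads_per_block](a, b, out)
```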
Not everyone will agree that such innovations make GPUs genuine AI chips. Although tensor cores align closely with the concept of an AI chip, the other advancements remain general-purpose, supporting applications beyond AI, such as graphics rendering and scientific computing. While GPU manufacturers and their ecosystem (e.g., the large infrastructure players) may assert that they are developing and deploying AI chips, hardware engineers outside this ecosystem may view these claims with skepticism.
Google’s Tensor Processing Units (TPUs)
While GPUs were originally created for graphics processing and subsequently adapted for AI workloads, TPUs were created solely for machine learning, especially deep learning. TPUs focus primarily on high throughput for tensor operations (the backbone of neural networks) rather than general parallel processing. They use a simplified, matrix-based architecture that maximizes computational density and minimizes control logic. At the core of a TPU lies the Matrix Multiplication Unit, built on systolic arrays, which performs enormous numbers of multiply-accumulate operations in parallel.
Like GPUs, TPUs have high-bandwidth interconnects for handling large datasets and facilitating fast data flow across their tensor cores. However, TPUs can also move large data matrices directly to and from the tensor cores without the deep memory hierarchies that GPUs rely on. This makes TPUs truly AI-focused, offering significant improvements in performance per watt compared to GPUs. The TPU architecture also supports multi-chip scalability, enabling the large-scale distributed training that is imperative in use cases like LLM development.
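The systolic-array idea is easier to see in code than in prose. The toy NumPy sketch below is a simplification (it ignores the staggered, wave-like timing of real hardware) that shows how a matrix product can be built up one “cycle” at a time, with every output cell performing one multiply-accumulate per cycle:

```python
# Toy model of a systolic matrix unit, for intuition only.
import numpy as np

def systolic_matmul(A, B):
    """Process the reduction dimension one step ('cycle') at a time;
    in each cycle, every output cell does a single multiply-accumulate."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for t in range(k):
        # Cycle t: column t of A and row t of B stream through the array;
        # in real hardware, all m*n accumulators update simultaneously.
        C += np.outer(A[:, t], B[t, :])
    return C

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```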
AI-centered ASICs & FPGAs
Application-Specific Integrated Circuits (ASICs) are custom chips designed for a single function, such as cryptography, machine learning, or networking. AI-based ASICs are tailored for executing large-scale AI workloads (e.g., tensor processing) quickly and efficiently. ASICs are extremely efficient at the specific tasks they are designed for. As a result, they perform faster, consume less power, and occupy less physical space than GPUs, and even standard TPUs. On the flip side, ASICs lack flexibility: since they are hardwired for very specific tasks, they cannot be repurposed once manufactured.
AI-based ASICs are often the preferred choice for autonomous systems (e.g., robots), edge/IoT AI (e.g., Google's Edge TPU and Mythic's AI chips), and high-throughput inferencing in cloud data centers (e.g., Amazon Inferentia and Intel's Nervana NNP). They typically include specialized memory architectures that minimize data movement. Because they incorporate on-chip memory close to the processing units, they avoid the bottlenecks of accessing off-chip memory, which is particularly beneficial for memory-intensive AI tasks. Some ASICs even incorporate custom memory hierarchies that keep frequently used data near the processing elements, further enhancing efficiency.
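For illustration, here is a hedged sketch of what inference on an edge AI ASIC can look like from the software side, using Google's Coral Edge TPU tooling as the example (it assumes the tflite_runtime package, the Edge TPU runtime library, and a hypothetical model file already compiled for the chip):

```python
# Sketch: running a quantized model on a Coral Edge TPU via tflite_runtime
# (model_edgetpu.tflite is a hypothetical, pre-compiled model file).
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Quantized (typically INT8) input, matching the chip's compute format.
dummy_input = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
```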
Field Programmable Gate Arrays (FPGAs) are reconfigurable chips that can be programmed for specific tasks, making them highly adaptable to various AI workloads. FPGAs consist of an array of logic blocks and reconfigurable interconnects that can form customized circuits for specific tasks. This makes them uniquely versatile compared to fixed-architecture GPUs and TPUs.
FPGAs enable the creation of custom dataflow architectures optimized for specific AI models. They also allow developers to configure the bit width of arithmetic units precisely, enabling low-precision operations that improve speed and reduce power consumption. Another strength is the ability to implement systolic arrays, which are specialized for matrix multiplications. FPGAs also offer customizable on-chip memory arrangements, allowing developers to adjust the memory structure to meet the requirements of a specific AI model. This is particularly beneficial for large models or those requiring high memory bandwidth.
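As a rough illustration of the bit-width point, the sketch below uses plain NumPy to quantize weights to a narrow signed-integer format (4 bits, purely as an example); on an FPGA, the arithmetic units can then be sized to exactly that width:

```python
# Sketch: symmetric uniform quantization to a narrow integer width,
# mimicking the low-precision arithmetic an FPGA datapath can be built for.
import numpy as np

def quantize(x, bits=4):
    qmax = 2 ** (bits - 1) - 1            # e.g., 7 for signed 4-bit values
    scale = np.abs(x).max() / qmax        # map the float range onto integers
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)
q, s = quantize(w, bits=4)
print(w)
print(dequantize(q, s))   # close to w, but representable in 4-bit integers
```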
AI-focused FPGAs may offer lower performance than high-end GPUs and TPUs during model development and training, but they often exhibit lower latency for real-time inferencing, because they can be configured to execute tasks with minimal intermediary steps and custom datapaths. They are also usually more energy-efficient, making them suitable for edge AI applications where flexibility and power efficiency are crucial. Examples of FPGA adoption in AI include the Intel (Altera) Stratix family, Microsoft's Project Brainwave, and Xilinx's Versal AI Core series.
Other AI Chips & Specialized Hardware
While NVIDIA has become synonymous with GPU computing, particularly through its A100 and H100 accelerators, other major companies are also making significant strides in AI chip innovation. Here are a few examples.
- Amazon has introduced custom AI chips to optimize machine learning workloads, such as Inferentia (for inference tasks) and Trainium (for large-scale AI training).
- Apple’s Neural Engine introduces AI capabilities into its A-series and M-series chips.
- Google has developed AI-focused hardware, such as the Axion CPU (an ARM-based processor for enhancing AI performance), in addition to the TPUs mentioned earlier.
- Meta’s MTIA (Meta Training and Inference Accelerator) is custom-designed to run AI inference tasks efficiently.
- Microsoft has released Maia AI Accelerator (for training and inference of LLMs) and Cobalt CPU (an ARM-based processor optimized for AI workloads).
Additionally, relatively newer players building AI-focused ASICs, such as Groq with its LPU (Language Processing Unit), are carving out their own space. It is important to note that many specialized chips are designed primarily for use within particular infrastructures (e.g., TPUs within Google Cloud Platform). Moreover, adopting them may require applications to be specifically tailored, often creating a steep learning curve for developers.
Not every company implementing AI solutions needs AI chips.
The allure of AI chips is often driven by the marketing push of technology companies, but the reality is that most companies can achieve their AI goals without this specialized hardware. Most real-world AI solutions can be built and deployed on lower-end GPU systems, or even multi-CPU infrastructure; dedicated AI chips, or even high-end GPU systems, may simply not be necessary.
Specialized hardware, such as AI chips, is generally required for complex problems or scenarios demanding real-time (or extremely low-latency) processing, e.g., autonomous systems, high-frequency trading, and LLM development. However, most real-world applications, such as chatbots, forecasting, or predictive modeling, do not need such high-powered, specialized hardware. The high cost of AI chips may simply not be worth it.
The decision to invest in AI chips, whether on-premise or in the Cloud, should be based on specific workload demands, infrastructure capabilities, and the technology strategy of the organization. AI chips are costly to acquire and operate, both in terms of initial investment and ongoing maintenance. Cloud AI platforms built on AI chips do remove the need for initial investments and maintenance, but the accumulated subscription costs may still be very high. Given their considerable investments in such specialized hardware, the Cloud AI providers will naturally aim to recoup their expenses, often weaving these costs into pricing models in ways that may not be immediately transparent to clients.
Closing Comments
The hype surrounding AI chips is not without merit. With the continued focus on building real-world AI applications, the demand for AI chips will only intensify. At the same time, it is important to understand that AI chips are neither a panacea for complex problems nor necessary for every AI use case. While they hold transformative potential, they are not a universal solution to every computational challenge. As the technology advances, it is important to maintain a balanced perspective, recognizing both the impressive capabilities and the current limitations of these chips. Such a well-informed approach is key to harnessing their benefits while navigating the complexities of this groundbreaking but still evolving technology.
PS: 10–20% of this article was written with the help of generative AI.