The Race Is On For The AI Chip

Artificial Intelligence (‘AI’) is finally taking off, helped by big data, cloud computing, and breakthroughs in neural networks (computer code that emulates large networks of very simple interconnected units, a bit like neurons in the brain) and deep learning (how we sharpen AI by structuring neural networks in multiple processing layers).

It has been greatly boosted by the use of graphics processors (‘GPUs’), which are much better able to crunch the massive amounts of data required for ‘training,’ the improvement of the algorithms that drive AI.

Nvidia (NASDAQ:NVDA) has been one of the biggest beneficiaries of this trend. Its GPUs are increasingly used for the data-heavy training part of improving the algorithms that drive AI applications. Their take-up has been nothing short of spectacular.

In the latest quarter (Q1), datacenter revenues grew a whopping 186% over Q1 2016 (after growing 205% year over year in Q4 2016). While this isn’t the only driver behind the enormous rally in the shares (other segments like gaming and automotive are also growing at a brisk clip), it’s the most powerful.

This growth is likely going to continue as AI is accelerating, but this is such a lucrative market that it is no wonder that competition is emerging:

Google (NASDAQ:GOOG) (NASDAQ:GOOGL) has devised the Tensor Processing Unit (TPU) especially for AI. It recently launched the second edition of the TPU.

Intel (NASDAQ:INTC) is also working on a specialized AI chip, the Nervana, based on technology from the company of the same name, which Intel acquired in 2016. It’s slated for the end of the year.

IBM (NYSE:IBM) is developing a ‘neuromorphic’ chip called TrueNorth that mirrors the design of neural networks, but it’s still years away from the market.

Some of ARM’s upcoming designs include ways to deal with machine learning, like a new architecture called DynamIQ.

Qualcomm (NASDAQ:QCOM) has started building chips specifically for executing neural networks.

Apple (NASDAQ:AAPL) is working on its own AI chip, the Apple Neural Engine, as discussed by SA contributor Mark Hibben.

Groq, a startup founded by some of the people who worked on Google’s TPU, is also developing an AI chip.

This emerging competition isn’t necessarily a serious threat yet. Some of these chips are actually meant for inference on mobile devices, so they do not compete directly with Nvidia’s chips.

However, there are two considerations that are relevant here. First, GPUs were not developed for AI. From Wired:

But GPUs, originally designed for other purposes, are far from ideal. “They just happen to be what we have,” says Sam Altman, president of the tech accelerator Y Combinator and co-chairman of open-source AI lab OpenAI.

With their massively parallel processing capabilities, they do perform much better than CPUs, but there could be room for improvement through chips designed specifically for neural networks and deep learning. That quite a number of companies are embarking on such designs suggests they think so too.

Second, one must also realize that GPUs are now an established solution. Here is Yann LeCun of Facebook (NASDAQ:FB), from Wired:

Coders and companies are now so familiar with GPUs, he says, and they have all the tools needed to use them. “[GPUs] are going to be very hard to unseat,” he says, “because you need an entire ecosystem.”

But before Nvidia shareholders rejoice, they might also want to read the next bit directly after:

But he also believes that a new breed of AI chips will significantly change the way the big internet companies execute neural networks, both in the data center and on consumer devices – everything from phones to smart lawn mowers and vacuum cleaners.

At this stage, it’s near impossible to predict how this is going to play out. Many of the chips are still in development, and the performance of those that do exist isn’t easily comparable; they could very well be horses for slightly different courses.

One might want to keep in mind that Nvidia’s GPUs didn’t replace Intel’s CPUs in the data center; they work with them. There are many different architectures possible, depending on the tasks that the system is designed to do.

So we think that for the near future, Nvidia’s relentless assault on the server market is going to endure, although perhaps not quite with the same speed.

However, despite first mover advantage and considerable switching costs, several big players are deeming it worthwhile to develop their own specialist AI chips.

And we have proof that at least one of them, Google, is actually using them (their own TPUs), which is one indication that competition is emerging for Nvidia. Here is what Google itself argues (from Wired):

Google says that in rolling out its TPU chip, it saved the cost of building about 15 extra data centers.

And this is how IBM’s neuromorphic chip (“resembling the brain of a small rodent”) could differentiate itself (versus Nvidia GPUs), from Wired:

“What does a neuro-synaptic architecture give us? It lets us do things like image classification at a very, very low power consumption,” says Brian Van Essen, a computer scientist at the Lawrence Livermore National Laboratory who’s exploring how deep learning could be applied to national security. “It lets us tackle new problems in new environments.”

Both these indications suggest that specialist AI chips might have a leg up in terms of energy consumption.

Intel’s Nervana chip is almost certainly an FPGA chip (a field-programmable gate array, basically a chip that is programmable after manufacturing). FPGAs have been around for a while (not only from Nervana but also from Xilinx, for example). How do they stack up against GPUs? From Nextplatform:

The tested Intel Stratix 10 FPGA outperforms the GPU when using pruned or compact data types versus full 32 bit floating point data (FP32). In addition to performance, FPGAs are powerful because they are adaptable and make it easy to implement changes by reusing an existing chip which lets a team go from an idea to prototype in six months, versus 18 months to build an ASIC.

While FPGAs provide superior energy efficiency (Performance/Watt) compared to high-end GPUs, they are not known for offering top peak floating-point performance. FPGA technology is advancing rapidly. The upcoming Intel Stratix 10 FPGA offers more than 5,000 hardened floating-point units (DSPs), over 28MB of on-chip RAMs (M20Ks), integration with high-bandwidth memories (up to 4x250GB/s/stack or 1TB/s), and improved frequency from the new HyperFlex technology… The Intel Stratix 10, based on 14nm Intel technology, has a peak of 9.2 TFLOP/s in FP32 throughput. In comparison, the latest Titan X Pascal GPU offers 11 TFLOP/s in FP32 throughput.
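To put the quoted peak figures side by side, here is a minimal Python sketch. It only restates the numbers from the Nextplatform passage above; the function and variable names are ours, chosen for illustration, and peak FP32 throughput says nothing about real-world deep learning performance.

```python
# Peak FP32 throughput as quoted above, in TFLOP/s.
STRATIX_10_FP32 = 9.2
TITAN_X_PASCAL_FP32 = 11.0

# High-bandwidth memory figures as quoted: up to 4 stacks at 250 GB/s each.
HBM_STACKS = 4
HBM_GBPS_PER_STACK = 250


def relative_peak(fpga_tflops: float, gpu_tflops: float) -> float:
    """Return the FPGA peak as a fraction of the GPU peak."""
    return fpga_tflops / gpu_tflops


def total_hbm_bandwidth(stacks: int, gbps_per_stack: int) -> int:
    """Return the aggregate memory bandwidth in GB/s."""
    return stacks * gbps_per_stack


if __name__ == "__main__":
    ratio = relative_peak(STRATIX_10_FP32, TITAN_X_PASCAL_FP32)
    print(f"Stratix 10 peak is {ratio:.0%} of the Titan X Pascal peak")
    bandwidth = total_hbm_bandwidth(HBM_STACKS, HBM_GBPS_PER_STACK)
    print(f"Aggregate HBM bandwidth: {bandwidth} GB/s")
```

On these numbers the FPGA's peak sits roughly a sixth below the GPU's, and the four memory stacks add up to the 1TB/s mentioned in the quote.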

So the FPGA is slightly behind in computing power but more flexible and energy efficient, although this is no more than indicative. Reading that Nextplatform article, you will realize there are many more issues involved, but the flexibility of FPGAs could be a key advantage over GPUs. Here is Nervana founder and now Intel director Naveen Rao:

The chip is also designed to work not just with one type of deep neural networks, but with many. “We can boil neural networks down to a very small number of primitives, and even within those primitives, there are only a couple that matter,” Rao says, meaning that just a few fundamental hardware ideas can drive a wide range of deep learning services.

Nvidia’s response

Of course, Nvidia itself isn’t sitting still. It has the Xavier system-on-chip, which integrates a CPU, a CUDA GPU and deep learning accelerators for the forthcoming Drive PX3 (autonomous driving). Xavier’s GPU core has been upgraded from Pascal to Volta, significantly reducing power consumption (just 20W).

Then there are the upcoming (autumn 2017) Volta GPUs. Apart from being much more powerful than their predecessor (Pascal), they are more geared toward AI. From Medium:

Although Pascal has performed well in deep learning, Volta is far superior because it unifies CUDA Cores and Tensor Cores. Tensor Cores are a breakthrough technology designed to speed up AI workloads. The Volta Tensor Cores can generate 12 times more throughput than Pascal, allowing the Tesla V100 to deliver 120 teraflops (a measure of GPU power) of deep learning performance… The new Volta-powered DGX-1 leapfrogs its previous version with significant advances in TFLOPS (170 to 960), CUDA cores (28,672 to 40,960), Tensor Cores (0 to 5120), NVLink vs PCIe speed-up (5X to 10X), and deep learning training speed (1X to 3X).
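The generation-over-generation figures in that passage are easier to judge as ratios. The Python sketch below simply divides the quoted DGX-1 numbers. It also checks them against the per-GPU figure of 120 teraflops, which assumes eight GPUs per DGX-1; that count is not stated in the quote, and the names used are ours.

```python
# DGX-1 figures as quoted above: (Pascal-based, Volta-based).
DGX1_FIGURES = {
    "TFLOPS": (170, 960),
    "CUDA cores": (28_672, 40_960),
}

# Assumption, not stated in the quote: a DGX-1 holds eight GPUs.
GPUS_PER_DGX1 = 8
V100_DEEP_LEARNING_TFLOPS = 120  # per-GPU figure from the quote


def improvement(old: float, new: float) -> float:
    """Return how many times larger the new figure is than the old one."""
    return new / old


if __name__ == "__main__":
    for name, (pascal, volta) in DGX1_FIGURES.items():
        print(f"{name}: {improvement(pascal, volta):.2f}x")

    # Cross-check: eight V100s at 120 TFLOPS each should match the system total.
    system_total = V100_DEEP_LEARNING_TFLOPS * GPUS_PER_DGX1
    print(f"8 x V100 = {system_total} TFLOPS")
```

Under that assumption the per-GPU and system-level figures agree. Note that the quoted training speed-up (3X) is well below the rise in headline TFLOPS, a reminder that peak throughput does not translate one-for-one into training time.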

So they’re hardly sitting still. It helps a good deal that Nvidia is a big player in the autonomous driving market, which is also AI-driven and requires very large amounts of computing power. So it has an added incentive to adapt its generic GPUs to the diverse demands of AI applications.

Comparing to Google’s TPU

Google’s TPU is the only one of these chips that has already emerged. While the first TPU could only do inference, the recent second iteration is also capable of training. There are no independent benchmarks, but it nevertheless looks like Google might be onto something. From Techcrunch:

How fast are these new chips? “To put this into perspective, our new large-scale translation model takes a full day to train on 32 of the world’s best commercially available GPU’s, while one 1/8th of a TPU pod can do the job in an afternoon,” Google wrote in a statement.

(We believe 1/8th of a TPU pod equals 8 TPUs.) This is only indicative; there are no independent benchmarks, and it could very well be that the two excel at different things (there is a whole variety of neural networks). However, from CNBC:

Last month, Google published a paper comparing TPUs to existing chips and said its own processors are running 15 to 30 times faster and 30 to 80 times more efficient than the competition. Nvidia CEO Jen-Hsun Huang shot back and said his company’s current chips have “approximately twice the performance of the TPU – the first-generation TPU.”

The problem with that is that the second generation TPU is much better than the first one. We can’t be entirely sure, but we think Huang was talking about Pascal-based chips, not the Volta.


GPU-driven AI in big data centers is a relative novelty, but it has taken off rapidly and Nvidia has been the main beneficiary. We see AI developing into the single biggest driver of economic growth, so Nvidia is going to ride this wave for quite some time to come.

However, given the importance and enormity of this market, one shouldn’t be surprised that the field will get more crowded. Google is already deploying its TPU (which isn’t commercially available), now in its impressive second iteration.

As GPUs weren’t designed for AI, the company is potentially vulnerable to specialist chips, even if quite a few of the newer chips under development haven’t shipped yet (so they can’t be evaluated) and are meant for mobile inference, so they don’t compete directly with Nvidia’s chips.

However, partly driven by its leading position in autonomous driving, Nvidia is adapting its newer GPUs to AI.

Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.
