Nvidia has revealed an Arm-based data center CPU for AI and high-performance computing it says will provide 10 times faster AI performance than one of AMD’s fastest EPYC CPUs, a move that will give the chipmaker control over compute, acceleration and networking components in servers.
The new data center CPU, named Grace after computer programming pioneer Grace Hopper, will create new competition for x86 CPU rivals Intel and AMD when it arrives in early 2023, the launch window Nvidia CEO Jensen Huang provided during the company’s virtual GTC 2021 conference on Monday. The move comes as Nvidia seeks to close its controversial US$40 billion acquisition of Arm, whose CPU designs are being used for Grace.
In his GTC 2021 keynote, Huang called Grace “the final piece of the puzzle” and said it will give the company the “ability to rearchitect every aspect of the data center for AI.”
With the introduction of CPUs to Nvidia’s portfolio, Huang said its data center product road map will now consist of three product lines — GPUs, data processing units (DPUs) and CPUs — and each product line will receive an updated architecture every two years. In a new road map, Nvidia showed that a next-generation Grace CPU is due out around 2025 following the release of a third-generation Ampere GPU and a BlueField-4 DPU the year before.
The chipmaker’s focus will alternate between x86 and Arm platforms every other year, according to Huang, as Nvidia wants to ensure its architecture will support platforms preferred by customers.
“Each chip architecture has a two-year rhythm with likely a kicker in between,” he said. “One year will focus on x86 platforms. One year will focus on Arm platforms.”
The company said it has already landed two major customers for its Grace CPU: the Swiss National Supercomputing Centre and the U.S. Department of Energy’s Los Alamos National Laboratory, both of which plan to bring online Grace-powered supercomputers built by Hewlett Packard Enterprise in 2023.
The Swiss National Supercomputing Centre will use Nvidia Grace CPUs in combination with unannounced next-generation Nvidia GPUs in its Alps supercomputer to provide 20 exaflops of AI performance, which is equivalent to 20 quintillion floating-point operations per second.
Paresh Kharya, Nvidia’s senior director of product management for accelerated computing, said the Alps supercomputer will be capable of training GPT-3, the world’s largest natural language processing model, in only two days. That is seven times faster than Nvidia’s Selene supercomputer, which is currently ranked No. 5 in the world’s top 500 supercomputers.
“This balanced architecture with Grace and a future Nvidia GPU, which we have not announced yet, will enable breakthrough research in a number of different areas by allowing them to combine the power of both HPC and AI, advancing climate and weather, advancing material sciences, molecular dynamics as well as economics and social studies,” he said.
Kharya said Nvidia’s upcoming Grace CPU is meant to address the increasingly gargantuan size of AI models, which are set to grow to trillions of parameters in the next few years from the hundreds of billions of parameters used in the world’s largest AI models now.
A key challenge now and moving forward is the rate at which data moves between GPUs and the CPU, according to Kharya. Currently, Nvidia GPUs have a high memory bandwidth of 8,000 GB per second for GPU-to-GPU communication, but because many models are too big to fit in the total GPU memory, they need to go into the CPU’s system memory, which has a larger capacity. The problem with moving data between GPUs and system memory for x86 CPUs is that the memory bandwidth is capped at 64 GB per second, significantly slowing down the rate at which massive AI models can be trained.
To solve these problems, Nvidia is designing the Grace CPU to be more tightly coupled with GPUs. Kharya said Grace will be capable of delivering 900 GB per second of bi-directional bandwidth for CPU-to-GPU communication, thanks to its use of a next-generation Nvidia NVLink interconnect.
“The GPU can now access the CPU memory as fast as the CPU itself,” he said.
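To put the two bandwidth figures side by side, here is a rough back-of-the-envelope sketch rather than a benchmark. It assumes a one-trillion-parameter model stored at 2 bytes per parameter (FP16), which is an illustrative assumption and not a figure Nvidia provided, and it treats both quoted bandwidths as sustained peak rates. The 900 GB per second figure is bi-directional, so a one-way transfer would likely be slower than shown.

```python
# Idealized transfer-time comparison using the bandwidth figures quoted above.
# Assumption (not from Nvidia): 1 trillion parameters at 2 bytes each (FP16).

X86_LINK_GB_PER_S = 64     # cited cap between GPUs and x86 system memory
GRACE_LINK_GB_PER_S = 900  # cited bi-directional CPU-to-GPU bandwidth for Grace


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to move size_gb of data at a sustained bandwidth, ignoring overhead."""
    return size_gb / bandwidth_gb_per_s


PARAMS = 1_000_000_000_000   # hypothetical model size
BYTES_PER_PARAM = 2          # hypothetical precision (FP16)
model_gb = PARAMS * BYTES_PER_PARAM / 1e9

x86_s = transfer_seconds(model_gb, X86_LINK_GB_PER_S)
grace_s = transfer_seconds(model_gb, GRACE_LINK_GB_PER_S)
speedup = x86_s / grace_s

print(f"Model size:      {model_gb:,.0f} GB")
print(f"x86 link:        {x86_s:.2f} s per full pass")
print(f"Grace NVLink:    {grace_s:.2f} s per full pass")
print(f"Bandwidth ratio: {speedup:.1f}x")
```

Under these assumptions, the ratio between the two links is about 14 to 1. That is a ceiling on data-movement gains only, and it is not the same thing as the tenfold training speedup Nvidia cites for full systems.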
In a head-to-head comparison, Nvidia said a Grace-based system with 64 A100 GPUs and 64 Grace CPUs will provide 10 times faster training performance for a one-trillion-parameter natural language processing model versus a DGX cluster consisting of 64 A100s and 16 of AMD’s 64-core EPYC 7742 CPUs.
Kharya said each Grace CPU will provide a geomean score of more than 300 on the SPECrate2017_int_base benchmark using standard compilers. For an eight-GPU DGX system with Grace CPUs, the total score will be 2,400, five times higher than the score of Nvidia’s current DGX A100 system, which uses AMD’s EPYC processors, according to Nvidia.
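The arithmetic behind those benchmark figures can be checked with a short sketch. It assumes one Grace CPU per GPU in the eight-GPU system (which is what the 2,400 total implies), uses 300 as a floor for the "more than 300" per-CPU score, and reads "five times" as a plain 5x ratio. The implied score for the current DGX A100 is derived here and was not stated by Nvidia.

```python
# Sanity check of the SPECrate2017_int_base figures quoted above.
# Assumptions: eight Grace CPUs per system, 300 as the per-CPU floor,
# and "five times" read as an exact 5x ratio.

GRACE_SCORE_PER_CPU = 300
CPUS_PER_SYSTEM = 8
CLAIMED_RATIO = 5


def system_score(score_per_cpu: int, cpu_count: int) -> int:
    """Aggregate throughput score, assuming it scales linearly per CPU."""
    return score_per_cpu * cpu_count


grace_system = system_score(GRACE_SCORE_PER_CPU, CPUS_PER_SYSTEM)
implied_dgx_a100 = grace_system / CLAIMED_RATIO  # derived, not a quoted figure

print(f"Grace-based system total: {grace_system}")
print(f"Implied DGX A100 total:   {implied_dgx_a100:.0f}")
```

The per-CPU and system totals are consistent with each other, and the claimed ratio would put the current AMD-based DGX A100 at roughly 480 on the same benchmark.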
A system using Nvidia Grace CPUs will outpace the company’s latest Nvidia DGX A100 system, which uses AMD’s 64-core EPYC 7742 CPUs, by a factor of 10 when it comes to training a one-trillion-parameter natural language processing model, according to the company.
“With Grace, the AI community will have an optimal architecture to achieve peak performance for trillion-parameter-plus models,” Kharya said. “These giant trillion parameter models that would otherwise take months to train or fine-tune, depending on the size of the cluster, can now be trained in just days.”
In addition to a next-generation NVLink, Grace will use next-generation Neoverse CPU cores from Arm as well as an LPDDR5x memory subsystem, which the company said will provide double the bandwidth and 10 times better energy efficiency. The CPU will be supported by Nvidia’s HPC software development kit and its full suite of CUDA and CUDA-X libraries, according to the company.
Eliot Eshelman, vice president of strategic accounts and HPC initiatives at Microway, a Plymouth, Mass.-based Nvidia HPC partner, told CRN he isn’t surprised Nvidia is making its own CPUs, given its ongoing work to support the Arm ecosystem and its pending acquisition of Arm. He added that it means Intel, AMD and Nvidia will be competing both in accelerated computing and general-purpose computing.
“It was the missing piece in their portfolio,” he said.
Nvidia’s move to become a CPU vendor will mean greater opportunities for channel partners but also increased complexity, especially given that Nvidia’s CPUs will use the Arm architecture rather than the tried-and-true x86 architecture used in Intel and AMD CPUs, according to Eshelman.
“We have difficulty guiding customers on what exists today, which is fewer competing solutions, so this adds more complexity, but it’s good in that it gives people more options. Competition is good for everybody,” he said.
Nvidia has increasingly viewed itself as a “data center-scale computing” company, a view that has led it to pursue optimization of applications at a system level. While the company has seen fast adoption of its GPUs for accelerated computing over the past few years, Nvidia expanded into high-speed networking products last year with its US$7 billion acquisition of Mellanox Technologies to address communication bottlenecks between systems.
After the announcement, Nvidia’s stock rose 2.89 percent to US$592.62 while Intel shares sank 3.82 percent to US$65.66. AMD shares were down 2.65 percent to US$80.57.