AMD is launching new offensives against Intel and Nvidia with new EPYC CPUs and Instinct GPUs it says are faster in data centre workloads for high-performance computing and AI.
The chipmaker revealed a refresh of its third-generation EPYC processors, codenamed Milan-X, which incorporates the company’s new 3D chiplet technology. AMD first teased that technology as part of an upcoming refresh of its Ryzen 5000 desktop CPUs, which are set to enter production by next month. Milan-X will be supported by Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro.
The chipmaker also unveiled the Instinct MI200 series, the company’s second generation of data centre GPU accelerators, which use AMD’s purpose-built CDNA 2 architecture.
Ram Peddibhotla, corporate vice president of product management at AMD, called the 3D chiplet technology powering the new EPYC CPUs a “big advancement for the industry.” With the refreshed EPYC chips slated to launch in the first quarter of 2022, AMD will beat Intel to market with a data centre processor that uses 3D chip packaging. 3D packaging is part of Intel’s data centre road map, but the rival won’t use it in its upcoming Sapphire Rapids CPUs.
“This is a truly revolutionary technology that is a key part of how AMD will push the envelope in high-performance computing. And we’re the first to bring it to market in the data centre,” Peddibhotla said in a pre-briefing with journalists last week.
Peddibhotla said the new EPYC CPUs are designed for technical computing, which includes applications like computational fluid dynamics, structural analysis, finite element analysis and electronic design automation.
While he did not provide any direct performance comparisons between the upcoming Milan-X chips and Intel’s latest Xeon Scalable CPUs, he said Milan-X will provide, on average, a 50 percent performance improvement in technical computing workloads over the original third-generation EPYC CPUs, which continue to hold more than 250 world records.
As a teaser, Peddibhotla showed that AMD’s 32-core EPYC 75F3, which is currently available, can provide up to 40 percent better performance in Ansys CFX for fluid dynamics, up to 34 percent better performance in Altair Radioss for structural analysis and up to 33 percent better performance in Ansys Mechanical for finite element analysis when compared to Intel’s 32-core Xeon Platinum 8362.
He also said that a 16-core variant of AMD’s new EPYC lineup with 3D chiplet technology can provide roughly 66 percent faster register-transfer level (RTL) verification in Synopsys VCS, an electronic design automation tool that is key to designing computer chips, compared with a currently available 16-core, third-generation EPYC processor without 3D chiplet technology.
Key to enabling this performance increase is the 3D chiplet technology, which allowed AMD to bond additional cache vertically onto each core complex die and triple the CPU’s L3 cache. Each die gains 64MB of stacked L3 cache on top of the 32MB of L3 it already carries in currently available third-generation EPYC CPUs, for a total of 96MB per die.
With a maximum of eight core complex dies in Milan-X’s chiplet design, that translates into up to 768MB of L3 cache for a single processor. Including the L2 and L1 caches, the total comes to 804MB of cache per socket.
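The cache totals above follow from simple multiplication. The sketch below reproduces them under stated assumptions: the per-core sizes of 512KB of L2 and 64KB of L1 (32KB instruction plus 32KB data) for Zen 3 cores are not given in the article, and the 64 cores are assumed to be spread evenly as eight per die.

```python
# Rebuild Milan-X's per-socket cache total from per-die and per-core sizes.
# ASSUMPTION: the per-core L2 (512KB) and L1 (32KB instruction + 32KB data)
# sizes for Zen 3 cores are not stated in the article.

CCDS_PER_SOCKET = 8          # maximum core complex dies in Milan-X
CORES_PER_CCD = 8            # assumption: 64 cores spread evenly over 8 dies
BASE_L3_MB_PER_CCD = 32      # L3 on each die in current third-gen EPYC
STACKED_L3_MB_PER_CCD = 64   # extra L3 bonded vertically onto each die
L2_KB_PER_CORE = 512         # assumption (see above)
L1_KB_PER_CORE = 32 + 32     # assumption: L1 instruction + L1 data


def socket_cache_mb():
    """Return per-level cache sizes, in MB, for a fully populated socket."""
    cores = CCDS_PER_SOCKET * CORES_PER_CCD
    l3_per_ccd = BASE_L3_MB_PER_CCD + STACKED_L3_MB_PER_CCD
    l3 = CCDS_PER_SOCKET * l3_per_ccd
    l2 = cores * L2_KB_PER_CORE // 1024
    l1 = cores * L1_KB_PER_CORE // 1024
    return {"l3_per_ccd": l3_per_ccd, "l3": l3, "l2": l2, "l1": l1,
            "total": l3 + l2 + l1}


caches = socket_cache_mb()
print(caches)
```

Running it gives 96MB of L3 per die (three times the original 32MB), 768MB of L3 per socket, and 804MB in total once 32MB of L2 and 4MB of L1 are added.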
Peddibhotla said that is a significant advancement because a larger cache benefits technical computing workloads by ensuring that “critical data” is closer to the CPU’s cores.
“That’s an amazing amount of cache. This additional L3 relieves memory bandwidth pressure and reduces latency, and that, in turn, speeds up application performance dramatically,” he said.
The new Milan-X CPUs will feature up to 64 cores, like the existing third-generation EPYC lineup, and they will be “fully compatible” with existing platforms with a “simple BIOS upgrade.”
“Our customers can drop Milan-X into existing platforms, and this will accelerate customer qualification and gets them to market faster,” Peddibhotla said.
To ensure Milan-X is optimised for technical computing workloads, AMD has “deep engineering engagement” with key independent software vendors like Altair, Ansys, Cadence, Siemens and Synopsys, which cover verticals from automotive and finance to life sciences and manufacturing.
“They’re all very excited about the possibilities and the performance of Milan-X, and we are working closely with them to bring the combined hardware and software solutions to market,” Peddibhotla said.
New AMD Instinct MI200 GPU ‘faster’ than Nvidia’s A100
Showing the chipmaker’s ambitions to steal data centre market share away from Nvidia, AMD executive Brad McCredie called the new Instinct MI200 series “the world’s fastest HPC and AI accelerator,” saying it can provide up to 4.9 times faster HPC performance and up to 20 percent faster AI performance compared to the 400-watt SXM version of Nvidia’s flagship A100 GPU configured with 80GB of memory.
“With this multi-generational leap in capability, MI200 is smashing performance records across a broad set of HPC applications, from molecular dynamics to astrophysics and to a range of other HPC applications that are critical to the foundation of science. MI200 is also the fastest data centre GPU in the industry for AI training, delivering up to 1.2 times higher peak flops for mixed-precision performance, helping to fuel the convergence of HPC and AI,” said McCredie, whose title is vice president for data centre GPU accelerators.
The Instinct MI200 series will be available in two form factors. The highest-performance versions will come in the OAM form factor, based on the Open Compute Project Foundation’s OCP Accelerator Module specification, which enables greater scalability for hyperscale customers. The GPUs will also come as PCIe cards, which will be more widely available through OEM systems.
The new lineup’s flagship Instinct MI250X GPU is currently available in Hewlett Packard Enterprise’s Cray EX supercomputer systems. Additional systems with Instinct MI200 GPUs will be available in the first quarter of 2022 from ASUS, ATOS, Dell Technologies, Gigabyte, Hewlett Packard Enterprise, Lenovo and Supermicro, among other OEMs and ODMs.
The performance comparisons against Nvidia’s A100 GPU were made using the Instinct MI250X GPU, which will use the OAM form factor and feature 220 compute units, 14,080 stream processors and 128GB of HBM2e high-bandwidth memory running at a data rate of 3.2 Gbps per pin. AMD’s Instinct MI200 GPUs also feature up to 58 billion transistors, built on a 6-nanometer manufacturing process, and up to 880 second-generation Matrix Cores.
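The MI250X’s unit counts are consistent with one another. A minimal sketch, assuming that each CDNA 2 compute unit holds 64 stream processors and four Matrix Cores (a per-unit breakdown the article does not state), recovers both chip-level totals from the 220 compute units:

```python
# ASSUMPTION: each CDNA 2 compute unit contains 64 stream processors
# and 4 Matrix Cores; the article gives only the chip-level totals.
STREAM_PROCESSORS_PER_CU = 64
MATRIX_CORES_PER_CU = 4


def gpu_resources(compute_units):
    """Return (stream processors, Matrix Cores) for a given compute-unit count."""
    return (compute_units * STREAM_PROCESSORS_PER_CU,
            compute_units * MATRIX_CORES_PER_CU)


stream_processors, matrix_cores = gpu_resources(220)  # MI250X
print(stream_processors, matrix_cores)  # 14080 880
```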
McCredie said the memory capacity makes the Instinct MI250X the “world’s first GPU available with 128GB of HBM2e.” According to the executive, that represents four times more capacity and 2.7 times more bandwidth than AMD’s previous-generation Instinct MI100.
Compared to the A100, the Instinct MI250X can achieve 47.9 teraflops, or 47.9 trillion floating point operations per second, in FP64 vector performance, which AMD says is 4.9 times faster; the company claims the same 4.9-fold advantage in FP64 matrix performance. The GPU can also achieve 47.9 teraflops in FP32 vector performance, 2.5 times faster, and 383 teraflops in FP16 and BF16 matrix performance, 20 percent faster.
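These ratios appear to compare peak specifications. As a back-of-the-envelope check (not AMD’s methodology), dividing each quoted MI250X figure by the claimed speedup gives the A100 peak the ratio implies. The results land close to Nvidia’s published A100 peaks of 9.7 teraflops (FP64), 19.5 teraflops (FP32) and 312 teraflops (FP16 Tensor Core); the small gaps come from AMD rounding its ratios.

```python
# AMD's quoted MI250X peak figures (teraflops) and claimed speedups over the A100,
# as reported above. The division simply inverts each ratio.
claims = {
    "FP64 vector": (47.9, 4.9),
    "FP32 vector": (47.9, 2.5),
    "FP16/BF16 matrix": (383.0, 1.2),
}


def implied_baseline(mi250x_tflops, speedup):
    """Peak throughput the comparison GPU must have for the stated ratio to hold."""
    return mi250x_tflops / speedup


for name, (tflops, speedup) in claims.items():
    print(f"{name}: ~{implied_baseline(tflops, speedup):.1f} TFLOPS implied for the A100")
```

This prints roughly 9.8, 19.2 and 319.2 teraflops for the three categories.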
AMD also showed comparisons across various HPC applications: 2.4 times faster for OpenMM, 2.2 times faster for LAMMPS, 1.9 times faster for HACC, 1.6 times faster for LSMS and 1.4 times faster for MILC.
Key to these performance advancements are ROCm 5.0, AMD’s software stack for GPU compute, and the company’s third-generation Infinity architecture, which enables high-speed communication between the GPU and CPU and between GPUs. Another enabling technology is AMD’s 2.5D elevated fanout bridge, which allowed the company to put two GPU dies in a single package, making the Instinct MI200 series AMD’s first multi-die GPU.
The Instinct MI200 series will power the US Department of Energy’s Frontier supercomputer, which is set to become the United States’ first exascale computer when it goes online soon.
“The MI200 series accelerators are the world’s most advanced accelerators powering leadership HPC and AI workloads that will help scientists get to world-changing results faster,” McCredie said.