Intel has revealed the first details of its 7-nanometer Xe GPU for high-performance computing and artificial intelligence workloads in the data centre, code-named Ponte Vecchio.
It is a momentous occasion for the company, which on Sunday at the Supercomputing conference in Denver shared details of its GPU plans for the first time since announcing, in 2017, that it would take another stab at the GPU market.
The new GPU will give Intel more ammunition to compete against Nvidia and, to a lesser extent, AMD, in the data centre acceleration market, which is forecast to grow 38 percent annually to US$35 billion by the end of 2025, according to research firm Market Study Report. Last week, Intel announced the launch of its Nervana neural network processors for deep learning training and inference.
At Supercomputing, the chipmaker also shared initial details of its next-generation 10nm Xeon Scalable processors, code-named Sapphire Rapids, and announced the launch of its oneAPI unified programming initiative, which aims to make it easier for developers to map software to specific hardware, such as a CPU, GPU or FPGA.
Combined, the Xeon Sapphire Rapids CPU, Xe Ponte Vecchio GPU and oneAPI will be used in the previously announced Aurora supercomputer for the U.S. Department of Energy's Argonne National Laboratory, according to a new disclosure by Intel. Each node will be outfitted with two Sapphire Rapids CPUs and six Ponte Vecchio GPUs while using oneAPI for the programming layer.
In addition to the DOE's Aurora supercomputer, Lenovo and Atos also plan to build HPC platforms using Intel's Xeon processors, Xe GPUs and oneAPI unified programming layer, according to Intel.
Intel said Sapphire Rapids will launch in 2021 — after the 14nm Cooper Lake in the first half of 2020 and the 10nm Ice Lake in the second half of 2020 — and while the company didn't give a release window for Ponte Vecchio, Intel said the Aurora supercomputer is still set for a 2021 completion. The company previously said in May that it would launch a 7nm data centre GPU in 2021.
Rajeeb Hazra, corporate vice president and general manager of Intel's enterprise and government group, said the chipmaker's new innovations are necessary as traditional HPC converges with AI, moving from modeling and simulation to workloads that leverage deep learning. HPC, AI and analytics are the top workloads driving compute demand growth at an annual rate of 60 percent, he added.
"That diversity of computing needs then drives a new tailwind for heterogeneous computing. It's no longer ‘one size fits all,’ and we have to look at architectures tuned to the various needs of various kinds of workloads in this convergence era," he said in a call with journalists.
The new Intel Xe GPU details
Ponte Vecchio is one of the GPUs Intel is developing using its new Xe architecture that will serve as the basis for products across a wide range of market segments: HPC, deep-learning training, cloud graphics, media transcode and analytics, workstation, gaming, PC mobile and ultra mobile.
“Several years ago at Intel, we saw a need for developing one architecture, one graphics architecture that will enable us to scale all the way from the traditional workloads to new exascale capabilities for HPC and deep learning training,” said Ari Rauch, vice president of Intel Architecture, Graphics and Software, and general manager of the Visual Technologies team and Graphics Business.
Rauch said the purpose of building one architecture for Intel's GPU efforts was to give developers a common framework. But from that Xe architecture, the company is developing "many microarchitectures that enable peak efficient performance at each one of those workloads."
Ponte Vecchio is based on the Xe microarchitecture for HPC and AI, and the microarchitecture features will include a flexible data-parallel vector matrix engine, high double-precision floating-point throughput and ultra-high cache and memory bandwidth, according to Rauch.
"We need to add high, intense computation performance to the targeted workload, so we were focusing on adding a lot of vector and matrix and parallel compute engines that are tailored and optimized for the need of that workload," he said.
Rauch said Ponte Vecchio is Intel's first exascale GPU that will "deliver an exciting level of performance." The GPU will leverage multiple new technologies that Intel has been developing for the past few years, including its 7nm manufacturing process, its Foveros 3D chip packaging and Xe Link, which will be based on the new CXL interconnect standard that the chipmaker is working on.
Thanks to Foveros, Intel will be able to stack multiple “tiles of the same engine” on the GPU's package, enabling the company to “scale up the performance in an efficient way” while leveraging the GPU's memory and bandwidth. CXL, on the other hand, will allow Intel to interface the GPU through a unified memory space.
OneAPI: Unified programming for heterogeneous hardware
With oneAPI, which launched in beta on Sunday, Intel is trying to ease the work of developers who have traditionally had to switch between different programming languages and libraries for different hardware components, and who have struggled with middleware and frameworks that are optimised only for certain hardware.
"The default in the [HPC] industry would be that at low-level programming, you'd change for each architecture you're targeting," said Bill Savage, vice president of Intel Architecture, Graphics and Software, and general manager of Compute Performance and Developer Products.
On the matter of optimisation issues, he pointed to how TensorFlow, when it was first released, was fully optimised only for one vendor's GPU.
"OneAPI is trying to address both of these by offering a low-level common interface to heterogeneous hardware with uncompromised performance," Savage said, "so that HPC developers can code directly to the hardware through languages and libraries that are shared across architectures and across vendors as well as making sure that middleware and frameworks are powered by oneAPI and fully optimized for the developers that live on top of that abstraction."
Savage said oneAPI is both an "industry initiative and an Intel product."
Intel is touting oneAPI as an "open standard to promote community and industry support" that will enable "code reuse across architectures and vendors." The oneAPI industry specification will consist of a standards-based, cross-architecture language, DPC++, that is based on C++ and SYCL, for direct programming, as well as "powerful APIs for acceleration of key domain-focused functions."
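To give a sense of what "direct programming" in DPC++ looks like, here is a minimal, hedged sketch of a data-parallel vector addition written against standard SYCL 2020 constructs (`sycl::queue`, `sycl::buffer`, `parallel_for`), which DPC++ builds on. This example is not from Intel's announcement; it is an illustration of the style of code the specification describes, and it assumes a SYCL-capable compiler such as Intel's `icpx -fsycl`:

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    constexpr size_t N = 1024;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

    // A queue targets whatever device the runtime selects:
    // the same source can run on a CPU, GPU or FPGA backend.
    sycl::queue q{sycl::default_selector_v};

    {
        // Buffers hand ownership of the host data to the SYCL runtime.
        sycl::buffer<float> buf_a{a}, buf_b{b}, buf_c{c};

        q.submit([&](sycl::handler &h) {
            sycl::accessor ra{buf_a, h, sycl::read_only};
            sycl::accessor rb{buf_b, h, sycl::read_only};
            sycl::accessor wc{buf_c, h, sycl::write_only};
            // One work-item per element: c[i] = a[i] + b[i].
            h.parallel_for(N, [=](sycl::id<1> i) { wc[i] = ra[i] + rb[i]; });
        });
    } // Buffer destruction synchronises results back into the host vectors.

    std::cout << c[0] << '\n';
}
```

The point of the model is visible in the `queue` line: device selection is separated from the kernel source, which is the cross-architecture portability oneAPI is promising.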
As a product, oneAPI will include an implementation of the DPC++ compiler as well as the library of APIs from the oneAPI specification, but it will also feature analysis tools like VTune, Inspector and Advisor, as well as debugging tools and a compatibility tool.
Savage said the compatibility tool "aids in the migration of [Nvidia's] CUDA source code to [DPC++] source code to aid the migration from a proprietary solution to this open standard that we're driving with the industry," signaling Intel's intent to compete with Nvidia more directly in the software space.
To encourage adoption of oneAPI, Intel has launched the product on its Intel DevCloud, which provides a "development sandbox to develop, test and run workloads across a range of Intel CPUs, GPUs and FPGAs." The DevCloud requires no downloads, no hardware acquisition, no installation and no setup or configuration.