Hewlett Packard Enterprise is arming COVID-19 researchers with the high-performance computing and artificial intelligence capabilities necessary to make scientific breakthroughs on new treatments and vaccines — two fields that are "more critical than ever," according to the vendor's top HPC executive.
Through its US$1.3 billion acquisition of Cray in 2019, the vendor is supporting COVID-19 researchers across the world with existing deployments, including the Theta supercomputer at the U.S. Department of Energy's Argonne National Laboratory and the Jean Zay supercomputer at France's National Center for Scientific Research.
In an interview with CRN, Peter Ungaro, former CEO of Cray and senior vice president of HPE's HPC and mission critical solutions, said the company is supporting these existing systems and helping other customers expand their capabilities for the purpose of accelerating COVID-19 research.
"As a company, we're super proud that our technology and our teams are being called in to help organizations power the scientific breakthroughs that we need to fight against COVID-19," he said. "It's a great example of what you can do inside of a company that doesn't have much to do with the day-to-day business of the company but really shows how you can take the resources of a large organization like HPE and use it [for good]."
Ungaro said HPE systems have already shown promising results in speeding up the time it takes to identify potential antibodies that can attack the novel coronavirus. Using the Catalyst HPC cluster at the DOE's Lawrence Livermore National Laboratory, researchers have already narrowed down the number of potential antibody candidates from 10 to the 40th power, a number with 41 decimal digits, to 20 — something that would have taken years using other approaches.
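The scale of that reduction can be sanity-checked in a few lines of toy arithmetic (this is only an illustration of the numbers quoted above, not the laboratory's actual screening code):

```python
# Sanity-checking the scale quoted above (toy arithmetic, not the actual
# Lawrence Livermore screening pipeline).
search_space = 10 ** 40  # potential antibody candidates
shortlist = 20           # candidates left after HPC-driven screening

# 10^40 written out is a 1 followed by 40 zeros: 41 decimal digits.
print(len(str(search_space)))

# Each shortlisted candidate stands in for 5 x 10^38 raw possibilities.
print(search_space // shortlist)
```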
Scientific work like that could be done even more quickly with the three exascale supercomputers HPE is providing to DOE laboratories over the next few years, according to Ungaro, who said this underlines the importance of the continued investment that the U.S. and other countries are making in next-generation HPC and AI capabilities.
"It's really showing that HPC and AI — the technologies that are going to be leveraged in these large exascale machines — are more critical than ever," he said. "If we had these exascale systems already installed and ready to go today, just think of how much faster the progress would be."
In his interview, Ungaro talked about the different ways HPC is having an impact on COVID-19 research, how the pandemic is impacting the overall market, to what extent research opportunities are creating a need for new systems and upgrades, the importance of channel partners in deploying and supporting new systems, and how HPC differs from traditional data center workloads.
What are the different ways HPC is having an impact on COVID-19 research?
A huge part of what people use HPC for is to do models and simulations. So we're trying to take the physical world and show it digitally. Obviously with COVID-19 and everything going on, there's everything from trying to look for new vaccines and therapies, and that goes all the way from bioinformatics and proteomics, all the way up through the drug discovery process, using tools like cryo-EM and things like that with high-performance computers.
It's also modeling. One of the most interesting areas is modeling the decision criteria. So, for instance, do we reopen the businesses, and how do we think about that, and what are the trade-offs? Those are all computer models that need to be run to understand the trade-offs between the risks of reopening and the business implications and financial implications of not reopening. It's really been very interesting. We have people looking at it from all aspects. The ones that we tend to focus on a lot [range from] finding a drug, vaccine or therapy, to understanding models of how this may propagate and how we think about the different choices that politicians and other people have to make.
With regards to the HPE systems supporting COVID-19 research, is there any specific support or services work that HPE is providing to help with what they're doing?
One of the most exciting things about being in this space is to see what scientists and engineers do with the machines, and there's not a more pressing challenge for us right now than to try to help out in this fight against COVID-19, so it's a really exciting time from that perspective. As we look at the different ways that the machines are being used today, HPE is participating in this broad industry consortium with the U.S. government, for instance. And one of the things that we're doing is providing technical support, so helping people to leverage these big machines.
Most people do not get the chance to use these huge supercomputers, and so we're helping them to port their applications over, optimize their applications and take advantage of this computational power that's being opened up to combat COVID-19. A huge part of it is really that kind of technical support. So whether that's application specialists that would help someone understand how do I take my model from running on a small cluster in a university or a laboratory in one of the pharmaceutical companies or biotech companies and then move that to one of these huge supercomputers like our Theta supercomputer at Argonne National Laboratory. How do you use that kind of machine? How do you scale the application and run it at that scale? So we have application experts as well as systems experts that are helping people to do those transformations.
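The porting work described here is, at its core, domain decomposition: the same model is split into slices that run in parallel across many nodes. The sketch below illustrates the idea with a serial loop standing in for what MPI ranks would do on a real supercomputer; all names are illustrative, not HPE's actual tooling:

```python
# Toy sketch of scaling a model run by domain decomposition. On a real
# machine each "rank" is a node or process (e.g. via MPI); here the ranks
# are iterated serially so the sketch stays self-contained.

def simulate_slice(lo, hi):
    """Stand-in for one rank's share of the simulation domain."""
    return sum(x * x for x in range(lo, hi))  # placeholder workload

def run_decomposed(n_points, n_ranks):
    step = n_points // n_ranks
    partials = [simulate_slice(r * step, (r + 1) * step)
                for r in range(n_ranks)]
    return sum(partials)  # the reduction step (MPI_Reduce on a real system)

# The port is about decomposition, not changing the science: the answer is
# identical whether the domain is split 2 ways or 8 ways.
assert run_decomposed(100_000, 2) == run_decomposed(100_000, 8)
print(run_decomposed(100_000, 8))
```

On an actual system the slices would also exchange boundary data each timestep, which is where the interconnect quality discussed later in the interview comes in.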
Are those opportunities where you are billing for time?
No. We're donating it all. We definitely are. One of the things I'm actually most proud of about HPE right now is just how the whole company is banding together to support this.
And it's not only in the HPC area. We're donating Wi-Fi equipment to makeshift hospitals, and we're helping with temporary care facilities that are being set up and making donations of both money and a ton of people resources, like what we're talking about here, as well as systems that people are using inside of HPE. It's really great to see how the company is helping in the whole pandemic, and how I think we're being a really positive force out there in this area.
For the amount of research that is happening right now, specifically for COVID-19, is there any need for new systems coming online or upgrades on the hardware side?
We definitely have seen that. We have a number of customers, especially in the university and national laboratory space, that are in the process of increasing the size of their systems or adding another system specifically for this research. So we're seeing that quite a bit actually, and it's exciting to see the positive side of this: people expanding their capabilities.
Are you donating that hardware? Or are the universities and national labs paying for the systems?
We have both [situations]. There are some where we've been able to help out and donate some capabilities. Most of that we've done by allowing them to use systems inside of HPE today. Other customers have gotten budget for it and want to do a permanent expansion, not just a temporary setup, so that is a more traditional opportunity.
If you can't mention customers by name, can you give an example of how many nodes they're adding, or is there any other way you can quantify the work that's going on behind the scenes?
We have a number of customers that are adding anywhere from one to five cabinets of systems. Think of that as typically 100 or 200 processors in a cabinet, and adding that capacity to their overall infrastructure. We're seeing that in national laboratories and universities, which are doing that pretty quickly. We've even started to see a few commercial companies that are adding capacity in that range.
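Using the 100-to-200-processors-per-cabinet figure quoted here, the scale of those expansions is easy to work out (a back-of-the-envelope sketch, not vendor specifications):

```python
# Back-of-the-envelope sizing for the expansions described above; the
# 100-200 processors-per-cabinet range comes from the interview.
per_cabinet = (100, 200)

for cabinets in (1, 5):
    lo, hi = (cabinets * p for p in per_cabinet)
    print(f"{cabinets} cabinet(s): roughly {lo}-{hi} processors added")
```

So the expansions range from roughly 100 processors at the low end to about 1,000 at the high end.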
For these kinds of opportunities, is HPE relying on channel partners for this?
Yeah, we have. There's a number of channel partners that we've worked with for some of the opportunities. The channel's super important to HPE. We have a number of really good partners that are working in the HPC and AI areas. AI, especially, has been a huge area for the channel as they've built capabilities to help customers implement AI.
And, of course, a lot of people are using AI modeling, whether that be deep learning or machine learning or just even big data analytics methods, as part of their overall modeling and simulation that they're doing within HPC. So the channel's been an important part of how we have reached out to customers, and there's channel partners participating in some of these opportunities.
HPE, through its acquisition of Cray, is involved in the first three exascale supercomputers that are set to go online in the next few years as part of the U.S. exascale program. Does the urgency to find treatments and vaccines for COVID-19 underline the importance of HPC and those investments?
I really believe so. It's really showing that high-performance computing and AI — the technologies that are going to be leveraged in these large exascale machines — are more critical than ever. And I think it is really helping people feel very good that they're making the kinds of investments that they're making in these large solutions. Because the amount of data that we're dealing with today is just exploding, and COVID-19 brings a whole other explosion of that.
What we've seen already from a number of different customers just shows that they're already making strides on these machines. And so if we had these exascale systems already installed and ready to go today, just think of how much faster the progress would be. I think it's really showing that this is going to become more and more important to people, and we're clearly very proud that we're in a leadership position in building these large machines.
When looking at HPC versus more traditional data center workloads like virtualization, how do the architectural needs differ?
I would say it's very different from an architectural perspective in three areas. One is the amount of data that we're typically modeling and doing simulations with; it requires much higher bandwidth than is typical for SAP or other traditional enterprise computing-style workloads. These are much higher bandwidth systems, whether that's the bandwidth from the processors to memory or between the nodes of the computer. That's a big thing.
The second one, which is related to that, is the interconnect. The system interconnect — or the fabric that connects all of the nodes of the system together and moves data around the machine — is fundamentally different than what we see in enterprise computing. And so that's a huge area of focus of ours. We just announced our new Slingshot interconnect, which we think is the leader in this area for very high-bandwidth movement of data. We also use InfiniBand from Mellanox with a number of our systems, especially our more commodity-based systems.
And then the third thing is software, because you can imagine, when you have thousands of nodes in a machine, or thousands of processors or potentially millions and millions of cores on a machine, managing that whole infrastructure as one computer [can be challenging since] many times you're going to run single applications across large portions of that machine. In more traditional workloads, you're taking applications, you're splitting them up, you're running them in a node or a processor.
With high-performance computing, you're running these applications in parallel, maybe with 1,000- or 100,000-way parallelism or even million-way parallelism with the exascale machines. And so having software that allows you to manage all of these components as one machine and one system, and not individual servers, [is one difference]. [There's also being] able to optimize the processing power of that to get as much computing as you can possibly do.
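One way to see why million-way parallelism puts so much pressure on system software is Amdahl's law (an illustration added here, not something Ungaro cites): any serial fraction of the workload caps the achievable speedup no matter how many nodes you add.

```python
# Amdahl's law: speedup with n workers when a fraction p of the work
# parallelizes perfectly and the rest stays serial.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even at 99.9% parallel, a million cores yield less than a 1,000x speedup,
# which is why exascale software works so hard to eliminate serial bottlenecks.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} workers -> {speedup(0.999, n):7.1f}x")
```

This is the motivation for system software that treats the whole machine as one computer: per-node overheads that are negligible on a small cluster become the dominant serial fraction at exascale.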
How important are GPUs becoming to HPC in relation to CPUs?
CPUs have always been important and will continue to have a lot of importance. But as developers get more adept at modifying their applications to take advantage of more specialized processors, you can get higher levels of performance and better price-performance by using more specialty processors. GPUs are a huge example of [specialized processors] that are less general purpose than a CPU, but pretty general purpose still. You can use them for a lot of things.
[There are also] specialized processors for doing deep learning in artificial intelligence. We're seeing a lot of people that are using FPGAs or other technologies. We really believe there's a spectrum of technologies where, depending on both your application portfolio and your ability to modify your applications to take advantage of these different technologies, you may be able to use more specialized processors to get better price-performance for your overall investment. If you can get more work done for the same amount of money, that's a good thing, and we're seeing that with GPUs: they're a nice bridge between CPUs, which are standard processors that are very generalized and can be used for anything, and specialized processors like deep learning accelerators, for instance.
How has the pandemic impacted the overall demand environment for HPC?
I'd look at this from two perspectives. For high-performance computers, clearly it's increased the demand for these systems. A great example is the work that we're doing across various consortiums to leverage the compute capability inside of HPE and inside of some of the cloud vendors, and to provide that to researchers.
As far as the overall market demand, I think there's kind of a push and pull. COVID-19 has increased some demand; we talked about some of the systems that people are expanding or upgrading to handle the processing that we need to do for COVID-19 research. On the other hand, the overall economy is depressed by COVID-19. I think you have a little bit of both happening, where you have a little bit of contraction and some expansion, but we're in a really good spot in HPC, where I think we're expanding more than we're contracting overall.