Google’s top engineer says the network that supports its enterprise cloud, as well as its many massive consumer services, is in good shape to handle the sustained surge in traffic brought on by the coronavirus crisis.
In a blog post published Thursday, the same day Google saw a cloud outage, Google senior vice president of engineering Urs Hölzle said the internet giant has implemented plans to stay well ahead of rising demand from people who are confined to their homes and have moved their professional and social lives online.
“As the coronavirus pandemic spreads and more people move to working or learning from home, it’s natural to wonder whether the Google network can handle the load,” Hölzle said. “The short answer is yes.”
That blog post coincided with an outage affecting Google customers across the Eastern Seaboard. Hölzle, in a separate tweet that day, said the incident was not related to increased usage because of the crisis but was instead caused by a router failure in Atlanta.
“Just to make sure: this wasn't related to traffic levels or any kind of overload, our network is not stressed by Covid-19,” he tweeted on Thursday.
The network supporting Google Cloud, as well as YouTube, Search, Maps and Gmail, was designed to meet large surges in demand, he explained.
“The same systems we built to handle peaks like the Cyber Monday online shopping surge, or to stream the World Cup finals, support increased traffic as people turn to Google to find news, connect with others, and get work done during this pandemic,” Hölzle said.
“While we’re seeing more usage for products like Hangouts Meet, and different usage patterns in products like YouTube, peak traffic levels are well within our ability to handle the load,” he added.
Google operates a network that connects all its data centers with high-capacity fiber-optic cables that span the globe across land and sea, Hölzle said.
That dedicated network carries all cloud traffic until it is handed off for the last mile to more than 3,000 local ISPs, limiting the burden on those regional operators, whose networks have varying levels of reserve capacity.
Google is working with governments and network operators internationally to “minimize stress on the system,” he said. An example is the recent decision to default videos on YouTube to “standard definition”—a lower resolution that reduces traffic.
Google recognizes the importance of its services in the crisis, its top engineer said, and will continue to add capacity to stay ahead of demand.
“Our dedicated global network deployment and operations team is increasing capacity wherever needed, and, in the event of a disruption, recovers service as quickly as possible,” he said.
Also on Thursday, Google’s Vice President for Customer Experience, John Jester, sent a letter to customers explaining Google’s continuity planning and reassuring them of the cloud’s resilience.
Google has long conducted disaster recovery testing, he said, to evaluate the resilience of its infrastructure and processes. Those drills have enabled Google to flag potential problems before they occur and recover from disruptions as quickly as possible.
Google’s site reliability engineers are “in constant communication with our leadership team and actively monitoring global and local conditions,” Jester said.
Because Google runs its Google Cloud Platform and G Suite services on proprietary compute and storage hardware, it can forecast capacity months in advance and always be sure it will meet future demand, Jester said.
“We’re monitoring capacity closely and do not foresee shortfalls at this time,” Jester assured enterprise customers.
Google also maintains “considerable reserve capacity” in its own network and at hundreds of points of presence and thousands of edge locations, he said.
“The performance of our infrastructure remains as high as it was before the pandemic—the result of years of preparation,” Jester said.