UberGlobal finds root of Friday outages

By Ry Crozier on Jun 29, 2011 8:30 AM
Filed under Services

Kit changes expose network design issue.

Web hosting firm UberGlobal has traced the root cause of several outages late last week to the configuration of a spanning tree in its network.

The company, which owns brands including AussieHQ and Jumba, suffered three outages last Friday that affected customers in Sydney and Canberra.

A preliminary assessment suggested a piece of customer equipment that was moved on the evening of 23 June was the likely cause.

But continuing investigations uncovered a network design issue that was exposed by several equipment changes, according to a final post-incident report released today.

Specifically, the company moved a four-year-old load balancer to a different part of its environment late on 23 June.

The shift was part of a wider initiative at UberGlobal to segregate its burgeoning enterprise business from the network elements used by its smaller business/consumer web hosting brands.

When originally installed, the load balancer had been configured to act as the root bridge of a spanning tree for a small number of virtual LANs.

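The spanning tree protocol prevents loops in switched networks that have redundant links. It builds a single loop-free path between devices (per virtual LAN, in common implementations) and blocks the redundant links, which stay available as backups. The piece of kit at the base of the tree is called the root bridge.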
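For readers unfamiliar with how the root is chosen: in classic IEEE 802.1D spanning tree, every bridge has a bridge ID made of a configurable priority plus its MAC address, and the bridge with the lowest ID becomes root. The Python sketch below illustrates that rule only; the device names, priorities and MAC addresses are hypothetical and are not taken from UberGlobal's network.

```python
# Minimal sketch of spanning tree root bridge election (IEEE 802.1D style).
# Assumption: a bridge ID is modelled as (priority, MAC address as an integer).
# The lowest bridge ID wins, so priority decides first and the MAC breaks ties.


def bridge_id(priority: int, mac: str) -> tuple[int, int]:
    """Build a comparable bridge ID from a priority and a colon-separated MAC."""
    return (priority, int(mac.replace(":", ""), 16))


def elect_root(bridges: dict[str, tuple[int, int]]) -> str:
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridges[name])


# Hypothetical devices and values, for illustration only.
bridges = {
    "access-switch-1": bridge_id(32768, "00:1b:2c:00:00:01"),
    "access-switch-2": bridge_id(32768, "00:1b:2c:00:00:02"),
    # A lowered priority makes this box win, mirroring a device
    # deliberately configured to act as the root bridge.
    "load-balancer": bridge_id(4096, "00:50:56:00:00:10"),
}

print(elect_root(bridges))  # load-balancer
```

Lowering a device's priority is how an administrator normally pins the root role to a chosen box, which is the kind of configuration the load balancer had carried since installation.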

In isolation, moving the load balancer did not cause an issue, nor did the addition of a new IBM BladeCentre to the environment a day later.

But both forced recalculations of spanning trees in UberGlobal's management network. It was the second recalculation, when the BladeCentre was introduced, that led to the outage.

After the recalculation, the load balancer moved on 23 June could not 'see' a legitimate root in its new location, so it attempted to elect itself to the role.
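Self-election is standard spanning tree behaviour: each bridge starts out announcing itself as root and only defers when it hears from a bridge with a lower ID. The simplified Python sketch below assumes hypothetical devices and links and compares root IDs only, ignoring path costs and timers. It shows how a bridge with no path to the legitimate root keeps claiming the role for its own segment.

```python
# Simplified sketch of spanning tree root convergence (root IDs only;
# path costs, port roles and timers are deliberately ignored).


def converge(bridges: dict[str, tuple[int, int]],
             links: list[tuple[str, str]]) -> dict[str, str]:
    """Return, for each bridge, the name of the bridge it believes is root."""
    # Every bridge begins by claiming to be the root itself.
    root_of = {name: name for name in bridges}
    changed = True
    while changed:
        changed = False
        for a, b in links:
            # Announcements flow in both directions over each link.
            for src, dst in ((a, b), (b, a)):
                # Adopt a neighbour's root if it has a lower (superior) ID.
                if bridges[root_of[src]] < bridges[root_of[dst]]:
                    root_of[dst] = root_of[src]
                    changed = True
    return root_of


# Hypothetical bridge IDs as (priority, MAC as integer).
bridges = {
    "core-switch": (4096, 0x001B2C000001),
    "access-switch-1": (32768, 0x001B2C000002),
    "access-switch-2": (32768, 0x001B2C000003),
    "load-balancer": (8192, 0x005056000010),
    "bladecentre": (32768, 0x005056000020),
}
# Hypothetical links: the load balancer's segment has no path to the core.
links = [
    ("core-switch", "access-switch-1"),
    ("access-switch-1", "access-switch-2"),
    ("load-balancer", "bladecentre"),
]

roots = converge(bridges, links)
print(roots["access-switch-2"])  # core-switch
print(roots["bladecentre"])      # load-balancer
```

Because each update can only replace a bridge's believed root with a lower ID, the loop always terminates. The second segment settles on the load balancer as root simply because it never hears the core switch's superior claim.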

Other devices in the tree that could see the load balancer and the legitimate root started receiving conflicting messages from the two boxes.

The messages began looping between the affected devices, defeating the purpose of the spanning tree. The error-disable (err-disable) feature on UberGlobal's Cisco access switches detected the loop and shut down the affected ports.


Copyright © iTnews.com.au . All rights reserved.

