UberGlobal finds root of Friday outages

By Ry Crozier on Jun 29, 2011 8:30 AM
Filed under Services

Kit changes expose network design issue.

Web hosting firm UberGlobal has traced the root cause of several outages late last week to the configuration of a spanning tree in its network.

The company – which owns brands including AussieHQ and Jumba – suffered three outages last Friday that affected customers in Sydney and Canberra.

A preliminary assessment suggested that a piece of customer equipment moved on the evening of 23 June was the likely cause.

But continuing investigations uncovered a network design issue that was exposed by several equipment changes, according to a final post-incident report released today.

Specifically, the company moved a four-year-old load balancer to a different part of its environment late on 23 June.

The shift was part of a wider initiative at UberGlobal to segregate its burgeoning enterprise business from the network elements used by its smaller business/consumer web hosting brands.

When originally installed, the load balancer had been configured to act as the root bridge of a spanning tree for a small number of virtual LANs.

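The spanning tree protocol is used to prevent loops in switched networks: it establishes a single loop-free set of paths, including for individual virtual LANs, and blocks redundant links until they are needed. The device at the base of the tree, the one with the lowest bridge ID, is called the root bridge.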
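As a rough illustration of how a root bridge is chosen: under the IEEE 802.1D spanning tree protocol, each bridge carries a bridge ID made up of a configurable priority and its MAC address, and the bridge with the lowest ID becomes the root. The simplified Python sketch below uses hypothetical device names, priorities and MAC addresses; it is not UberGlobal's actual configuration, and it ignores per-VLAN extensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeId:
    # Compared field by field: priority first, then MAC address; lower wins.
    priority: int
    # Lowercase, colon-separated, fixed-width hex, so string order
    # matches the numeric order of the address.
    mac: str


def elect_root(bridges: dict[str, BridgeId]) -> str:
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridges[name])


# Hypothetical devices: the load balancer is given a lower (better)
# priority so that it wins the election for its VLANs.
bridges = {
    "access-switch": BridgeId(32768, "00:1b:54:00:00:01"),
    "core-switch": BridgeId(32768, "00:1b:54:00:00:02"),
    "load-balancer": BridgeId(4096, "00:90:0b:00:00:03"),
}
print(elect_root(bridges))  # load-balancer
```

Lowering a device's priority is the usual way to make it the intended root; when priorities tie, the lower MAC address breaks the tie.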

In isolation, moving the load balancer did not cause an issue, nor did the addition of a new IBM BladeCentre to the environment a day later.

But both forced recalculations of spanning trees in UberGlobal's management network. It was the second recalculation – when the BladeCentre was introduced – that led to the outage.

After the recalculation, the load balancer moved on 23 June could not 'see' a legitimate root in its new location, so it attempted to elect itself to the role.

Other devices in the tree that could see both the load balancer and the legitimate root began receiving conflicting messages from the two boxes.
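The failure mode described above can be sketched by noting that a spanning tree bridge starts out assuming it is the root and only defers when it hears a claim from a better bridge. In this deliberately simplified Python model (hypothetical names and IDs; BPDU relaying, path costs and timers are ignored), a bridge that cannot hear the legitimate root keeps electing itself, leaving two bridges claiming the root role.

```python
BridgeId = tuple[int, str]  # (priority, MAC); lower compares as better


def root_views(ids: dict[str, BridgeId],
               heard: dict[str, set[str]]) -> dict[str, str]:
    """Each bridge's belief about the root: the best ID among itself and
    the root claims it can currently hear."""
    return {
        name: min({name} | heard[name], key=lambda n: ids[n])
        for name in ids
    }


def self_elected_roots(views: dict[str, str]) -> list[str]:
    """Bridges that believe they are the root; more than one is a conflict."""
    return sorted(name for name, root in views.items() if name == root)


ids = {
    "legit-root": (4096, "00:1b:54:00:00:01"),
    "load-balancer": (8192, "00:90:0b:00:00:02"),
    "access-switch": (32768, "00:1b:54:00:00:03"),
}
# Hypothetical reachability: the load balancer, in its new location,
# hears no claim from the real root; the access switch hears both.
heard = {
    "legit-root": set(),
    "load-balancer": set(),
    "access-switch": {"legit-root", "load-balancer"},
}
views = root_views(ids, heard)
print(self_elected_roots(views))  # ['legit-root', 'load-balancer']
```

The access switch in this model receives root claims from two different bridges, which mirrors the conflicting messages the report describes.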

The messages then began looping between affected devices, defeating the purpose of the tree. The error-disable (errdisable) feature on UberGlobal's Cisco access switches detected the loop and shut down the affected ports.

Copyright © iTnews.com.au . All rights reserved.
