UberGlobal finds root of Friday outages

By Ry Crozier on Jun 29, 2011 8:30 AM
Filed under Services

Kit changes expose network design issue.

Web hosting firm UberGlobal has isolated the root cause of several outages late last week to the configuration of a spanning tree in its network.

The company – which owns brands including AussieHQ and Jumba – suffered three outages last Friday that affected customers in Sydney and Canberra.

A preliminary assessment suggested a piece of customer equipment that was moved on the evening of 23 June was the likely cause.

But continuing investigations uncovered a network design issue that was exposed by several equipment changes, according to a final post-incident report released today.

Specifically, the company moved a four-year-old load balancer to a different part of its environment late on 23 June.

The shift was part of a wider initiative at UberGlobal to segregate its burgeoning enterprise business from the network elements used by its smaller business/consumer web hosting brands.

When originally installed, the load balancer had been configured to act as the root bridge of a spanning tree for a small number of virtual LANs.

Spanning tree protocol is used to manage redundant links in switched networks, blocking some paths so that traffic on each virtual LAN follows a single loop-free route. The piece of kit at the base of the tree is called the root bridge.
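As a rough illustration of how a root bridge is chosen: in standard spanning tree, each device advertises a bridge ID made up of a configurable priority and its MAC address, and the lowest ID wins. The sketch below assumes three made-up devices; the names, priorities and addresses are hypothetical and are not taken from UberGlobal's report.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int  # lower wins; 32768 is the usual 802.1D default
    mac: str       # tie-breaker when priorities are equal


def bridge_id(bridge: Bridge) -> tuple:
    """Return a sortable bridge ID: priority first, then MAC as an integer."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: list) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=bridge_id)


# Hypothetical devices, for illustration only.
bridges = [
    Bridge("core-switch", 4096, "00:1a:2b:00:00:01"),
    Bridge("access-switch", 32768, "00:1a:2b:00:00:02"),
    Bridge("load-balancer", 32768, "00:1a:2b:00:00:03"),
]

print(elect_root(bridges).name)  # prints "core-switch"
```

A device deliberately configured with a low priority, as the load balancer appears to have been when first installed, would win this comparison for the virtual LANs it serves.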

In isolation, moving the load balancer did not cause an issue, nor did the addition of a new IBM BladeCenter to the environment a day later.

But both forced recalculations of spanning trees in UberGlobal's management network. It was the second recalculation, triggered when the BladeCenter was introduced, that led to the outage.

After the recalculation, the load balancer moved on 23 June could not 'see' a legitimate root in its new location, so it attempted to elect itself to the role.

Other devices in the tree that could see the load balancer and the legitimate root started receiving conflicting messages from the two boxes.
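The conflict described above can be modelled very loosely as follows. This is a simplified sketch of the article's account, not of the protocol's real message handling, and every device name, bridge ID and visibility set is an assumption made for illustration.

```python
# Hypothetical bridge IDs as (priority, tie-breaker); the lower tuple wins.
BRIDGE_IDS = {
    "legitimate-root": (4096, 1),
    "access-switch": (32768, 2),
    "load-balancer": (32768, 3),
}

# Which better-placed bridges each device can hear about (assumed).
HEARS = {
    "legitimate-root": set(),
    "load-balancer": set(),  # cannot see the legitimate root after the move
}


def claimed_root(device: str) -> str:
    """A device backs the best bridge ID it knows of, including its own."""
    known = HEARS[device] | {device}
    return min(known, key=BRIDGE_IDS.get)


# The access switch is assumed to hear announcements from both devices.
claims_at_access_switch = {
    claimed_root(sender) for sender in ("legitimate-root", "load-balancer")
}

print(sorted(claims_at_access_switch))
# prints ['legitimate-root', 'load-balancer']
```

With no knowledge of the legitimate root, the load balancer in this model announces itself, so a device that hears both ends up with two different root claims.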

The messages started to loop between affected devices in the tree, defeating the purpose of the spanning tree. An error-disable feature on UberGlobal's Cisco access switches detected the loop and shut down the affected ports.

 
 

Copyright © iTnews.com.au . All rights reserved.
