UberGlobal finds root of Friday outages

By Ry Crozier on Jun 29, 2011 8:30 AM
Filed under Services

Kit changes expose network design issue.

Web hosting firm UberGlobal has isolated the root cause of several outages late last week to the configuration of a spanning tree in its network.

The company – which owns brands including AussieHQ and Jumba – suffered three outages last Friday that affected customers in Sydney and Canberra.

A preliminary assessment suggested a piece of customer equipment that was moved on the evening of 23 June was the likely cause.

But continuing investigations uncovered a network design issue that was exposed by several equipment changes, according to a final post-incident report released today.

Specifically, the company moved a four-year-old load balancer to a different part of its environment late on 23 June.

The shift was part of a wider initiative at UberGlobal to segregate its burgeoning enterprise business from the network elements used by its smaller business/consumer web hosting brands.

When originally installed, the load balancer had been configured to act as the root bridge of a spanning tree for a small number of virtual LANs.

Spanning tree is used to manage redundant links in switched networks: it blocks some links so that each virtual LAN has a single, loop-free path between devices. The piece of kit at the base of the tree is called the root bridge.
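
As a rough illustration of how a root bridge is chosen: under the standard spanning tree election rule, the device with the lowest bridge ID (priority first, then MAC address as a tie-breaker) becomes the root. The sketch below assumes that rule only. The device names, priorities and MAC addresses are hypothetical and are not taken from UberGlobal's report.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeId:
    """A spanning tree bridge ID: compared by priority, then by MAC address."""
    priority: int
    mac: str  # lowercase, fixed-width hex so string comparison is consistent


def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridges[name])


# Hypothetical devices: a load balancer configured with a low priority
# so that it wins the election, as the article describes.
bridges = {
    "load-balancer": BridgeId(priority=4096, mac="00:1a:2b:00:00:0a"),
    "core-switch": BridgeId(priority=8192, mac="00:1a:2b:00:00:01"),
    "access-switch": BridgeId(priority=32768, mac="00:1a:2b:00:00:14"),
}

root = elect_root(bridges)
print(root)
```

Real switches reach this result by exchanging bridge protocol data units rather than by consulting a shared table, so the dictionary here is only a stand-in for that exchange.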

In isolation, moving the load balancer did not cause an issue, nor did the addition of a new IBM BladeCenter to the environment a day later.

But both forced recalculations of spanning trees in UberGlobal's management network. It was the second recalculation – when the BladeCenter was introduced – that led to the outage.

After the recalculation, the load balancer moved on 23 June could not 'see' a legitimate root in its new location, so it attempted to elect itself to the role.
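
The self-election described above can be sketched in a few lines. This is a simplified model, not UberGlobal's configuration: it assumes each device treats as root the lowest bridge ID among its own and any root claims it hears, and it ignores path costs and timers. All IDs are hypothetical.

```python
def believed_root(own_id, heard_root_ids):
    """Return the bridge ID a device treats as root.

    The device picks the lowest ID among its own and every root claim
    it has received. With no claims received, it picks itself.
    """
    return min([own_id, *heard_root_ids])


# Hypothetical bridge IDs as (priority, MAC) tuples; lower compares as better.
LEGITIMATE_ROOT = (4096, "00:00:00:00:00:01")
LOAD_BALANCER = (8192, "00:00:00:00:00:0a")
ACCESS_SWITCH = (32768, "00:00:00:00:00:14")

# In its new location the load balancer hears no claim from the
# legitimate root, so it falls back to its own ID.
lb_view = believed_root(LOAD_BALANCER, [])

# A device that can see both boxes receives two different root claims.
switch_view = believed_root(ACCESS_SWITCH, [LEGITIMATE_ROOT, lb_view])

# The two devices now disagree about which box is the root.
disagreement = lb_view != switch_view
print(lb_view, switch_view, disagreement)
```

In this toy model the access switch simply prefers the better claim. The article's account is that, in the real network, the competing claims were not resolved cleanly.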

Other devices in the tree that could see the load balancer and the legitimate root started receiving conflicting messages from the two boxes.

The messages started to loop between affected devices in the tree, defeating its purpose. An error-disable feature of UberGlobal's Cisco access switches detected the loop and shut down the affected ports.


Copyright © iTnews.com.au. All rights reserved.

