The enormous and exponentially growing quantities of unstructured data now being generated, largely via powerful modes of digital communication such as social media, but also by the burgeoning armies of intelligent and connected devices and machines powering industry, are creating new repositories of potential intelligence which could make the difference between success and failure.
Mining is one industry in Australia being forced to acknowledge and act on the power of big data. Australia is the only country in the world where mining companies are required to provide all of the data collected during the course of exploration and production, meaning that this country has arguably the richest geographical and geological repositories of data about itself of any nation.
At Australia’s leading IT research centre, NICTA, researchers are using this data to aid in the discovery and exploitation of Australia’s purportedly vast geothermal resources, with funding from the Federal Government’s renewable energy program.
“The big challenge is trying to first discover the geothermal resources and then characterise them,” says NICTA boss Hugh Durrant-Whyte. "What’s the temperature? How easy is it to fracture? Is the rock porous? Can water be pumped through?"
NICTA has generated vast quantities of complex data, ranging from precise maps showing the exact elevations of the entire Australian land mass to variations in magnetic energy and gravitational deviation, as well as the oscillation of the Australian continent in response to ocean movements.
So far a terabyte of information has been generated from which the researchers hope to gain a better understanding of what’s going on underground. These sorts of massive data challenges are becoming more common throughout industry.
“Everyone has lots of data; now the question is what to do with it,” says Durrant-Whyte.
“But the most interesting results will derive from unconventional thinking; it’s not about doing what you already know – it’s about doing what you don’t know.”
Tier-1 vendors have been on a frenzied buying spree over the past few years, with IBM alone snapping up more than 30 big data companies from around the world. HP’s US$10.2 billion acquisition of UK data analytics specialist Autonomy was the second biggest in the IT industry last year, beaten only by Google’s US$12.5 billion buy of Motorola Mobility.
Helping to underpin most of the solutions currently in the market is the ground-breaking and disruptive open source big data platform called Hadoop.
Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster.
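The map-and-reduce idea described above can be illustrated with a few lines of plain Python. This is only a sketch of the pattern running on one machine, not Hadoop's own API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample fragments are invented for the illustration. Each fragment is mapped independently, which is what allows a real cluster to run or re-run that piece of work on any node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one fragment of input."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(fragments):
    """Run map, shuffle and reduce over a list of independent fragments."""
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: three small fragments that could live on different nodes.
fragments = ["big data big", "data mining", "big mining data"]
counts = word_count(fragments)
```

Because no fragment depends on another, a failed map task can simply be repeated on a different machine without affecting the rest of the job.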
Single view
Certus Solutions is one of IBM’s premier partners in Australasia and over the past few years has developed a strong business intelligence and ‘big data’ practice. Information management lead Vincent McBurney says one of the major drivers for big data in Australia is a growing need among organisations to bring large and disparate data sets together under a single view.
The company recently helped financial services company MLC to implement customer data matching and de-duping across the group for a single customer view. Certus used Information Server Suite to reduce the time taken to process large customer lists and used DataStage for scalable ETL processing.
MLC wanted to raise the awareness of governance for information management so Certus used InfoSphere Discovery for data quality assessment and overlap analysis and Business Glossary for enterprise terms and definitions.
McBurney stresses the importance of profiling, classifying and understanding data so as to avoid the risk of building the wrong thing, which can often lead to heavy costs.
“In a big data scenario there are added costs and penalties in building the wrong thing or delivering the wrong data. Our [Certus’] point of difference is that we don’t just build code; we deliver data profiling and build enterprise glossaries,” McBurney says.
Another of the company’s clients is the National Centre for Vocational Education Research (NCVER), a body responsible for collecting, analysing and communicating research and statistics about vocational education and training (VET) nationally.
The organisation receives information about all non-university tertiary enrolments across Australia going back 20 years and wanted to find better ways of identifying key trends from that data. Certus helped NCVER implement a scalable data collection platform using IBM Information Server and InfoSphere Warehouse.
“NCVER is a not for profit organisation and was able to demonstrate how to manage a large amount of data with a relatively small IT department,” McBurney says.
He admits many smaller organisations are reluctant to make the jump into big data and business intelligence because they see it as too great a capital outlay with little guarantee of a return. There is also the issue of big data vendors overstating the complexity of the challenges facing their customers, yet McBurney emphasises that the world of unstructured data is a universe of its own.
“If you Google something you get a million search results, and searching for meaning and zeroing in on a dozen or so documents that are important is complex,” he says.
“That’s the side that’s going to be the biggest growth area in Australia and something we haven’t tackled very well in the past.”
Regarding cost, Certus is among a growing group of resellers who have started giving customers the option to pay per terabyte. So, rather than being forced to pay for 10 times that capacity, much of which they probably won’t use, customers can instead pay for CPU licences at small increments as their needs evolve.
Real-time analysis
The evolving possibilities for big data over the past few years have seen increased interest in so-called real-time analysis. Industries such as finance and retail stand to reap huge competitive benefits by being able to identify and react to data trends immediately.
High-frequency trading, for instance, is allowing stock brokers to conduct trades with far more certainty and speed than those jostling on the trading room floor.
And in the increasingly competitive world of online retail, the ability to know what your customer is thinking at a point in time can be the difference between making the sale or not. Web-based companies in particular stand to benefit from using real-time big data solutions, says Ross Farrelly, chief data scientist with Teradata Australia.
“Every time someone clicks on your site there’s a web log added to your database with information such as time, duration and the user’s identity.”
Using this information correctly, a company could potentially discover how to alter the behaviour of people who visit the site but don’t make a purchase.
“You want to find all customers that have taken the nine most common steps but never taken that last step.”
Growing demand for this capability has driven rapid advancements in so-called in-memory technologies over the past few years. In terms of big data, in-memory refers to the ability to have large and complex data repositories reside in physical memory, meaning the data is far faster to retrieve than in systems that must first read it from disk.
Virtually all of the major big data vendors as identified by analyst groups Gartner and IDC have some sort of in-memory offering within their “big data” portfolios.
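The query Farrelly describes earlier in this section, finding visitors who take every common step except the last, can be sketched in Python. This is a minimal illustration under stated assumptions: the log is a list of `(user, timestamp, page)` tuples, the funnel has four made-up steps rather than nine, and the function names `stalled_visitors` and `completed_in_order` are hypothetical.

```python
from collections import defaultdict


def completed_in_order(pages, steps):
    """True if every step appears in pages, in the given order."""
    remaining = iter(pages)
    # 'step in remaining' consumes the iterator, so order is enforced.
    return all(step in remaining for step in steps)


def stalled_visitors(click_log, funnel):
    """Return users who did every funnel step except the final one."""
    pages_by_user = defaultdict(list)
    # Replay the log in time order so each user's pages are sequenced.
    for user, _timestamp, page in sorted(click_log, key=lambda e: e[1]):
        pages_by_user[user].append(page)

    stalled = set()
    for user, pages in pages_by_user.items():
        if completed_in_order(pages, funnel[:-1]) and funnel[-1] not in pages:
            stalled.add(user)
    return stalled


# Hypothetical funnel and web log for illustration only.
funnel = ["home", "product", "cart", "checkout"]
log = [
    ("ann", 1, "home"), ("ann", 2, "product"),
    ("ann", 3, "cart"), ("ann", 4, "checkout"),
    ("bob", 1, "home"), ("bob", 2, "product"), ("bob", 5, "cart"),
    ("cy", 1, "home"), ("cy", 2, "cart"),
]
stalled = stalled_visitors(log, funnel)
```

Here only "bob" is returned: "ann" completed the purchase and "cy" skipped a step. A production system would run the same logic over far larger logs, which is where in-memory platforms come in.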
Crime doesn’t pay
Sydney-based company NetMap Analytics – spun out of the University of Technology, Sydney – deployed its unique data management techniques back in the early ’90s, leading NSW Police to finally identify Ivan Milat as the notorious backpacker killer.
The company has subsequently developed a strong customer base in the insurance and financial services industries, especially around fraud detection.
However, according to managing director Peter Anderson, the company’s efforts to move beyond its traditional base and into areas such as the retail sector have been less than successful.
“Retailers see a lot of these activities as discretionary spend and there’s a reticence currently to go down into these spend areas,” Anderson says, noting also that many big data projects have remained stalled since the GFC.
“The market generally is conservative.”
NetMap conducts most of its business direct. However, the company recognises that developing a proper partner network would likely smooth the path to market, especially if it is able to develop relationships with resellers already established in its key target markets.
“We’re looking for smart and well connected systems integrators,” Anderson says.
According to big data vendors and their partners, one of the key concerns raised by businesses is the usability of data analysis tools. With the industry at such an early stage very few people have experience of information management programs outside of Excel.
The majority of the successful products on the market therefore feature quite powerful graphical capabilities including the ability to manipulate and analyse data sets in 3D.
“Resellers should choose big data solutions where there is a strong emphasis on visualisation,” says Mark Sands, regional manager with QlikTech.
The company has been recognised by Gartner as a leader in visualisation, a key capability to ensure uptake of big data solutions, especially among smaller businesses where there remains a fair degree of trepidation. It’s also been an important factor, Sands says, in convincing partners to embrace the QlikView portfolio of in-memory solutions.
QlikTech has been investing heavily in its partner program over the past few years and now has around 1400 partners globally, 30 of which are in Australia. Last year QlikTech launched its QlikView 11 Certification Program, available to members of the Qonnect Partner Program.
The result is an extensive ecosystem of third-party organisations helping to provide QlikView customers with an end-to-end Business Discovery solution.
Another big data vendor which has earned kudos for its visualisation capabilities is Swiss-based Board. The company started selling through distributors in Australia in 2004 but only recently established a proper local presence.
The company’s local support manager, Steve Kellar, says its solutions have quickly gained traction in the local market. The fact that its offering requires little coding has proved a boon with customers attracted to the fact they can deploy solutions without the need for the vendor or its partners to spend much time configuring on site.
“Customers don’t feel locked in to us,” Kellar says.
However, Board boasts an effective drag-and-drop tool kit which it says makes it easier for partners to customise solutions.
“It’s a good vehicle for people with vertical experience to get this tool kit and develop apps for their vertical,” Kellar says.
Board recently launched a free demo, which is available on request from the company.
The business case
Oracle has a dedicated big data team working to assist the company’s partners in understanding and deploying solutions as well as selling the business case.
“Everyone’s interested in big data but not everyone’s sure what it is and how it applies to their organisation,” says Scott Tumbridge, key partner director with Oracle Australia.
“It’s like the early days of the internet; it’s also great and exciting but not everyone’s sure where the money is going to come from.”
Oracle has made massive investments in developing its “big data” portfolio, which like many of its competitors’ offerings boasts in-memory capabilities embedded into a broad range of applications and appliances.
Rugby league tackles big data
"The challenge is building ROI arguments to convince companies to spend the money,” says Philip de Harcourt, senior consultant with Professional Advantage Australia.
“A lot of these deliverables are intangible.”
But, he says, the tools are rapidly coming down in price while becoming easier to use.
De Harcourt has been involved in several big data projects helping NSW rugby league clubs to better understand their customers, with Professional Advantage developing a solution called ClubIntel. Most recently it deployed a Microsoft-based data solution at Sydney’s Norths Rugby Club which has transformed operations across the business.
“By analysing memberships and everything a member does when they enter a leagues club we can predict the people who actually make the most money for it,” de Harcourt says.
Professional Advantage sells big data solutions from Board, QlikTech and Microsoft. Microsoft is attempting to grab mindshare as a provider of more affordable big data solutions, although there has been some suggestion the software giant has been late to the party.
Marcy Larsen, a lead with Microsoft Australia’s BI practice, rejects this outright, stressing that SQL Server 2012, released in April this year and boasting a high degree of integration with Hadoop, would see the company gain an important foothold in the market.
“On TCO [for big data solutions] Microsoft stands side by side with Oracle or IBM at a more competitive price point.”
Microsoft is supportive of customers that want to use SQL Server 2012 in its Azure cloud environment and has tweaked its software licensing to allow customers to move between it and on-premises deployments.
Microsoft has been working closely with NICTA on the application of machine learning and software algorithms to address challenges such as fraud and customer churn in the telecommunications industry.
One of Microsoft’s greatest advantages, she says, is its commitment to preserving for users the tools and interfaces they are already familiar with. Microsoft has made considerable investments in developing partner programs around big data and by early July this year it expects to announce a formal global partner initiative.
Larsen says she expects Microsoft’s traditional global systems integrator partners to play a key role in bringing its big data story to market. Local partners with specific experience in BI would also play an important role, she says, although the uptake of big data solutions amongst its channel would be gradual.
“We won’t see an explosion of new partners on the scene – I think that’s going to take some time.”
Watson, I presume?
Big Blue big data specialist Alex Paris explains that IBM’s approach to big data is informed by the concept of ‘information streams’.
“IBM is the only vendor that identifies stream computing – the ability to stream and process data in-flight,” Paris says.
One of the biggest stories in big data of recent times was IBM’s Watson supercomputer defeating the world Jeopardy game-show champions last year. Many saw this as a major turning point in the industry and a quantum step beyond Big Blue beating Kasparov at chess, involving infinitely more complex processes and the application of human-like intuition on steroids.
Watson is now a key pillar of IBM’s big data strategy and is playing an important role in helping to advance industries such as pharmaceuticals and telecommunications.
Last year IBM launched BigDataUniversity.com to provide education resources on Hadoop, stream computing, and big data analytics skills. The number of participants enrolled has doubled over the past five months to over 18,000.
In addition, IBM last year held 1200 big data skills boot camps at client, partner and university sites. More than 2400 college students, graduate students and IT professionals were trained on the latest data management techniques.
Temporary advantage
Sydney-based SAP shop Innogence has seen a sharp uptick in demand for big data solutions amongst Australian companies. Last year the company placed 66th on the BRW Fast 100 and is hoping to do at least as well this year.
“Big data is the art of making the impossible possible,” says CEO Phil Cameron. One of the pillars of SAP’s big data story is its in-memory platform, HANA.
“The performance that HANA is able to provide is significantly greater than what anyone else has got. SAP has a temporary advantage in the market."
Many of Innogence’s customers are $1 billion-plus Australian companies. As well as working with several firms in finance and mining it is also leading a major big data project for a large Australian retail customer.
But Innogence is increasingly working with smaller organisations, most notably state and Federal Government agencies.
Doug Gibson, director of high-performance analytics with SAP Australia, says that because HANA uses raw data it requires less storage and processing capacity than other solutions on the market. At the moment SAP is in the early stages of deploying HANA-based solutions for some 13 local customers but Gibson is not willing to discuss specifics.
He says the emerging big data sector has a number of challenges to overcome before it achieves widespread adoption, most notably in helping businesses understand what to do with it.
“Preparing and managing big data is one thing but actually getting it right is another. Once you have it what are you going to do with it?”
EMC serves up Greenplum
EMC says its acquisition of big data company Greenplum in 2010 has given it an important edge. EMC Australia’s chief technology officer Clive Gold explains that its strength lies in its ability to tackle so-called semi-structured data.
T-Mobile is currently working with EMC/Greenplum and a number of other vendors to better understand the behaviour of its customers, in particular what they do immediately after they churn.
The carrier discovered that five out of 10 people whom customers talk to after they churn also churn themselves within 90 days. Gaining that particularly valuable insight required billions of call records to be accessed and analysed very quickly.
“The power of big data lies in discovering questions you didn’t know to ask.”
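The churn finding above boils down to a simple ratio over call records, sketched here in Python. This is an illustration only, not T-Mobile's or EMC's method: the record layout `(caller, callee, call_date)`, the `churn_dates` lookup, the function name `churn_contagion_rate` and the choice to measure the 90 days from the original customer's churn date are all assumptions.

```python
from datetime import date, timedelta


def churn_contagion_rate(churn_dates, calls, window_days=90):
    """Share of post-churn contacts who themselves churn within the window."""
    window = timedelta(days=window_days)
    contacted = set()  # (churner, contact) pairs seen after the churner left
    followed = set()   # subset where the contact also churned in the window
    for caller, callee, call_day in calls:
        left_on = churn_dates.get(caller)
        if left_on is None or call_day < left_on:
            continue  # caller never churned, or the call predates the churn
        contacted.add((caller, callee))
        contact_left = churn_dates.get(callee)
        if contact_left is not None and left_on < contact_left <= left_on + window:
            followed.add((caller, callee))
    return len(followed) / len(contacted) if contacted else 0.0


# Hypothetical records: "a" churns, then calls "b" (who leaves soon after)
# and "c" (who leaves months later, outside the 90-day window).
churn_dates = {
    "a": date(2012, 1, 10),
    "b": date(2012, 2, 1),
    "c": date(2012, 6, 1),
}
calls = [
    ("a", "b", date(2012, 1, 12)),
    ("a", "c", date(2012, 1, 15)),
    ("a", "d", date(2012, 1, 5)),   # before the churn, ignored
    ("e", "b", date(2012, 1, 20)),  # caller never churned, ignored
]
rate = churn_contagion_rate(churn_dates, calls)
```

On this toy data the rate is one in two. The arithmetic is trivial; the hard part the article points to is running it across billions of call records quickly.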
In the insurance industry, for instance, big data is already effecting a transition away from the traditional “group profile” method, which determines, for example, that if you are a young man you are more dangerous and should therefore pay a premium, towards an approach whereby decisions are made on the basis of complex individual data.
Companies able to deliver this level of sophistication will leave others in their wake. “If a company is not thinking about this they won’t be in business in 10 years’ time,” Gold says.
EMC wants its partners to become “data scientists”, and is especially interested in those companies where there are good “maths” skills, “people who can actually understand the application of what they do,” Gold says.
EMC is viewing opportunities to sell bespoke big data storage solutions that reflect the particular nuances and challenges of big data.
“Traditional storage has started to creak,” Gold says. “The first day you plug traditional storage in is your best day – it just gets worse after that.”
IT generating big data
The growing importance to organisations of big data is leading to an increased focus on applying advanced data analysis techniques to companies’ core IT systems, notes Matt Elliot, COO with Sydney-based IT analytics specialists eMite.
“We have taken all the tools and techniques in the BI world and applied them to IT operations,” he explains.
A core focus for the company is helping organisations break down their information silos.
“It astounds me that IT environments are still absolutely full of silos that essentially don’t talk to each other,” Elliot says.
But perhaps more importantly, eMite’s Riskband solution aggregates all of the data produced by companies’ IT hardware and software with a view to helping them understand exactly how their IT is performing at any given time.
As organisations spend more time and money trying to figure out what their customers are likely to be thinking and/or buying on a given Tuesday, it stands to reason they will eventually want to know in as much detail how their core IT systems are feeling too.