Fast-growing big data company Databricks wants to help businesses and organizations turn their data lakes, sometimes derided as “data swamps,” into useful data “lakehouses.”
On Thursday, Databricks launched SQL Analytics, the company’s new software for running SQL analytic workloads directly on huge stores of unorganized, often unstructured data known as data lakes. Such workloads traditionally run on data warehouse systems and their well-organized data.
“The data lakehouse is a chance to take all the reliability and quality and scale that you’d expect of a data warehouse and bring it to a data lake and get that one, single source of truth for data engineers, data scientists and data analysts,” said Joel Minnick, vice president of product marketing at Databricks, in an interview with CRN.
The new product release is the latest announcement from Databricks, the San Francisco-based company founded by the developers of the Apache Spark unified analytics engine that underlies the Unified Data Analytics Platform, the company’s flagship product.
One of the most successful big data startups in recent years, Databricks has raised nearly $900 million in venture funding. There are reports the company could go public in the first half of 2021, in what some anticipate could be the next big tech industry IPO in the wake of Snowflake’s blockbuster IPO earlier this year. The company achieved a $350 million annual revenue run rate in the third quarter of 2020.
With SQL Analytics, Databricks looks to resolve the data warehouse vs. data lake debate in the big data world.
Data warehouses contain data – usually structured – that has been collected and organized for specific business analysis tasks using tools like Tableau, Looker and Microsoft Power BI. While they create a “single source of truth” for an organization, they can be expensive and complex to build and maintain.
Data lakes are massive stores of unorganized data, often including unstructured data such as documents and video files, which are more economical to assemble than data warehouses and more flexible to operate.
While data lakes are popular for data science and machine learning tasks, using them for business analytics is difficult because traditional SQL-based business analytics tools can’t easily access and process the stored data. For data analysis jobs, organizations have to move data to other systems, raising costs and creating data silos throughout an organization.
Databricks has promoted the concept of the data lakehouse as a way to bring data warehouse performance and capabilities to data lakes. As part of that effort, Databricks in June acquired data visualization and dashboarding software developer Redash and debuted Delta Engine for querying cloud-based data lakes.
“Databricks has talked about the data lakehouse for, roughly, two years now,” Minnick said. With SQL Analytics, “We are crossing from this being a vision to becoming architectural reality for customers.”
“A lakehouse architecture built on a data lake is the ideal data architecture for data-driven organizations and this launch gives our customers a far superior option when it comes to their data strategy,” said Ali Ghodsi, Databricks CEO and co-founder, in a statement. “We’ve worked with thousands of customers to understand where they want to take their data strategy, and the answer is overwhelmingly in favor of data lakes. The fact is that they have massive amounts of data in their data lakes and with SQL Analytics, they now can actually query that data by connecting directly to their BI tools like Tableau.”
With SQL Analytics, a data lakehouse can provide nine times better price/performance than a traditional cloud data warehouse, according to Databricks.
SQL Analytics, available for public preview on Nov. 18, incorporates the Delta Engine query execution software, as well as security and other capabilities to improve system latency and concurrency. It also provides native connectors for popular business analysis tools like Tableau and Power BI and a SQL-native query and visualization interface.
The new product is supported by a number of Databricks consulting service partners including Slalom, Thorogood and Advancing Analytics. With SQL Analytics and the lakehouse concept, solution providers and consultants can help their clients simplify their data architectures, reduce costs and improve time-to-insight, Minnick said.
Systems integrators can use the product to help customers get more value out of their business intelligence tools, the executive added, while ISV partners can use the software to tap into more data for their applications.
“Databricks’ SQL Analytics is a critical step in the most important trend in the modern data stack: The unification of traditional SQL analytics with machine-learning and data science,” said George Fraser, CEO at data integration software developer Fivetran, a Databricks technology partner, in a statement. “Companies make huge investments in centralizing and curating data, and they should be able to make those investments once and then implement multiple analytical paradigms in a unified environment. The lakehouse architecture supports that.”