Global Big Data Conference

From Data Warehouses and Data Lakes to Data Fabrics for Analytics

Posted on: Dec 03, 2021

The evolution of data architecture for analytics began in earnest with relational data warehouses. Although these systems were good at generating insights from historical data, and thus offered some basis for predictive modeling, they are ultimately not agile or responsive enough for the volume and variety of data that enterprises face today.

The data lake was the next step in analytics architecture, mostly because it quickly accommodated the diverse schemas and data types organizations were dealing with at scale. But the way it accommodated this diversity left a lot to be desired. Because they are fundamentally enterprise file systems, data lakes typically turn into ungoverned data swamps that require extensive engineering before organizations can connect and query data. A lot of time is spent wrangling data that, while physically colocated in the data lake, is still unconnected with respect to business meaning. As a result, productivity suffers and novel insights are missed.

And while data lakehouses combine some of the best properties of both data warehouses and data lakes, it’s too early to make a sober judgment on their utility. Because they are in the end indistinguishable from relational systems, they ultimately suffer from an inability to deal with the enterprise data diversity problem. Relational data models just aren’t very good at handling data diversity.

Properly implemented data fabrics represent the latest evolution of analytics architecture. They greatly reduce the effort data engineers, data scientists, and data modelers spend preparing data compared with the aforementioned approaches, all of which are based on physical consolidation of data. With an artful combination of semantic data models, knowledge graphs, and data virtualization, a data fabric approach enables data to remain where it lives natively while providing uniform access to that data, which is now connected according to its business meaning, for timely query answering across clouds, on-premises environments, business units, and organizations.

This method simplifies data pipelines, lowers DataOps costs, and dramatically reduces time to analytic insight.
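To make the idea of leaving data in place while giving it one business-level view more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular data fabric product works: the two sources (an in-memory SQLite table standing in for a warehouse, and a JSON string standing in for files in a lake), the field names, and the `query_customers` helper are all hypothetical.

```python
import json
import sqlite3

# Source 1 (hypothetical): a relational table with terse column names.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE cust (cust_id INTEGER, cust_nm TEXT, rgn TEXT)")
warehouse.executemany(
    "INSERT INTO cust VALUES (?, ?, ?)",
    [(1, "Acme Corp", "EMEA"), (2, "Globex", "APAC")],
)

# Source 2 (hypothetical): semi-structured JSON with a different shape.
lake_json = json.dumps([
    {"id": 3, "profile": {"name": "Initech", "region": "EMEA"}},
])


def read_warehouse():
    """Map warehouse columns onto the shared business vocabulary."""
    rows = warehouse.execute("SELECT cust_id, cust_nm, rgn FROM cust")
    for cust_id, name, region in rows:
        yield {"customer_id": cust_id, "name": name, "region": region}


def read_lake():
    """Map nested JSON fields onto the same business vocabulary."""
    for doc in json.loads(lake_json):
        yield {
            "customer_id": doc["id"],
            "name": doc["profile"]["name"],
            "region": doc["profile"]["region"],
        }


# The "virtual" layer: sources are read at query time, never copied together.
SOURCES = [read_warehouse, read_lake]


def query_customers(region):
    """Answer one business-level question across every registered source."""
    matches = (c for source in SOURCES for c in source() if c["region"] == region)
    return sorted(matches, key=lambda c: c["customer_id"])


emea = query_customers("EMEA")
```

The point of the sketch is that the caller asks about customers and regions, never about `cust_nm` or `profile.name`; adding a third source would mean adding one more mapping function rather than another consolidation pipeline.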

Knowledge Graphs

Knowledge graphs play a vital role in the enhanced analytics that comprehensive data fabrics can provide to organizations. Their graph underpinnings are critical for discerning and representing complex relationships between the most diverse datasets to drastically improve insight. Additionally, they readily align data of any variation (structured, semi-structured, and unstructured) within a universal graph construct, giving organizations sane, rationalized access to the mass of data they’re contending with.
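As a rough illustration of aligning different kinds of data in one graph construct, the sketch below turns a table row, a nested document, and a sentence of free text into the same subject-predicate-object triples. Everything here is an assumption made for the example: the identifiers such as `customer:1`, the dotted predicate names, and especially the toy regular expression, which stands in for real text extraction.

```python
import re


def triples_from_row(row):
    """Structured input: one table row keyed by column name."""
    subject = f"customer:{row['id']}"
    return {(subject, col, str(val)) for col, val in row.items() if col != "id"}


def triples_from_document(doc, subject):
    """Semi-structured input: flatten nested keys into dotted predicates."""
    out = set()
    for key, value in doc.items():
        if isinstance(value, dict):
            nested = triples_from_document(value, subject)
            out |= {(s, f"{key}.{p}", o) for s, p, o in nested}
        else:
            out.add((subject, key, str(value)))
    return out


def triples_from_text(text):
    """Unstructured input: a toy pattern for 'X supplies Y' statements."""
    pairs = re.findall(r"(\w+) supplies (\w+)", text)
    return {(f"company:{a}", "supplies", f"company:{b}") for a, b in pairs}


# All three inputs end up in one set of triples, i.e. one graph.
graph = set()
graph |= triples_from_row({"id": 1, "name": "Acme", "region": "EMEA"})
graph |= triples_from_document({"order": {"sku": "X-100", "qty": 5}}, "customer:1")
graph |= triples_from_text("Acme supplies Globex. Initech supplies Acme.")


def facts_about(subject):
    """Return every (predicate, object) pair recorded for a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


customer_facts = facts_about("customer:1")
```

Because the row and the document share the subject `customer:1`, a single lookup returns facts that originated in two differently shaped sources.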

When querying customer data for appropriate training datasets for machine learning models, for example, knowledge graphs can detect relationships between individual and collective attributes that elude the capabilities of conventional relational approaches. They’re also able to make intelligent inferences between semantic facts or enterprise knowledge to create additional knowledge about a specific domain, such as dependency relationships between supply chain partners. The combination of these capabilities means firms know more about their data’s significance to specific business processes, outcomes, and analytic concerns (like why certain products sell more in the summer in specific regions than others do), which inherently creates more relevant, meaningful results.
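The supply chain inference mentioned above can be sketched as a simple rule applied until nothing new is derived. This is a minimal sketch assuming a single hand-written rule (if A depends on B and B depends on C, then A depends on C); the company names and the `infer_dependencies` function are hypothetical, and production knowledge graphs typically rely on a reasoner and an ontology rather than code like this.

```python
def infer_dependencies(supplies):
    """Derive (buyer, supplier) dependency facts, including indirect ones.

    `supplies` is a set of (supplier, buyer) pairs stated explicitly.
    A buyer depends on its direct suppliers and on their suppliers in turn.
    """
    depends_on = {(buyer, supplier) for supplier, buyer in supplies}
    changed = True
    while changed:  # repeat the rule until a fixed point is reached
        new_facts = {
            (a, c)
            for a, b in depends_on
            for b2, c in depends_on
            if b == b2 and a != c
        } - depends_on
        depends_on |= new_facts
        changed = bool(new_facts)
    return depends_on


# Explicit facts (hypothetical): who supplies whom.
supplies = {
    ("MineCo", "ChipCo"),
    ("ChipCo", "BoardCo"),
    ("BoardCo", "LaptopCo"),
}

deps = infer_dependencies(supplies)
direct = {(buyer, supplier) for supplier, buyer in supplies}
inferred_only = deps - direct
```

Here `inferred_only` holds facts that were never stated in the input, such as LaptopCo depending on MineCo, which is the kind of additional knowledge the passage describes.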
