Data Unleashed™ – Addressing the Need for Data-centric Agility

Data Unleashed™. The name expresses a vision of data freed from its shackles so that it can be quickly and iteratively accessed, related, studied and expanded. In order to achieve that vision, the process of combining, or federating, the data must be lightweight. That is, the approach must facilitate rapid data set expansion and on-the-fly relationship changes so that we may quickly derive insights. Conversely, the process must not include a significant investment in data structure design since agility requires that we avoid a rigid structure.

Over the past year Blue Slate Solutions has been advancing its processes and technology to support this vision, which comprises the integration between components in our Cognitive Corporation® framework. More recently we have invested in an innovation development project to take our data integration experiences and semantic technology expertise and create a service offering backed by a lightweight data federation platform. Our platform, Data Unleashed™, enables us to partner with customers who are seeking an agile, lightweight enhancement to traditional data warehousing.

I want to emphasize that we believe that the Data Unleashed™ approach to data federation works in tandem with traditional Data Warehouses (DW) and other well-defined data federation options. It offers agility around data federation, benefiting focused data needs for which warehouses are overkill while supporting a process for iteratively deriving value using a lightweight data warehouse™ approach that informs a broader warehousing solution.

At a couple of points below I emphasize differences between Data Unleashed™ and a traditional DW. This is not meant to disparage the value of a DW but to explain why we feel that Data Unleashed™ adds a set of data federation capabilities to those of the DW.

As an aside, Blue Slate is producing a set of videos specifically about semantic technology, which is a core component of Data Unleashed™. The video series, “Semantic Technology, An Enterprise Introduction,” will be organized in two tracks, business-centric and technology-centric. Our purpose in creating these is to promote a holistic understanding of the value that semantics brings to an organization. The initial video provides an overview of the series.

What is Data Unleashed™ All About?

Data Unleashed™ is based on four key premises:

the variety of data and data sources that are valuable to a business continues to grow;
only a subset of the available data is valuable for a specific reporting or analytic need;
integration and federation of data must be based on meaning in order to support new insights and understanding; and
lightweight data federation, which supports rapid feedback regarding data value, quality and relationships, speeds the process of developing a valuable data set.

I’ll briefly describe our thinking around each of these points. Future posts will go into more depth about Data Unleashed™ as well. In addition, several Blue Slate leaders will be posting their thoughts about this offering and platform.

1. The variety of data and data sources that are valuable to a business continues to grow

Businesses create, purchase, contract and leverage systems that generate, store and retrieve data related to their operations. These include systems for managing the day-to-day operations of the company. Additional systems are used to manage key functions such as HR and finance. With the growth of cloud offerings some of this data is located in systems under a corporation’s direct control and some is not. Finally, the amount of freely available data continues to skyrocket.

Data is often more valuable for analytics and reporting when combined with other data. For instance, a claims processing system may benefit from integration with a medical records system. The data thus federated may then be further enhanced through the addition of a publicly available genomic or pharmacy data set.

In all cases, the subset of valuable data will differ depending on our goal while the structure of the data will differ based on the source. Quickly gaining value from the information therefore requires a platform that can rapidly integrate data across a broad spectrum of formats, network protocols and data types.

2. Only a subset of the available data is valuable for a specific reporting or analytic need

Different parts of an organization are interested in data from different sources and at different levels of granularity. For example, a dashboard allowing an operations manager to understand production status will be based on clearly defined Key Performance Indicators (KPIs). These will have been aggregated from operational and ancillary systems in a repeatable manner. Alternatively, a data scientist exploring data in order to understand patterns and identify opportunities to improve quality or productivity will need to look at a diverse and detailed data set.

In both cases only a subset of data is valuable to the individual or goal. Being able to quickly pull together relevant subsets is necessary in order to allow each part of an organization to maximize the value it can extract from all the available data.

Note that the relevant data sets are not static. As operations mature, the KPIs or the data and calculations backing them may change. As a data scientist explores a model, new data will be needed to advance hypotheses and refine the model. Therefore, the approach for creating subsets of data must be agile, a point I return to under the fourth premise.

3. Integration and federation of data must be based on meaning in order to support new insights and understanding

Given the multiple sets of data available to us and the fact that only a subset of that data is valuable for a given need, we need to ensure that the process of federating the data is agile and lightweight. Quickly adding or removing federated data is necessary in order to support evolving business thinking and market opportunities. Ever-changing information requirements mean our federated data set must easily shift to keep pace. To achieve this agility, and create order from potential chaos, there must be a core set of definitions for data associations so that users have a common reference point, common meanings, regardless of the changing data sets.

This is one of the major differentiators when using the Data Unleashed™ approach versus a traditional data warehouse (DW). A DW is often designed based on the structural relationships of the data being federated. These structures are a side effect of how the data relates, rather than the actual meaning of the data. Interactions with the data then proceed from a structure-centric view of data rather than a meaning-centric one.

Meanings change over time as new information is obtained or new insights are found. Further, meaning may differ based on context. The definition of “customer” may be different between the sales and billing departments. This is not a structural difference in the data. For instance, a “customer” has a history of purchase orders with line items regardless of whether we are interested in the sales or billing department’s definition of a “customer.”
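To make the "customer" example concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Party` structure, the `invoiced` flag, and the two department definitions are hypothetical and are not taken from the Data Unleashed™ platform. It only shows that one unchanged structure can carry two different meanings.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    """A company and its purchase orders; the structure is identical in every context."""
    name: str
    orders: list = field(default_factory=list)


# Hypothetical, context-specific meanings of "customer" (assumed for illustration only).
CUSTOMER_DEFINITIONS = {
    # Sales: anyone who has placed at least one order.
    "sales": lambda party: len(party.orders) > 0,
    # Billing: anyone with at least one invoiced order.
    "billing": lambda party: any(order["invoiced"] for order in party.orders),
}


def customers(parties, context):
    """Return the names of the parties that count as customers in the given context."""
    is_customer = CUSTOMER_DEFINITIONS[context]
    return [party.name for party in parties if is_customer(party)]


parties = [
    Party("Acme", [{"id": 1, "invoiced": True}]),
    Party("Globex", [{"id": 2, "invoiced": False}]),
    Party("Initech", []),
]

sales_customers = customers(parties, "sales")
billing_customers = customers(parties, "billing")
print(sales_customers)    # ['Acme', 'Globex']
print(billing_customers)  # ['Acme']
```

The purchase order structure never changes here; only the definition applied to it does, which is the distinction between structure and meaning drawn above.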

Also, having to align a data structure with data meaning creates a heavyweight process. If the meaning changes, which it will over time, then the DW structure will likely need to change as well. This becomes time-consuming since structural changes require alterations to ETL processes, tuning, and queries.

Because Data Unleashed™ has a semantic view of data at its core, meaning is key to all interactions. This allows users to express data mappings and queries using the logical definitions of information without regard to the physical structures housing the source data. As data sources change, the user continues to refer to information through logical definitions regardless of the changing physical relationships.
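One way to picture querying through logical definitions is the following Python sketch. It is not the Data Unleashed™ implementation; the source name, the column names such as `CLM_ID`, the logical terms, and the `select` helper are all hypothetical. The idea it illustrates is that a query names logical terms, and a separate mapping resolves them to whatever physical fields currently hold the data.

```python
def select(terms, mappings, sources):
    """Return one dict per source row, keyed by logical term rather than physical column.

    In this sketch all requested terms must resolve to the same source.
    """
    source_names = {mappings[term][0] for term in terms}
    if len(source_names) != 1:
        raise ValueError("this sketch only reads terms that live in one source")
    rows = sources[source_names.pop()]
    return [{term: row[mappings[term][1]] for term in terms} for row in rows]


# Hypothetical physical source and the mapping from logical terms to its columns.
sources = {"claims": [{"CLM_ID": "C1", "MBR_NO": "M7", "AMT": 120.0}]}
mappings = {
    "claim identifier": ("claims", "CLM_ID"),
    "claim member": ("claims", "MBR_NO"),
    "claim amount": ("claims", "AMT"),
}

query = ["claim identifier", "claim amount"]  # expressed purely in logical terms
before = select(query, mappings, sources)

# The physical source changes (columns renamed); only the mapping is updated.
new_sources = {"claims": [{"CLAIM_KEY": "C1", "MEMBER_KEY": "M7", "TOTAL_AMT": 120.0}]}
new_mappings = {
    "claim identifier": ("claims", "CLAIM_KEY"),
    "claim member": ("claims", "MEMBER_KEY"),
    "claim amount": ("claims", "TOTAL_AMT"),
}
after = select(query, new_mappings, new_sources)

print(before)           # [{'claim identifier': 'C1', 'claim amount': 120.0}]
print(before == after)  # True
```

The query list is untouched by the physical change. In this sketch the mapping is a plain dictionary, so adapting to a changed source is an edit to configuration rather than to every query.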

4. Lightweight data federation, which supports rapid feedback regarding data value, quality and relationships, speeds the process of developing a valuable data set

Given the need to quickly bring data together, explore the relationships and then iteratively change the data set based on experience, we must use a lightweight federation technology. This means that the integration of the data and the connections between data elements should be logical and malleable rather than manifesting as rigid physical structures. It is through logical connections, and being able to alter those connections through configuration, that Data Unleashed™ achieves its goal of being agile and lightweight.

This is another difference between Data Unleashed™ and a traditional DW. The DW often requires a carefully planned design using relational database technology and is generally targeted to support a broad and generalized set of users. With those constraints in place, a DW often tries to federate most of a business’ data, since it is not clear what data will be valuable to users and refactoring is expensive. Such a broad data set complicates the physical design of the DW. Both drivers, broad data set and generalized uses, often lead to a significant delay between starting a DW project and achieving business benefits from the federated information.

From the Data Unleashed™ point of view, having an agile data integration and federation platform allows the data scope for a specific reporting or analytic need to be significantly reduced. A rapid feedback process is supported which starts with a small set of data and adds to that, essentially in real time, as the need for it becomes apparent. This means that a user has access to relevant data quickly, immediately deriving value, while extending the data set based on experience.
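The incremental style described here can be sketched with a tiny in-memory set of subject, predicate, object statements, in the spirit of semantic technology. This is a toy model under stated assumptions, not the platform itself: the identifiers (`claim:C1`, `member:M7`, `patient:P3`), the predicate names, and both helper functions are hypothetical.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def diagnoses_for_claim(triples, claim):
    """Follow claim -> member -> patient -> diagnosis through logical links."""
    results = []
    for _, _, member in match(triples, s=claim, p="forMember"):
        for _, _, patient in match(triples, s=member, p="sameAs"):
            results += [o for _, _, o in match(triples, s=patient, p="diagnosis")]
    return results


# Start small: only the claims data (hypothetical values).
federated = [
    ("claim:C1", "forMember", "member:M7"),
    ("claim:C1", "amount", 120.0),
]

# Later, a medical records source is added; no existing statement changes.
federated += [("patient:P3", "diagnosis", "E11")]
before_link = diagnoses_for_claim(federated, "claim:C1")

# The relationship between the sources is itself just data that can be added or removed.
federated += [("member:M7", "sameAs", "patient:P3")]
after_link = diagnoses_for_claim(federated, "claim:C1")

print(before_link)  # []
print(after_link)   # ['E11']
```

Adding the second source and then the link between the two sources required no redesign of what was already loaded, which is the property that makes a short feedback loop possible in this model.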

Longer term, this approach underpins a value-centric process where the information technology organization is able to create well-defined and targeted data warehouse and data mart solutions. Using Data Unleashed™ to identify business meanings, discover valuable data and define the relationships within the data, a warehouse project understands what data to extract and how it is to be related to other data. Further, there is an understood business justification since users have interactively proven that this particular data set has value.

This concludes my broad, high-level first look at Data Unleashed™. If you’ve made it through this post, I hope you found something interesting to consider. I look forward to discussing these concepts with business and technology leaders over the coming months as we continue to refine this offering and platform. I’ll also begin sharing more details about the two key parts of Data Unleashed™ which provide its federating and analytic capabilities.

I’d appreciate your thoughts on our Data Unleashed™ offering. Feel free to comment based on your data integration and federation experiences. Do you see value in looking at lightweight data federation in this way? Where does this intersect with, and differ from, other NoSQL approaches?

Tags: cognitive corporation, data, enterprise applications, enterprise systems, Information Systems, lightweight data federation, lightweight data warehouse, ontology, semantics

This entry was posted on Thursday, April 3rd, 2014 at 08:50 and is filed under Architecture, Cognitive Corporation, Data, Data Unleashed, Information Systems, Semantic Technology.

David S. Read

Dave's Reflections (Blog)