// JSON-LD for Wordpress Home, Articles and Author Pages. Written by Pete Wailes and Richard Baxter. // See: http://builtvisible.com/implementing-json-ld-wordpress/

Archive for May, 2015

Impetus for Our Semantics and NoSQL Workshop at the 2015 SmartData Conference

Friday, May 15th, 2015

I'm Speaking at the 2015 SmartData ConferenceI’m looking forward to being one of the presenters for infuzIT’s hands-on data integration and analysis workshop at this year’s SmartData Conference in San Jose. Giving people the opportunity to see the amazing power of semantics combined with NoSQL to quickly integrate and analyze data makes my day.

My background includes significant work with data, both as an application developer and data warehouse architect. The acceleration of data-centric hardware and software capabilities over the past 10 years now supports a very different paradigm for exploring, reporting and analyzing data. Processes and procedures for creating a data warehouse or mart, the accepted rules of the road for creating integrated data repositories, are no longer clear cut. The data federation debate is no longer Inmon or Kimball.

A significant shift in data integration revolves around the required lifespan of the integrated data. This lifespan has two key aspects whose evolution now allows us to rethink our approach to data federation. This permits us to be much more agile when bringing heterogeneous data sources together. The two aspects are reflected in these design questions: 1) what data, if any, will be rehosted; and 2) what relationships will be supported within the integrated data?

Rehosting Data

In a traditional data warehouse the data must be rehosted. The new repository is the target where transformed data (cleaned-up, standardized) exists. The queries that will be retrieving data from multiple sources are really pulling data from a single source that has been populated from multiple sources. It represents a heavyweight process, driven by Extract-Transform-Load (ETL) scripts and requiring space to host redundant information.

Relationships Between Data Elements

The target warehouse schema determines what relationships are defined between the data elements being combined. Getting this “right” requires careful planning and coordination between the various groups that will use the warehouse. Given the significant effort, represented as cost, organizations tend to design data warehouses to support broad constituencies as a way to amortize the investment across departments and projects.

Paradigm Shift

Semantics and NoSQL allow us to reduce the effort of integrating data by orders of magnitude. They support a completely different mindset for bringing data together. Instead of carefully designing a model that works well in the general sense (reducing the value in specific cases) we have environments that allow us to experiment, adjust and focus on each case.

Below are several drivers which allow us to approach data federation differently using semantics and NoSQL.