Creating RDF Triples from a Relational Database
In an earlier blog entry I discussed the potential reduction in refactoring effort if our data is represented as RDF triples rather than relational structures. As a way to give myself easy access to RDF data and to work more with semantic web tool features I have created a program to export relational data to RDF.
The program is really a proof-of-concept. It takes a SQL query and converts the resulting rows into assertions of triples. The approach is simple: given a SQL statement and a chosen primary key column (PK) to represent the instance for the exported data, assert triples with the primary key column value as the subject, the column names as the predicates and the non-PK column values as the objects.
Here is a brief sample taken from the documentation accompanying the code.
- Given a table named people with the following columns and rows:
    id  name    age
    --  ----    ---
    1   Fred    20
    2   Martha  25
- And a query of: select id, name, age from people
- And the primary key column set to: id
- Then the asserted triples (shown using Turtle and skipping prefixes) will be:
    dsr:PK_1 a owl:Thing , dsr:RdbData ;
        rdfs:label "1" ;
        dsr:name "Fred" ;
        dsr:age "20" .

    dsr:PK_2 a owl:Thing , dsr:RdbData ;
        rdfs:label "2" ;
        dsr:name "Martha" ;
        dsr:age "25" .
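As a rough illustration of the row-to-triple mapping described above, here is a minimal Java sketch. It is not the code from the released program: the class name RowsToTurtle and its helper methods are hypothetical, each row is assumed to have already been read from the query result into a map of column name to value, and the column names and key values are assumed to be usable in a prefixed name without any escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not the released program's code).
class RowsToTurtle {

    /** Converts one row (column name -> value) into a Turtle statement block. */
    static String rowToTurtle(Map<String, String> row, String pkColumn) {
        String pk = row.get(pkColumn);
        if (pk == null) {
            throw new IllegalArgumentException("Row has no value for PK column " + pkColumn);
        }
        StringBuilder sb = new StringBuilder();
        // The primary key value identifies the subject and doubles as its label.
        sb.append("dsr:PK_").append(pk).append(" a owl:Thing , dsr:RdbData ;\n");
        sb.append("    rdfs:label ").append(literal(pk));
        // Every non-PK column becomes a predicate with the column value as object.
        for (Map.Entry<String, String> col : row.entrySet()) {
            if (col.getKey().equals(pkColumn) || col.getValue() == null) {
                continue; // skip the key itself and SQL NULLs
            }
            sb.append(" ;\n    dsr:").append(col.getKey())
              .append(' ').append(literal(col.getValue()));
        }
        sb.append(" .\n");
        return sb.toString();
    }

    /** Wraps a value as a quoted Turtle string, escaping the characters that need it. */
    static String literal(String value) {
        String escaped = value.replace("\\", "\\\\")
                              .replace("\"", "\\\"")
                              .replace("\n", "\\n")
                              .replace("\r", "\\r");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> fred = new LinkedHashMap<>();
        fred.put("id", "1");
        fred.put("name", "Fred");
        fred.put("age", "20");

        Map<String, String> martha = new LinkedHashMap<>();
        martha.put("id", "2");
        martha.put("name", "Martha");
        martha.put("age", "25");

        System.out.print(rowToTurtle(fred, "id"));
        System.out.print(rowToTurtle(martha, "id"));
    }
}
```

A LinkedHashMap is used so that the predicates come out in the same order as the columns in the query. Every value is emitted as a plain string literal, matching the sample output above.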
You can see that the approach represents a quick way to convert the data.
The next question is, “How do I refactor the data?” That was, after all, what my previous blog entry was discussing. The decision for me is whether I need to add a bunch of features to the export program or whether I can use features of the semantic web (e.g. OWL, SWRL) to refactor the data.
I compare this in some ways to the initial XML specification, which required a DTD as the way to define the valid structure for an XML document. The DTD is expressed using a different (non-XML) meta-language. This proved a poor choice, needlessly complicating parsers as well as developers’ learning curves. The subsequent improvement was a move to XML Schema, itself expressed using XML. This added consistency, using XML to describe XML.
I view the use of OWL (and probably something like SWRL if it continues to evolve) as a way to use a consistent technology to deal with data refactoring. After all, if I am creating RDF data using semantic technologies and need to modify the data structure in some way (changing class or property names, adding classifications, etc.) then it makes sense to use the same semantic technologies to effect the transformation.
In the sample program I do just that. I load an ontology file that contains my conversion assertions and then create the RDB-sourced triples, allowing the reasoner to assert my changes.
I have released the program as open source. The code and some documentation are available for download on my RDB To RDF web page.
At this point I’ve started small in terms of the inferencing, simply adding a superclass relationship and class membership based on a property value. My goal was to get version one complete, creating a starting point on which to build out functionality.
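To make those two kinds of inference concrete, here is a small Java sketch that applies them by hand to an in-memory set of triples. It is only an illustration of what a reasoner would derive, not how the program works: the program relies on an ontology file and a reasoner, whereas this sketch hard-codes the two rules. The class TinyInference, the names ex:Person and ex:AgeTwenty, and the use of plain strings for triples are all assumptions made for the example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, hand-rolled stand-in for two reasoner behaviours.
class TinyInference {

    /** A triple is modelled as an immutable three-element list: subject, predicate, object. */
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    /**
     * Returns the asserted triples plus everything derived by two rules:
     *  1. any subject with the given property value is a member of targetClass;
     *  2. a member of a class is also a member of that class's superclasses.
     * The rules are re-applied until no new triples appear.
     */
    static Set<List<String>> infer(Set<List<String>> asserted,
                                   String property, String value, String targetClass) {
        Set<List<String>> all = new LinkedHashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            Set<List<String>> fresh = new LinkedHashSet<>();
            for (List<String> t : all) {
                // Rule 1: class membership based on a property value.
                if (t.get(1).equals(property) && t.get(2).equals(value)) {
                    fresh.add(triple(t.get(0), "rdf:type", targetClass));
                }
                // Rule 2: propagate membership along rdfs:subClassOf.
                if (t.get(1).equals("rdf:type")) {
                    for (List<String> sc : all) {
                        if (sc.get(1).equals("rdfs:subClassOf") && sc.get(0).equals(t.get(2))) {
                            fresh.add(triple(t.get(0), "rdf:type", sc.get(2)));
                        }
                    }
                }
            }
            changed = all.addAll(fresh); // true only if something new was derived
        }
        return all;
    }

    public static void main(String[] args) {
        Set<List<String>> data = new LinkedHashSet<>();
        data.add(triple("dsr:PK_1", "rdf:type", "dsr:RdbData"));
        data.add(triple("dsr:PK_1", "dsr:age", "20"));
        // Hypothetical conversion assertion: exported rows are people.
        data.add(triple("dsr:RdbData", "rdfs:subClassOf", "ex:Person"));

        for (List<String> t : infer(data, "dsr:age", "20", "ex:AgeTwenty")) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

In an ontology-driven setup, the superclass relationship and the value-based classification would be stated as assertions in the ontology file, and the reasoner would take the place of the loop above.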
Please feel free to download, use, and modify the program. If you have feedback about its operation and the concepts being discussed please add a comment.
Tags: data, Information Systems, Java, linkedin, ontology, open source, programming, semantic web, semantics