Archive for the ‘Public Data’ Category

Creating a SPARQL Endpoint Using Joseki

Monday, November 29th, 2010

As a consumer of semantic data, I thought creating a SPARQL endpoint would be an interesting exercise.  It would require having some data to publish as well as working with a SPARQL library.  For data, I chose a set of mileage information that I have been collecting on my cars for the last five years.  For technology, I decided to use the Joseki SPARQL Server, since I was already using Jena.

For those who want to skip the “how” and see the result, the SPARQL endpoint along with sample queries and a link to the ontology and data is at: http://monead.com/semantic/query.html

The first step in this project was to convert my mileage spreadsheets into triples.  I looked briefly for an existing ontology in the automobile domain but didn’t find anything I could use, so I created an ontology that reflects my approach to recording automobile mileage data.  My data records the miles traveled between fill-ups as well as the number of gallons used.  I also record the car’s claimed MPG and calculate the actual MPG.
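
The actual MPG for a fill-up is simply the miles traveled since the previous fill-up divided by the gallons used.  A minimal Java sketch of that arithmetic follows; the class name, method names and sample figures are illustrative assumptions, not values from the actual spreadsheet.

```java
/** Illustrative sketch: per-fill-up MPG arithmetic (names and figures are hypothetical). */
final class FillUpMath {

    private FillUpMath() {
    }

    /** Actual MPG for one tank: miles traveled since the last fill-up divided by gallons used. */
    static double actualMpg(double milesTraveled, double gallonsUsed) {
        if (gallonsUsed <= 0) {
            throw new IllegalArgumentException("gallonsUsed must be positive");
        }
        return milesTraveled / gallonsUsed;
    }

    /** Positive when the car's claimed MPG is higher than what the tank actually delivered. */
    static double claimedMinusActual(double claimedMpg, double milesTraveled, double gallonsUsed) {
        return claimedMpg - actualMpg(milesTraveled, gallonsUsed);
    }

    public static void main(String[] args) {
        // Hypothetical fill-up: 450 miles on 10 gallons, with the car claiming 47.5 MPG.
        System.out.println(actualMpg(450.0, 10.0));
        System.out.println(claimedMinusActual(47.5, 450.0, 10.0));
    }
}
```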

The ontology reflects this perspective of calculating the MPG at each fill-up.  This means that the purchase of gas is abstracted to a class with information such as miles traveled, gallons used and date of purchase as attributes.  I abstracted the gas station and location as classes, assuming that over time I might be able to flesh these out (in the spreadsheet I record the name of the station and the town/state).

A trivial Java program converts my spreadsheet (CSV) data into triples matching the ontology.  I then run the ontology and data through Pellet to derive any additional triples from the ontology.  The entire ontology and current data are available at http://monead.com/semantic/data/HybridMileageOntologyAll.Inferenced.xml.
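
To give a feel for the CSV-to-triples step, here is a rough sketch that turns one spreadsheet row into N-Triples lines.  It is not the actual conversion program: the namespace, class and property names and column order are all assumptions made up for illustration, and it builds strings by hand rather than using Jena so that it stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: turn one CSV fill-up row into N-Triples lines (all names hypothetical). */
final class MileageCsvToTriples {

    // Hypothetical namespace; the real ontology's IRIs are not reproduced here.
    static final String NS = "http://example.org/mileage#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    private MileageCsvToTriples() {
    }

    /**
     * Assumed column order: date (yyyy-MM-dd), miles, gallons, claimed MPG.
     * Assumes clean input: no quoted fields, no embedded commas, values already valid.
     */
    static List<String> toTriples(String id, String csvRow) {
        String[] cols = csvRow.split(",");
        if (cols.length != 4) {
            throw new IllegalArgumentException("expected 4 columns but found " + cols.length);
        }
        String subject = "<" + NS + "purchase-" + id + ">";
        List<String> triples = new ArrayList<>();
        // Each row becomes one instance of a (hypothetical) GasPurchase class.
        triples.add(subject + " <" + RDF_TYPE + "> <" + NS + "GasPurchase> .");
        triples.add(literal(subject, "purchaseDate", cols[0].trim(), "date"));
        triples.add(literal(subject, "milesTraveled", cols[1].trim(), "decimal"));
        triples.add(literal(subject, "gallonsUsed", cols[2].trim(), "decimal"));
        triples.add(literal(subject, "claimedMpg", cols[3].trim(), "decimal"));
        return triples;
    }

    /** Formats one typed-literal triple in N-Triples syntax. */
    static String literal(String subject, String property, String value, String xsdType) {
        return subject + " <" + NS + property + "> \"" + value + "\"^^<" + XSD + xsdType + "> .";
    }

    public static void main(String[] args) {
        // Hypothetical row; real data would be read line by line from the CSV file.
        for (String triple : toTriples("1", "2010-11-01,450.0,10.0,47.5")) {
            System.out.println(triple);
        }
    }
}
```

Each row yields one subject with a type triple plus one triple per column, which mirrors the idea of treating the gas purchase as a class with miles, gallons and date as attributes.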

It turns out that the ontology creation and data conversion were the easy parts of this project.  Getting Joseki to work as desired took some time, mostly because I couldn’t find much documentation for deploying it as a servlet rather than using its standalone server feature.  I eventually downloaded the Joseki source in order to understand what was going wrong.  The principal issue is that Joseki doesn’t seem to understand the WAR environment and relative paths (e.g. relative to its own WAR).

I had two major path issues: 1) getting Joseki to find its configuration (joseki-config.ttl); and 2) getting Joseki to find the triple store (in this case a flat file).
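
As a generic illustration of this kind of problem, relative paths can be made independent of the container's working directory by resolving them against a known base directory.  The sketch below uses only java.nio; the class name, base directory and file locations are assumptions for illustration.  In a servlet container the base would typically be obtained from something like ServletContext.getRealPath("/"), and this is not necessarily how Joseki itself needs to be configured.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch: resolve configured paths against a web application's root directory. */
final class WarPathResolver {

    private WarPathResolver() {
    }

    /**
     * Returns a normalized absolute path. Absolute inputs are kept as they are;
     * relative inputs are interpreted relative to baseDir instead of the JVM's
     * current working directory.
     */
    static Path resolve(Path baseDir, String configured) {
        Path path = Paths.get(configured);
        if (path.isAbsolute()) {
            return path.normalize();
        }
        return baseDir.toAbsolutePath().resolve(path).normalize();
    }

    public static void main(String[] args) {
        // Hypothetical location of the expanded WAR; a real deployment will differ.
        Path webAppRoot = Paths.get("/opt/tomcat/webapps/joseki");
        System.out.println(resolve(webAppRoot, "WEB-INF/joseki-config.ttl"));
        System.out.println(resolve(webAppRoot, "WEB-INF/data/mileage.xml"));
    }
}
```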


Business Ontologies and Semantic Technologies Class

Sunday, May 9th, 2010

Last week I had the pleasure of attending Semantic Arts’ training class entitled “Designing and Building Business Ontologies.”  The course, led by Dave McComb and Simon Robe, provided an excellent introduction to semantic technologies and tools as well as coverage of ontological best practices.  I thoroughly enjoyed the 4-day class and achieved my principal goals in attending, namely to understand the semantic web landscape, including technologies such as RDF, RDFS, OWL and SPARQL, as well as the current state of tools and products in this space.

Both Dave and Simon have a deep understanding of this subject area.  They also work with clients using this technology so they bring real-world examples of where the technology shines and where it has limitations.  I recommend this class to anyone who is seeking to reach a baseline understanding of semantic technologies and ontology strategies.

Why am I so interested in semantic web technology?  I am convinced that structuring information such that it can be consumed by systems, in ways more automated than current data storage and association techniques allow, is required in order to achieve any meaningful advancement in the field of information technology (IT). Whether wiring together web services or setting up ETL jobs to create data marts, too much IT energy is wasted on repeatedly integrating data sources; essentially manually wiring together related information in the absence of the computer being able to wire it together autonomously!


Project H.M.

Sunday, December 6th, 2009

I have been following the work being done by The Brain Observatory at UCSD to carefully section the brain of patient H.M. The patient, whose identity was protected while he was living, is known as the most studied amnesiac.  His amnesia was caused by brain surgery he underwent when he was 27 years old.

Screenshot from the live broadcast of Project H.M.'s brain slicing process

I won’t redocument his history; it is widely available on various websites, a few of which I’ll list at the end of this posting.  For me, this study is fascinating because of the completely open way the work is being done.  The process of sectioning the brain was broadcast in real time on UCSD’s website.  The entire process being followed is discussed in an open forum.  The data being collected will be freely available.  This shows the positive way that the web can be leveraged.

I spend so much time in the world of commercial and proprietary software solutions that I sometimes end up with a distorted view of how the web is used.  Most of my interactions on the web are in the creation of applications that are owned and controlled by companies whose content is only available to individuals with some sort of financial relationship with the web site owner.

Clearly sites like Wikipedia make meaningful content available at no cost to the user.  However, in the case of this work at UCSD, there is an enormous expense in terms of equipment and people in order to collect, store, refine and publish this data.  This is truly a gift being offered to those with an interest in this field.  I’m sure that other examples exist and perhaps a valuable service would be one that helps to organize such informational sites.

If you are interested in more information about H.M. and the project at UCSD, here are some relevant websites: