Creating a SPARQL Endpoint Using Joseki
Being a consumer of semantic data, I thought creating a SPARQL endpoint would be an interesting exercise. It would require having some data to publish as well as working with a SPARQL library. For data, I chose a set of mileage information that I have been collecting on my cars for the last five years. For technology, I decided to use the Joseki SPARQL Server, since I was already using Jena.
For those who want to skip the “how” and see the result, the SPARQL endpoint along with sample queries and a link to the ontology and data is at: http://monead.com/semantic/query.html
The first step in this project was to convert my mileage spreadsheets into triples. I looked briefly for an existing ontology in the automobile domain but didn’t find anything I could use, so I created an ontology that reflects my approach to recording automobile mileage data. My data records the miles traveled between fill-ups as well as the number of gallons used. I also record the car’s claimed MPG and calculate the actual MPG.
The ontology reflects this perspective of calculating the MPG at each fill-up. This means that the purchase of gas is abstracted to a class with information such as miles traveled, gallons used and date of purchase as attributes. I abstracted the gas station and location as classes, assuming that over time I might be able to flesh these out (in the spreadsheet I record the name of the station and the town/state).
A trivial Java program converts my spreadsheet (CSV) data into triples matching the ontology. I then run the ontology and data through Pellet to derive any additional triples from the ontology. The entire ontology and current data are available at http://monead.com/semantic/data/HybridMileageOntologyAll.Inferenced.xml.
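As a rough illustration of that conversion step, here is a minimal sketch that turns one CSV row into N-Triples text. It is not the actual program: the namespace, the class name GasPurchase, the property names and the three-column layout (date, miles, gallons) are all assumptions made for the example, and it builds strings by hand rather than using Jena so that it stays self-contained.

```java
import java.util.Locale;

final class FillUpToTriples {
    // Hypothetical namespace and term names; the real ontology defines its own.
    static final String NS = "http://example.org/mileage#";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Converts one CSV row of the assumed form "date,miles,gallons" into N-Triples lines. */
    static String toTriples(int rowId, String csvRow) {
        String[] cols = csvRow.split(",");
        if (cols.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + csvRow);
        }
        String date = cols[0].trim();
        String miles = cols[1].trim();
        String gallons = cols[2].trim();

        double gallonsUsed = Double.parseDouble(gallons);
        if (gallonsUsed <= 0) {
            throw new IllegalArgumentException("Gallons must be positive: " + csvRow);
        }
        // Actual MPG is derived at each fill-up: miles traveled / gallons used.
        double mpg = Double.parseDouble(miles) / gallonsUsed;

        String subject = "<" + NS + "fillUp" + rowId + ">";
        StringBuilder out = new StringBuilder();
        out.append(subject).append(" <").append(RDF_TYPE).append("> <")
           .append(NS).append("GasPurchase> .\n");
        out.append(literal(subject, "purchaseDate", date, "date"));
        out.append(literal(subject, "milesTraveled", miles, "decimal"));
        out.append(literal(subject, "gallonsUsed", gallons, "decimal"));
        out.append(literal(subject, "actualMpg",
                String.format(Locale.ROOT, "%.2f", mpg), "decimal"));
        return out.toString();
    }

    /** Formats a single triple whose object is a typed literal. */
    static String literal(String subject, String property, String value, String xsdType) {
        return subject + " <" + NS + property + "> \"" + value
                + "\"^^<" + XSD + xsdType + "> .\n";
    }

    public static void main(String[] args) {
        System.out.print(toTriples(1, "2010-03-14, 400.0, 10.0"));
    }
}
```

Each row becomes one resource with a type triple and four typed literals. A version built on Jena would create the same statements through its Model API instead of concatenating strings.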
It turns out that the ontology creation and data conversion were the easy parts of this project. Getting Joseki to work as desired took some time, mostly because I couldn’t find much documentation for deploying it as a servlet rather than using its standalone server feature. I eventually downloaded the Joseki source in order to understand what was going wrong. The principal issue is that Joseki doesn’t seem to understand the WAR environment and relative paths (e.g. relative to its own WAR).
I had two major file path issues: 1) getting Joseki to find its configuration (joseki-config.ttl); and 2) getting Joseki to find the triple store (in this case a flat file).
For the first issue I found several comments from other users who were also using Tomcat as the application server (which is what I am doing). In one case the recommendation was to place the joseki-config.ttl file in the “bin” directory of the Tomcat server and in another case the recommendation was to place it in any directory on the server but refer to it in the web.xml configuration using its absolute (server specific) path.
The first approach did not work for me. It probably works if Tomcat’s “bin” directory is the shell’s current directory when Tomcat starts. Since I’m running Tomcat as a service that approach doesn’t work well. I then went on and tried a variety of approaches to avoid the hardcoded path recommendation but in the end could not make anything else work. As a servlet-based solution, the configuration should understand a relative path in the web.xml and apply it to the WAR (or provide a syntax to access files on the classpath), but as of version 3.4.2 that doesn’t appear to be supported.
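The “bin” directory behavior is consistent with how plain Java file access treats a relative name: it is resolved against the JVM’s working directory, not against the web application. The small sketch below shows only that general java.io.File behavior. Whether Joseki opens its configuration exactly this way is an assumption on my part, and the class name is made up for the example.

```java
import java.io.File;

final class RelativePathDemo {
    /** Returns the absolute location that a relative file name resolves to. */
    static String resolve(String relativeName) {
        // java.io.File resolves relative names against the "user.dir" property,
        // i.e. the directory the JVM was started from.
        return new File(relativeName).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("Working directory:  " + System.getProperty("user.dir"));
        System.out.println("Config resolves to: " + resolve("joseki-config.ttl"));
    }
}
```

Run from Tomcat’s “bin” directory, the second line points into “bin”. Started as a service, the working directory is whatever the service manager chose, which would explain why the relative name was not found.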
With access to the Joseki configuration file resolved (i.e. my web.xml entry for the configuration path set up with an absolute path to the Joseki configuration file), I encountered my next roadblock: accessing the data. Again, the file path given in the joseki-config.ttl file is not interpreted as being relative to the WAR. Worse, I couldn’t get it to find or parse the file no matter what I did.
After a lot of experimentation I believe that the issue was a question of file extensions. In my case I had named my file with the suffix “turtle” and it probably needed to be “ttl”. I should go back and retry some of the configuration settings I had tried to see if they now work. In any case, the stack traces were not enlightening; more descriptive errors would have been valuable to my debugging process. Failing to find the data file and failing to parse it both seemed to lead to NullPointerExceptions with no text describing the underlying issue.
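To show why a suffix could matter, here is a minimal sketch of suffix-based syntax guessing. It illustrates the general technique only and is not Jena’s or Joseki’s actual code: the suffix table, the syntax names, the fallback and the file names are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class SyntaxGuesser {
    // Assumed suffix table for illustration; a real library may differ.
    private static final Map<String, String> BY_SUFFIX = new HashMap<String, String>();
    static {
        BY_SUFFIX.put("ttl", "TURTLE");
        BY_SUFFIX.put("n3", "N3");
        BY_SUFFIX.put("nt", "N-TRIPLE");
        BY_SUFFIX.put("rdf", "RDF/XML");
        BY_SUFFIX.put("owl", "RDF/XML");
    }

    /** Guesses the RDF syntax from the file suffix, or returns the fallback. */
    static String guess(String fileName, String fallback) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return fallback; // no suffix to inspect
        }
        String suffix = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String syntax = BY_SUFFIX.get(suffix);
        return syntax != null ? syntax : fallback;
    }

    public static void main(String[] args) {
        System.out.println(guess("mileage.ttl", "RDF/XML"));    // recognized suffix
        System.out.println(guess("mileage.turtle", "RDF/XML")); // unknown suffix, falls back
    }
}
```

Under a scheme like this, a Turtle file named with an unrecognized suffix would be handed to the fallback parser, which would then fail on valid Turtle content.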
Eventually I switched approaches and placed the data in RDF/XML format at a public URL (mentioned above). I then configured Joseki to use that URL as its data source and it started working.
I had more fun as I tried to deploy the working solution to my Go Daddy hosting account. It turns out that Go Daddy currently uses Java 5 and Joseki requires Java 6 due to its use of the BlockingDeque class in the java.util.concurrent package. After some fussing to prove that was the issue, I set up the servlet on my own Internet-facing server and was finally able to publish it.
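One quick way to confirm this kind of mismatch on a host is to ask the JVM whether it can load the class at all. The sketch below is a generic check, not the procedure I actually followed, and the class and method names are made up for the example.

```java
final class Java6Check {
    /** Returns true if the named class can be loaded by the running JVM. */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        // BlockingDeque was added to java.util.concurrent in Java 6.
        System.out.println("BlockingDeque available: "
                + isClassAvailable("java.util.concurrent.BlockingDeque"));
    }
}
```

On a Java 5 runtime the second line should report false, which points at the runtime version rather than at the Joseki configuration.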
I invite people to try the endpoint and suggest any related ontologies that I can use instead of mine or as options for linking. As mentioned at the top of this entry, the page describing the ontology and providing a form for accessing the SPARQL endpoint is at: http://monead.com/semantic/query.html.
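For anyone who would rather call an endpoint from code than from the form, the SPARQL protocol accepts a query over HTTP GET in a URL-encoded “query” parameter. Here is a minimal sketch of building such a request URL. The endpoint address http://example.org/sparql is a placeholder rather than the real service address, and the class name is invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SparqlQueryUrl {
    /** Builds a SPARQL protocol GET URL by URL-encoding the query text. */
    static String build(String endpoint, String query) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder endpoint; substitute the real service URL.
        String url = build("http://example.org/sparql",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10");
        System.out.println(url);
    }
}
```

The resulting URL can be fetched with any HTTP client; the response format depends on what the server is configured to return.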
Tags: Java, linkedin, ontology, open source, semantic web, semantics