Dave's Reflections

Posts Tagged ‘ontology’

Sparql Droid – A Semantic Technology Application for the Android Platform

Friday, June 24th, 2011

The semantic technology concepts that comprise what is generally called the semantic web involve paradigm shifts in the ways that we represent data, organize information and compute results. Such shifts create opportunities and present challenges. The opportunities include easier correlation of decentralized information, flexible data relationships and reduced data storage entropy. The challenges include new data management technology, new syntaxes, and a new separation of data and its relationships.

I am a strong advocate of leveraging semantic technology. I believe that these new paradigms provide a more flexible basis for our journey to create meaningful, efficient and effective business automation solutions. However, one challenge that differentiates semantic technology from more common technology (such as relational databases) is the lack of mature tools supporting a business system infrastructure.

It will take a while for solid solutions to appear. Mainstream capabilities that leverage semantic technology, such as reporting, BI, workflow, and application design and development, are missing or, at best, weak. Again, this is an opportunity and a challenge. For those who enjoy creating computer software, it presents a new world of possibilities. For those looking to leverage mature solutions in order to advance their business vision, it will take investment and patience.

In parallel with the semantic paradigm, we have an ever-increasing focus on mobile-based solutions. Smartphones and tablets, which rely on network connectivity rather than on-board storage and compute power as the enabler of value, are becoming the standard tool for human-system interaction. As we design new solutions, we must keep the mobile-accessible mantra in mind.

As part of my exploration of these two technologies, I’ve started working on a semantic technology mobile application called Sparql Droid. Built for the Android platform, it is intended to be a tool for exploring and mashing up semantic data sources. As a small first step, I’ve leveraged the Androjena port of the Jena framework and created an application with some basic capabilities.


Tags: Java, linkedin, ontology, programming, semantic web
Posted in Java, Semantic Web, Software Development, Tools and Applications

Creating a SPARQL Endpoint Using Joseki

Monday, November 29th, 2010

As a consumer of semantic data, I thought that creating a SPARQL endpoint would be an interesting exercise. It would require having some data to publish as well as working with a SPARQL library. For data, I chose a set of mileage information that I have been collecting on my cars for the last 5 years. For technology, I decided to use the Joseki SPARQL Server, since I was already using Jena.

For those who want to skip the “how” and see the result, the SPARQL endpoint along with sample queries and a link to the ontology and data is at: http://monead.com/semantic/query.html

The first step in this project was to convert my mileage spreadsheets into triples. I looked briefly for an existing ontology in the automobile domain but didn’t find anything I could use, so I created an ontology that reflects my approach to recording automobile mileage data. My data records the miles traveled between fill-ups as well as the number of gallons used. I also record the car’s claimed MPG and calculate the actual MPG.

The ontology reflects this perspective of calculating the MPG at each fill-up. This means that the purchase of gas is abstracted to a class with information such as miles traveled, gallons used and date of purchase as attributes. I abstracted the gas station and location as classes, assuming that over time I might be able to flesh these out (in the spreadsheet I record the name of the station and the town/state).

A trivial Java program converts my spreadsheet (CSV) data into triples matching the ontology. I then run the ontology and data through Pellet to derive any additional triples from the ontology. The entire ontology and current data are available at http://monead.com/semantic/data/HybridMileageOntologyAll.Inferenced.xml.
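The conversion program itself isn’t shown in the post. As a rough illustration only, here is a minimal sketch of that kind of converter, assuming each CSV line holds date,miles,gallons with no header row. The namespace http://example.com/mileage# and the names FillUp, purchaseDate, milesTraveled, gallonsUsed and actualMpg are hypothetical placeholders, not the terms in the published ontology, and the output is N-Triples rather than the RDF/XML file linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: converts fill-up rows of the form "date,miles,gallons"
 * (no header row) into N-Triples. The namespace and property names are
 * placeholders, not the terms defined by the published mileage ontology.
 */
final class MileageToNTriples {
    static final String NS = "http://example.com/mileage#"; // placeholder namespace
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Typed literal, e.g. "10.5"^^<http://www.w3.org/2001/XMLSchema#decimal>
    static String literal(String value, String xsdType) {
        return "\"" + value + "\"^^<" + XSD + xsdType + ">";
    }

    static String property(String localName) {
        return "<" + NS + localName + ">";
    }

    static List<String> convert(List<String> csvLines) {
        List<String> triples = new ArrayList<>();
        int fillUp = 0;
        for (String line : csvLines) {
            String[] fields = line.split(",");
            if (fields.length != 3) {
                continue; // this sketch silently skips malformed rows
            }
            fillUp++;
            String date = fields[0].trim();
            String miles = fields[1].trim();
            String gallons = fields[2].trim();
            // Actual MPG for this fill-up: miles traveled divided by gallons used.
            double mpg = Double.parseDouble(miles) / Double.parseDouble(gallons);
            String subject = "<" + NS + "FillUp_" + fillUp + ">";
            triples.add(subject + " " + RDF_TYPE + " <" + NS + "FillUp> .");
            triples.add(subject + " " + property("purchaseDate") + " " + literal(date, "date") + " .");
            triples.add(subject + " " + property("milesTraveled") + " " + literal(miles, "decimal") + " .");
            triples.add(subject + " " + property("gallonsUsed") + " " + literal(gallons, "decimal") + " .");
            triples.add(subject + " " + property("actualMpg") + " "
                    + literal(String.format(Locale.ROOT, "%.2f", mpg), "decimal") + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("2010-11-01,412.3,10.5", "2010-11-09,398.0,9.8");
        convert(rows).forEach(System.out::println);
    }
}
```

Computing the MPG at conversion time mirrors the ontology’s per-fill-up perspective; the claimed MPG, station and location would become additional triples in the same style.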

It turns out that the ontology creation and data conversion were the easy parts of this project. Getting Joseki to work as desired took some time, mostly because I couldn’t find much documentation for deploying it as a servlet rather than using its standalone server feature. I eventually downloaded the Joseki source in order to understand what was going wrong. The principal issue is that Joseki doesn’t seem to understand the WAR environment and relative paths (i.e. paths relative to its own WAR).

I had two major path issues: 1) getting Joseki to find its configuration (joseki-config.ttl); and 2) getting Joseki to find the triple store (in this case, a flat file).
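The post doesn’t detail the fixes here, so the following is not a reconstruction of them. It is a small sketch of the underlying problem, assuming a hypothetical WEB-INF/joseki-config.ttl location: a bare relative path is resolved against the JVM’s working directory (wherever the servlet container was started), not against the deployed web application. The helper resolveAgainstWebapp is hypothetical; in a servlet, the root directory could come from ServletContext.getRealPath("/"), which can return null when the WAR is not unpacked onto the file system.

```java
import java.nio.file.Path;

/**
 * Hypothetical helper illustrating why relative paths misbehave inside a WAR.
 */
final class WebappPaths {
    // Resolves a configured path against the web application's root directory.
    // Absolute paths are returned unchanged.
    static Path resolveAgainstWebapp(String webappRoot, String configured) {
        Path p = Path.of(configured);
        return p.isAbsolute() ? p : Path.of(webappRoot).resolve(p).normalize();
    }

    public static void main(String[] args) {
        String configured = "WEB-INF/joseki-config.ttl"; // hypothetical location
        // Resolved against the working directory: depends on how the container was launched.
        System.out.println(Path.of(configured).toAbsolutePath());
        // Resolved against a hypothetical deployed webapp directory.
        System.out.println(resolveAgainstWebapp("/srv/tomcat/webapps/sparql", configured));
    }
}
```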


Tags: Java, linkedin, ontology, open source, semantic web, semantics
Posted in Java, Public Data, Semantic Web, Software Development, Tools and Applications

Semantic Workbench – A Humble Beginning

Wednesday, August 18th, 2010

I have needed a GUI for working with semantic web concepts, including asserting triples, seeing the resulting inferences and running SPARQL queries. In this post I’ll describe a very basic tool that I have created and released that allows a user to interact with a semantic model.

My objectives for this first GUI were basic:

Support input of a set of triples in any format that Jena supports (e.g. RDF/XML, N3, N-Triples and Turtle)
See the inferences that result for a set of assertions
Create a tree view of the ontology
Make it easy to use SPARQL queries with the model
Allow the resulting model to be written to a file, again using any format supported by Jena

Here are some screen shots of the application. Explanations of the tabs are then provided.

Initial View: Appearance at startup. The reasoner cannot be run until there is text in the assertions text area.

Assertions Tab Populated: The Assertions tab is shown populated. The Run Reasoner button is then used to run the reasoner and create an in-memory model that can be saved to disk or explored using SPARQL.

Inferences Tab: Once the reasoner has been run successfully (i.e. a legal set of assertions was entered on the Assertions tab and the Run Reasoner button was used), any inferences are displayed on this tab.

Tree View Tab: Once the reasoner has been run successfully (i.e. a legal set of assertions was entered on the Assertions tab and the Run Reasoner button was used), the model (asserted and inferred) is shown as a tree structure based on class.

SPARQL Tab: Execute SPARQL queries against the model.

The program provides each feature in a very basic way. On the Assertions tab, a text area is used for entering assertions. The user may also load a text file containing assertions using the File|Open menu item. Once the assertions are entered, a button is enabled that allows the reasoner to process them. The user selects the reasoner level from a drop-down list.


Tags: Information Systems, linkedin, ontology, open source, semantic web, semantics
Posted in Information Systems, Java, Semantic Web, Tools and Applications

Creating RDF Triples from a Relational Database

Thursday, August 5th, 2010

In an earlier blog entry I discussed the potential reduction in refactoring effort if our data is represented as RDF triples rather than relational structures. As a way to give myself easy access to RDF data and to work more with semantic web tool features, I have created a program to export relational data to RDF.

The program is really a proof-of-concept. It takes a SQL query and converts the resulting rows into assertions of triples. The approach is simple: given a SQL statement and a chosen primary key column (PK) to represent the instance for the exported data, assert triples with the primary key column value as the subject, the column names as the predicates and the non-PK column values as the objects.

Here is a brief sample taken from the documentation accompanying the code.

Given a table named people with the following columns and rows:

       id    name    age
       --    ----    ---
       1     Fred    20
       2     Martha  25

And a query of: select id, name, age from people
And the primary key column set to: id
Then the asserted triples (shown using Turtle and skipping prefixes) will be:

       dsr:PK_1
          a       owl:Thing , dsr:RdbData ;
          rdfs:label "1" ;
          dsr:name "Fred" ;
          dsr:age "20" .

       dsr:PK_2
          a       owl:Thing , dsr:RdbData ;
          rdfs:label "2" ;
          dsr:name "Martha" ;
          dsr:age "25" .

You can see that the approach represents a quick way to convert the data.
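To make the rule concrete, here is a minimal Java sketch that reproduces the Turtle layout above. It is not the released program’s code: rows are passed as ordered maps (a real exporter would read them from a JDBC ResultSet), the class and method names are hypothetical, and the primary key value is assumed to be usable as part of a local name such as PK_1. SQL NULL values simply produce no triple.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the row-to-triples rule: the PK value becomes the subject, non-PK
 * column names become predicates and their values become literal objects.
 * Prefix declarations are omitted, as in the sample output.
 */
final class RowsToTurtle {
    // Escape backslashes and double quotes so values form valid Turtle string literals.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String convert(List<Map<String, String>> rows, String pkColumn) {
        StringBuilder out = new StringBuilder();
        for (Map<String, String> row : rows) {
            String pk = row.get(pkColumn); // assumed to be a valid local-name fragment
            out.append("dsr:PK_").append(pk).append('\n')
               .append("   a       owl:Thing , dsr:RdbData ;\n")
               .append("   rdfs:label \"").append(escape(pk)).append('"');
            for (Map.Entry<String, String> col : row.entrySet()) {
                // The PK column is the subject; SQL NULLs produce no triple.
                if (col.getKey().equals(pkColumn) || col.getValue() == null) {
                    continue;
                }
                out.append(" ;\n   dsr:").append(col.getKey())
                   .append(" \"").append(escape(col.getValue())).append('"');
            }
            out.append(" .\n\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fred = new LinkedHashMap<>();
        fred.put("id", "1");
        fred.put("name", "Fred");
        fred.put("age", "20");
        Map<String, String> martha = new LinkedHashMap<>();
        martha.put("id", "2");
        martha.put("name", "Martha");
        martha.put("age", "25");
        System.out.print(convert(List.of(fred, martha), "id"));
    }
}
```

Like the original output, every value is emitted as a plain string literal (dsr:age "20"); mapping SQL column types to XSD datatypes would be a natural next refinement.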


Tags: data, Information Systems, Java, linkedin, ontology, open source, programming, semantic web, semantics
Posted in Information Systems, Java, Semantic Web, Software Composition, Software Development, Tools and Applications

My First Semantic Web Program

Saturday, June 5th, 2010

I have created my first program that uses some semantic web technology and is slightly interesting (to me, anyway). Of course I’ll look back on this in a year and cringe, but for now it represents my understanding of a small set of features from Jena and Pellet.

The basis for the program is an example program described in Hebeler, Fisher et al.’s book “Semantic Web Programming” (ISBN: 047041801X). The intent of the program is to load an ontology into three models, each running a different level of reasoner (RDF, RDFS and OWL), and to output the resulting assertions (triples).

I made a couple of changes to the approach taken by the book’s sample. First, I allow any supported input file format to be loaded automatically (you don’t have to tell the program which format is being used). Second, I report the actual differences between the models rather than just showing all of the resulting triples.
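Reporting differences amounts to a set difference over triples. Here is a minimal sketch with triples as plain strings and a hand-written RDFS model standing in for real reasoner output; an actual RDFS reasoner may add more, such as axiomatic triples, depending on its configuration. The ex: names are hypothetical. With real Jena models, Model.difference(Model) provides the same operation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Sketch of reporting only what a richer reasoner adds: the triples in one
 * model that are absent from another.
 */
final class ModelDiff {
    record Triple(String subject, String predicate, String object) {}

    // Triples present in 'richer' but absent from 'baseline'.
    static Set<Triple> added(Set<Triple> baseline, Set<Triple> richer) {
        Set<Triple> result = new LinkedHashSet<>(richer);
        result.removeAll(baseline);
        return result;
    }

    public static void main(String[] args) {
        Triple sub = new Triple("ex:Employee", "rdfs:subClassOf", "ex:Person");
        Triple fred = new Triple("ex:Fred", "rdf:type", "ex:Employee");
        Set<Triple> rdfModel = Set.of(sub, fred);
        // Hand-written stand-in for the RDFS-inferred model: the asserted triples
        // plus the type that follows from rdfs:subClassOf.
        Set<Triple> rdfsModel = Set.of(sub, fred, new Triple("ex:Fred", "rdf:type", "ex:Person"));
        System.out.println(added(rdfModel, rdfsModel));
    }
}
```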

As I worked on the code, which is currently housed in one uber-class (that’ll have to be refactored!), I realized that there will be lots of reusable “plumbing” code that comes with this type of work. Setting up models with various reasoners, loading ontologies, reporting triples, interfacing to triple stores, and so on will become nuisance code to write.

Libraries like Jena help, but they abstract at a low level. I want a semantic workbench that makes playing with the various libraries and frameworks easy. To that end, I’ve created a SourceForge project called “Semantic Workbench”.

I intend for the Semantic Workbench to provide a GUI environment for manipulating semantic web technologies. Developers and power users would be able to use such a tool to test ontologies, try various reasoners and validate queries. Developers could use the workbench’s source code to understand how to utilize frameworks like Jena or reasoner APIs like that of Pellet.

I invite other interested people to join the SourceForge project. The project’s URL is: http://semanticwb.sourceforge.net/

On the data side, in order to have a rich semantic test data set to utilize, I’ve started an ontology that I hope to grow into an interesting example. I’m using the insurance industry as its basis. The rules around insurance and the variety of concepts should provide a rich set of classes, attributes and relationships for modeling. My first version of this example ontology is included with the sample program.

Finally, I’ve added a semantic web section to my website where I’ll maintain links to useful information I find as well as sample code or files that I think might be of interest to other developers. I’ve placed the sample program and ontology described earlier in this post on that page along with links to a variety of resources.

My site’s semantic web page’s URL is: http://monead.com/semantic/
The URL for the page describing the sample program is: http://monead.com/semantic/proj_diffinferencing.html

Tags: Information Systems, Java, linkedin, ontology, open source, programming, semantic web, semantics, system integration
Posted in Information Systems, Java, Semantic Web, Software Composition, Software Development, Tools and Applications

Database Refactoring and RDF Triples

Wednesday, May 12th, 2010

One of the aspects of agile software development that may lead to significant angst is the database. Unlike refactoring code, the refactoring of the database schema involves a key constraint – state! A developer may rearrange code to his or her heart’s content with little worry since the program will start with a blank slate when execution begins. However, the database “remembers.” If one accepts that each iteration of an agile process produces a production release, then the stored data can’t be deleted as part of the next iteration.

Refactoring a database becomes less and less trivial as project development continues. While developers have IDEs to refactor code, change packages and alter build targets, there are few tools for refactoring databases.

My definition of a database refactoring tool is one that assists the database developer by remembering the database transformation steps and storing them as part of the project, i.e. as part of the build process. This includes both the schema changes and data transformations. Remember that the entire team will need to reproduce these steps on local copies of the database. It must be as easy to incorporate a peer’s database schema changes, without losing data, as it is to incorporate the code changes.

These same data-centric complexities exist in waterfall approaches when going from one version to the next. Whenever the database structure needs to change, a path to migrate the data has to be defined. That transformation definition must become part of the project’s artifacts so that the data migration for the new version is supported as the program moves between environments (test, QA, load test, integrated test, and production). Also, the database transformation steps must be automated and reversible!

That last point, the ability to rollback, is a key part of any rollout plan. We must be able to back out changes. It may be that the approach to a rollback is to create a full database backup before implementing the update, but that assumption must be documented and vetted (e.g. the approach of a full backup to support the rollback strategy may not be reasonable in all cases).

This database refactoring issue becomes very tricky when dealing with multiple versions of an application. The transformation of the database schema and data must be done in a defined order. As more and more data is stored, the process consumes more storage and processing resources. This is the ETL side-effect of any system upgrade. Its impact is simply felt more often (e.g. potentially during each iteration) in an agile project.
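The requirements above (steps stored with the project, applied in a defined order, and reversible) can be sketched as a small migration runner. This is a hypothetical illustration, not a description of any particular tool: in-memory maps stand in for table rows, and the single example step, renameColumn, is invented for the demonstration. A real tool would issue schema and data-transformation statements against the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One reversible data transformation; rows are in-memory maps standing in for a table.
interface Migration {
    void up(List<Map<String, Object>> rows);   // apply the change
    void down(List<Map<String, Object>> rows); // undo the change
}

// Applies migrations in their defined order and rolls them back in reverse.
final class Migrator {
    private final List<Migration> steps; // stored with the project, in order
    private int applied = 0;             // how many steps are currently applied

    Migrator(List<Migration> steps) {
        this.steps = new ArrayList<>(steps);
    }

    int version() {
        return applied;
    }

    void upgrade(List<Map<String, Object>> rows) {
        while (applied < steps.size()) {
            steps.get(applied).up(rows);
            applied++; // only counted once the step succeeded
        }
    }

    void rollbackTo(int target, List<Map<String, Object>> rows) {
        while (applied > target) {
            steps.get(applied - 1).down(rows);
            applied--;
        }
    }

    // Hypothetical example step: rename a column while carrying its data along.
    static Migration renameColumn(String from, String to) {
        return new Migration() {
            @Override
            public void up(List<Map<String, Object>> rows) {
                move(rows, from, to);
            }

            @Override
            public void down(List<Map<String, Object>> rows) {
                move(rows, to, from);
            }
        };
    }

    private static void move(List<Map<String, Object>> rows, String from, String to) {
        for (Map<String, Object> row : rows) {
            if (row.containsKey(from)) {
                row.put(to, row.remove(from));
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = new ArrayList<>();
        people.add(new HashMap<>(Map.of("id", 1, "phone", "555-0100")));
        Migrator migrator = new Migrator(List.of(renameColumn("phone", "homePhone")));
        migrator.upgrade(people);
        System.out.println(people);   // the value now lives under "homePhone"
        migrator.rollbackTo(0, people);
        System.out.println(people);   // the original "phone" column is restored
    }
}
```

Because each step carries its own inverse, a rollback does not depend on a full backup, though, as noted above, that assumption still has to be vetted for steps that discard data.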

As part of exploring semantic technology, I am interested in contrasting this with a database that consists of RDF triples. The semantic relationships of data do not change as often (if at all) as relational constructs do. Many times we refactor a relational database as we discover concepts that require one-to-many or many-to-many relationships.

Is an RDF triple-based database easier to refactor than a relational database? Is there something about the use of RDF triples that reduces the likelihood of a multiplicity change leading to a structural change in the data? If so, using RDF as the data format could be a technique that simplifies the development of applications. For now, let’s take a high-level look at a refactoring use case.
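As a small illustration of the question (not an answer to it), the sketch below shows why RDF might absorb a multiplicity change: a second phone number is just another triple with the same subject and predicate, so no table or column has to be restructured. The ex: names are hypothetical. Note the assumption that the reading code already treats the property as multi-valued; any cardinality restrictions in the ontology, or code that expects exactly one value, would still need attention.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrates a one-to-many change in a triple store: adding a value is
 * adding a triple, with no change to the shape of the stored data.
 */
final class MultiValuedProperty {
    record Triple(String subject, String predicate, String object) {}

    // All objects for a subject/predicate pair; callers treat the property as multi-valued.
    static List<String> objects(List<Triple> store, String subject, String predicate) {
        List<String> values = new ArrayList<>();
        for (Triple t : store) {
            if (t.subject().equals(subject) && t.predicate().equals(predicate)) {
                values.add(t.object());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Triple> store = new ArrayList<>();
        store.add(new Triple("ex:Fred", "ex:phone", "555-0100"));
        System.out.println(objects(store, "ex:Fred", "ex:phone"));
        // The "one-to-many" change: add a triple; no table or column is altered.
        store.add(new Triple("ex:Fred", "ex:phone", "555-0199"));
        System.out.println(objects(store, "ex:Fred", "ex:phone"));
    }
}
```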


Tags: agile development, efficient coding, enterprise applications, enterprise systems, Information Systems, linkedin, ontology, refactoring, semantic web, semantics, system integration
Posted in Architecture, Information Systems, Semantic Web, Software Composition, Software Development, Tools and Applications

Business Ontologies and Semantic Technologies Class

Sunday, May 9th, 2010

Last week I had the pleasure of attending Semantic Arts’ training class entitled “Designing and Building Business Ontologies.” The course, led by Dave McComb and Simon Robe, provided an excellent introduction to semantic technologies and tools as well as coverage of ontological best practices. I thoroughly enjoyed the 4-day class and achieved my principal goals in attending: namely, to understand the semantic web landscape, including technologies such as RDF, RDFS, OWL and SPARQL, as well as the current state of tools and products in this space.

Both Dave and Simon have a deep understanding of this subject area. They also work with clients using this technology so they bring real-world examples of where the technology shines and where it has limitations. I recommend this class to anyone who is seeking to reach a baseline understanding of semantic technologies and ontology strategies.

Why am I so interested in semantic web technology? I am convinced that structuring information such that it can be consumed by systems, in ways more automated than current data storage and association techniques allow, is required in order to achieve any meaningful advancement in the field of information technology (IT). Whether wiring together web services or setting up ETL jobs to create data marts, too much IT energy is wasted on repeatedly integrating data sources; essentially manually wiring together related information in the absence of the computer being able to wire it together autonomously!


Tags: efficient coding, enterprise applications, enterprise systems, Information Systems, linkedin, ontology, Public Data, semantic web, semantics, system integration, web services
Posted in Architecture, Information Systems, Public Data, Software Composition, Software Development, Tools and Applications

David S. Read