
Semantic Technology and Business Conference, East 2011 – Reflections

December 7th, 2011

I had the pleasure of attending the Semantic Technology and Business Conference in Washington, DC last week.  I have a strong interest in semantic technology and its capabilities to enhance the way in which we leverage information systems.  There was a good selection of topics discussed by people with a variety of  backgrounds working in different verticals.

To begin the conference I attended the half-day “Ontology 101” session presented by Elisa Kendall and Deborah McGuinness.  They indicated that this presentation has been given at each semantic technology conference and that interest remains strong, the implication being that new people continue to want to understand this art.

Their material was very useful and if you are someone looking to get a grounding in ontologies (what are they?  how do you go about creating them?) I recommend attending this session the next time it is offered.  Both leaders clearly have deep experience and expertise in this field.  Also, the discussion was not tied to a technology (e.g. RDF) so it was applicable regardless of underlying implementation details.

I wrapped up the first day with Richard Ordowich, who discussed the process of reverse engineering semantics (meaning) from legacy data.  The goal of such projects is to harmonize the meaning of information across the enterprise.

A point he stressed was that a business really needs to be ready to start such a journey.  This type of work is very hard and very time consuming, and it requires enterprise-wide discipline.  He suggests that before working with a company on such an initiative one should ask for examples of prior enterprise program successes (e.g. something like BPM or SDLC adoption).

Fundamentally, a project that seeks to harmonize the meaning of data across an enterprise requires organization readiness to go beyond project execution.  The enterprise must put effective governance in place to operate and maintain the resulting ontologies, taxonomies and metadata.

The full conference kicked off the following day.  One aspect that jumped out for me was that a lot of the presentations dealt with government-related projects.  This could have been a side-effect of the conference being held in Washington, DC but I think it is more indicative that spending in this technology is more heavily weighted to public rather than private industry.

With so many projects being government-centric, I found any claims of “value” suspect.  A project can be valuable, or show value, without being cost effective.  Commercial businesses have gone bankrupt even though they delivered value to their customers.  More exposure of positive-ROI commercial projects will be important to help accelerate the adoption of these technologies.

Other than the financial aspect, the presentations were incredibly valuable in terms of presenting lessons learned, best practices and in-depth tool discussions.  I’ll highlight a few of the sessions and key thoughts that I believe will assist as we continue to apply semantic technology to business system challenges.


Using ARQoid for Android-based SPARQL Query Execution

December 1st, 2011

I was recently asked about the SPARQL support in Sparql Droid and whether it could serve as a way for other Android applications to execute SPARQL queries against remote data sources.  It could be used in this way but there is a simpler alternative I’d like to discuss here.

On the Android platform it is actually quite easy to execute SPARQL against remote SPARQL endpoints, RDF data and local models.  The heavy lifting is handled by Androjena’s ARQoid, an Android-centric port of HP’s Jena ARQ engine.

Both engines (the original and the port) do a great job of simplifying the execution of SPARQL queries and consumption of the resulting data.  In this post I’ll go through a simple example of using ARQoid.  Note that all the code being shown here is available for download.  This post is based specifically on the queryRemoteSparqlEndpoint() method in the com.monead.androjena.demo.arqoid.SparqlExamples class.

Setup

To begin, some environment setup needs to be done in order to have a properly configured Android project ready to use ARQoid.

First, obtain the ARQoid JAR and its dependencies.  This is easily accomplished using the download page on the ARQoid Wiki and obtaining the latest ARQoid ZIP file.  Unzip the downloaded archive.   Since I’m discussing an Android application I’d expect that you would have created an Android project and that it contains a libs directory where the JAR files should be placed.

Second, add the JAR files to the classpath for your Android project.  I use the ADT plugin for Eclipse to do Android development.  So to add the JARs to my project I choose the Project menu item, select Properties, choose Build Path, select the Libraries tab, click the Add JARs… button, navigate to the libs directory, select the JAR files and click OK on the open dialogs.

Third, set up a minimal Android project.  The default layout, with a small change to its definition, will work fine.

Overview

Now we are ready to write the code that uses ARQoid to access some data.  For this first blog entry I’ll focus on a trivial query against a SPARQL endpoint.  There would be some small differences if we wanted to query a local model or a remote data set.  Those will be covered in follow-on entries.

Here is a list of the ARQoid classes we will be using for this initial example:

  • com.hp.hpl.jena.query.Query – represents the query being executed
  • com.hp.hpl.jena.query.Syntax – represents the query syntaxes supported by ARQoid
  • com.hp.hpl.jena.query.QueryFactory – creates a Query instance based on supplied parameters such as the query string and syntax definition
  • com.hp.hpl.jena.query.QueryExecution – provides the service to  execute the query
  • com.hp.hpl.jena.query.QueryExecutionFactory – creates a QueryExecution instance based on supplied parameters such as a Query instance and SPARQL endpoint URI
  • com.hp.hpl.jena.query.ResultSet – represents the returned data and metadata associated with the executed query
  • com.hp.hpl.jena.query.QuerySolution – represents one row of data within the ResultSet.

We’ll use these classes to execute a simple SPARQL query that retrieves some data associated with space exploration.  Talis provides an endpoint that we can use to access some interesting space exploration data.  The endpoint is located at http://api.talis.com/stores/space/services/sparql.
The query we will execute is:

SELECT ?dataType ?data
WHERE {
  <http://nasa.dataincubator.org/launch/1961-012> ?dataType ?data.
}

This query will give us a little information about Vostok 1 launched by the USSR in 1961.
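To show how these classes fit together, here is a minimal sketch of executing the query above against the Talis endpoint.  It follows the same overall approach as the queryRemoteSparqlEndpoint() method mentioned earlier, but the class and variable names are illustrative and, for brevity, it is written as a plain Java main() method rather than inside an Android Activity.

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.Syntax;

public class SparqlEndpointSketch {
    private static final String ENDPOINT =
            "http://api.talis.com/stores/space/services/sparql";

    private static final String QUERY_TEXT =
            "SELECT ?dataType ?data "
            + "WHERE { <http://nasa.dataincubator.org/launch/1961-012> ?dataType ?data . }";

    public static void main(String[] args) {
        // Parse the query text into a Query instance using the standard SPARQL syntax
        Query query = QueryFactory.create(QUERY_TEXT, Syntax.syntaxSPARQL);

        // Bind the query to the remote SPARQL endpoint
        QueryExecution queryExecution =
                QueryExecutionFactory.sparqlService(ENDPOINT, query);

        try {
            // Execute the SELECT and iterate over the returned solutions
            ResultSet results = queryExecution.execSelect();
            while (results.hasNext()) {
                QuerySolution solution = results.nextSolution();
                System.out.println(solution.get("dataType") + " = " + solution.get("data"));
            }
        } finally {
            queryExecution.close();  // always release the connection and its resources
        }
    }
}

On a device the same calls would normally be made off the UI thread (for example from an AsyncTask), since executing the query involves network I/O.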


The Cognitive Corporation™ – Effective BPM Requires Data Analytics

October 25th, 2011

The Cognitive Corporation is a framework introduced in an earlier posting.  The framework is meant to outline a set of general capabilities that work together in order to support a growing and thinking organization.  For this post I will drill into one of the least mature of those capabilities in terms of enterprise solution adoption – Learn.

Business rules, decision engines, BPM, complex event processing (CEP): these all evoke images of computers making speedy decisions to the benefit of our businesses.  The infrastructure, technologies and software that provide these solutions (SOA, XML schemas, rule engines, workflow engines, etc.) support the decision automation process.  However, they don’t know what decisions to make.

The BPM-related components we acquire provide the how of decision making (send an email, route a claim, suggest an offer).  Learning, supported by data analytics, provides a powerful path to the what and why of automated decisions (send this email to that person because they are at risk of defecting, route this claim to that underwriter because it looks suspicious, suggest this product to that customer because they appear to be buying these types of items).

I’ll start by outlining the high level journey from data to rules and the cyclic nature of that journey.  Data leads to rules, rules beget responses, responses manifest as more data, new data leads to new rules, and so on.  Therefore, the journey does not end with the definition of a set of processes and rules.  This link between updated data and the determination of new processes and rules is the essence of any learning process, providing a key function for the cognitive corporation.


Expanding on “Code Reviews Trump Unit Testing, But They Are Better Together”

October 18th, 2011

Michael Delaney, a senior consulting software engineer at Blue Slate, commented on my previous posting.  As I created a reply I realized that I was expanding on my reasoning and it was becoming a bit long.  So, here is my reply as a follow-up posting.  Also, thank you to Michael for helping me think more about this topic.

I understand the desire to rely on unit testing and its ability to find issues and prevent regressions.  I’ll need to write about TDD separately.  Fundamentally I’m a believer in white box testing.  Black box approaches, like TDD, seem to be of relatively little value to the overall quality and reliability of the code.  That is, I’d want to invest more effort in white box testing than in black box testing.

I’m somewhat jaded, being concerned with the code’s security, which to me is strongly correlated with its reliability.  That said, I believe that unit testing is much more constrained as compared to formal reviews.  I’m not suggesting that unit tests be skipped, rather that we understand that unit tests can catch certain types of flaws and that those types are narrow as compared to what formal reviews can identify.


Code Reviews Trump Unit Testing, But They Are Better Together

October 11th, 2011

Last week I was participating in a formal code review (a.k.a. code inspection) with one of our clients.  We have been working with this client, helping them strengthen their development practices.  Holding formal code reviews is a key component for us.  Part of the formal process we introduced includes reviewing the unit testing results, both the (successful) output report and the code coverage metrics.

At one point we were reviewing some code that had several error handling blocks that were not being covered in the unit tests.  These blocks were, arguably, unlikely or impossible to reach (such as a Java StringReader throwing an IOException).  There was some discussion by the team about the necessity of mocking enough functionality to cover these blocks.
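To make the discussion concrete, here is a hypothetical illustration of the kind of block we were looking at (not the client’s actual code).  StringReader.read() declares IOException, but for an open, in-memory reader the catch block is effectively unreachable, so covering it in a unit test requires contrived mocking.

import java.io.IOException;
import java.io.StringReader;

public class UnreachableCatchExample {
    // Hypothetical example: the catch block below only fires if the reader
    // has already been closed, so ordinary unit tests will never cover it.
    public static String copy(String text) {
        StringReader reader = new StringReader(text);
        StringBuilder builder = new StringBuilder();
        try {
            int ch;
            while ((ch = reader.read()) != -1) {
                builder.append((char) ch);
            }
        } catch (IOException e) {
            return null;  // unreachable for an in-memory reader that stays open
        }
        return builder.toString();
    }
}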

Although we agreed that some of the more esoteric error conditions weren’t worth the programmer’s time to mock-up, it occurred to me later that we were missing an important point.  What mattered was that we were holding a formal code review and looking at those blocks of code.

Let me take a step back.  In 1986, Capers Jones published a book entitled Programming Productivity.  Although dated, the book contains many excellent points that cause you to think about how to create software in an efficient way.  Here efficiency is not about lines of code per unit of time, but more importantly, lines of correct code per unit of time.  This means taking into account rework due to errors and omissions.

One of the studies presented in the book relates to identifying defects in code.  It is a study whose results seem obvious when we think about them.  However, we don’t always align our software development practices to leverage the study’s lessons and maximize our development efficiency.  Perhaps we believe that the statistics have changed due to language constructs, experience, tooling and so forth.  We’d need studies similar to the ones presented by Capers Jones in order to prove that, though.

Below are a few of the defect detection activities from the book’s study.  I’ve skipped the low-end and high-end numbers that Capers includes, simply giving the modes (the most typical values), which are a good basis for comparison:

Defect Identification Rates Data Table
Defect Identification Rates Graph


The Cognitive Corporation™ – An Introduction

September 26th, 2011

Given my role as an enterprise architect, I’ve had the opportunity to work with many different business leaders, each focused on leveraging IT to drive improved efficiencies, lower costs, increase quality, and broaden market share throughout their businesses.  The improvements might involve any subset of data, processes, business rules, infrastructure, software, hardware, etc.  A common thread is that each project seeks to make the corporation smarter through the use of information technology.

As I’ve placed these separate projects into a common context of my own, I’ve concluded that the long term goal of leveraging information technology must be for it to support cognitive processes.  I don’t mean that the computers will think for us, rather that IT solutions must work together to allow a business to learn, corporately.

The individual tools that we utilize each play a part.  However, we tend to utilize them in a manner that focuses on isolated and directed operation rather than incorporating them into an overall learning loop.  In other words, we install tools that we direct without asking them to help us find better directions to give.

Let me start with a definition: similar to thinking beings, a cognitive corporation™ leverages a feedback loop of information and experiences to inform future processes and rules.  Fundamentally, learning is a process that involves taking known facts and experiences and combining them to create new hypotheses, which are tested in order to derive new facts, processes and rules.  Unfortunately, we don’t often leverage our enterprise applications in this way.

We have many tools available to us in the enterprise IT realm.  These include database management systems, business process management environments, rule engines, reporting tools, content management applications, data analytics tools, complex event processing environments, enterprise service buses, and ETL tools.  Individually, these components are used to solve specific, predefined issues with the operation of a business.  However, this is not an optimal way to leverage them.

If we consider that these tools mimic aspects of an intelligent being, then we need to leverage them in a fashion that manifests the cognitive capability in preference to simply deploying a point-solution.  This involves thinking about the tools somewhat differently.


Going Green Means More Green Going?

August 11th, 2011

Readers of my blog may be aware that I own a hybrid car, a 2007 Civic Hybrid to be precise.  I have kept a record of almost every gas purchase, recording the date, accumulated mileage, gallons used and price paid, as well as the calculated and claimed MPG.  Since I now have four years of data, I thought I could use it to evaluate the fuel efficiency’s impact on my total cost of ownership (TCO).

I had two questions I wanted to answer: 1) did I achieve the vehicle’s advertised MPG; and 2) are the gas savings significant versus owning a non-hybrid?

To answer the second question I needed to choose an alternate vehicle to represent the non-hybrid.  I thought a good non-hybrid to compare would be the 2007 Civic EX since the features are similar to my car, other than the hybrid engine.

Some caveats: I am not including service visits, new tires or the time value of money in my TCO calculations.

First some basic statistics.  I have driven my car a little over 105,500 miles at this point.  I have used about 2,508 gallons of gas costing me $7,466 over the last four years.  I have had to fill up the car about 290 times.  My mileage over the lifetime of the car has averaged 42 MPG which matches the expected MPG from the original sticker.  Question 1 answered – advertised MPG achieved.

To explore question 2, I needed an average MPG for the EX.  Since traditional cars have different city and highway MPG I had to choose a value that made sense based on my driving, yet be conservative enough to give me a meaningful result.  The 2007 Civic EX had an advertised MPG of 30 city and 38 highway.  I do significantly more highway than city driving, but thought I’d be really conservative and choose 32 MPG for my comparison.

With that assumption in place, I can calculate the gas consumption I would have experienced with the EX.  Over the 105,500 miles I would have used about 3,306 gallons of gas costing about $9,903.

What this means is that if I had purchased the EX in 2007 instead of the Hybrid I would have used about 798 more gallons of gas costing me an additional $2,437 over that time period.  That is good to know, both in terms of my reduced carbon footprint and fuel cost savings.

However, there is a cost difference between the two vehicle purchase prices.  The Hybrid MSRP was $22,600 while the EX was $18,710.  The Hybrid cost me $3,890 more to purchase.

Gas Consumption (Hybrid versus Postulated EX)

So over the four years I’ve owned the car, I’m currently behind by $1,453 compared with purchasing the EX (again, not considering the time value of money, which would make it worse).  I will need to keep the car for several more years to break even, and in reality it may never be possible to break even once the time value factor is included.  Question 2 answered, and it isn’t such good news.
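For anyone who wants to check the arithmetic, the comparison boils down to a few lines.  The sketch below simply restates the figures quoted above; the only assumption carried forward is the 32 MPG estimate for the EX.

public class HybridBreakEven {
    public static void main(String[] args) {
        double hybridFuelCost  = 7466.0;             // actual gas spend over ~105,500 miles
        double exFuelCost      = 9903.0;             // estimated spend at the assumed 32 MPG
        double purchasePremium = 22600.0 - 18710.0;  // Hybrid MSRP minus EX MSRP ($3,890)

        double fuelSavings = exFuelCost - hybridFuelCost;   // about $2,437
        double netPosition = fuelSavings - purchasePremium; // about -$1,453 so far

        System.out.printf("Fuel savings to date: $%,.0f%n", fuelSavings);
        System.out.printf("Net position vs. EX:  $%,.0f%n", netPosition);
    }
}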

My conclusion is that purchasing a hybrid is not a financially smart choice.  I also wonder if it is even an environmentally sound one given the chemicals involved in manufacturing the battery.  Maybe the environment comes out ahead or maybe not.  I think it is unfortunate that the equation for the consumer doesn’t even hit break even when trying to do the right thing.


Android Programming Experiences with Sparql Droid

July 10th, 2011

As I release my 3rd Alpha-version of Sparql Droid I thought I’d document a few lessons learned and open items as I work with the Android environment.  Some of my constraints are based on targeting smart phones rather than tablets, but the lessons learned around development environments, screen layouts, and memory management are valuable.

I’ll start on the development side.  I use Eclipse, and the Android development plugin (ADT) is very helpful; it greatly streamlines the development process.  Principally, it automates the generation of the resources from the source files.  These resources, such as screen layouts and menus, require a conversion step after being edited.  The automation, though, comes at a price.

Taking a step back, Android doesn’t use an Oracle-compliant JVM; instead it uses the Dalvik VM.  This difference has two major ramifications: 1) not all the standard packages are available; and 2) any compiled Java code has to go through a step to “align” it for Dalvik.  This alignment process is required for the class files you create and for any third-party classes (such as those found in external JAR files).  Going back to item 1, if an external JAR file you use needs a package that isn’t part of Dalvik, you’ll need to recreate it.

The alignment process works pretty fast for small projects.  My first application was a game that used no external libraries.  The time required to compile and align was indistinguishable from typical compile time.  However, with Sparql Droid, which uses several large third-party libraries, the alignment time is significant – on the order of a full minute.

That delay doesn’t sound so bad, unless you consider the Build Automatically feature in Eclipse.  This is a feature that you want to turn off when doing Android development that includes third-party libraries of any significance. Turning off that feature simply adds an extra step to the editing process, a manual build, and slightly reduces the convenience of the environment.

With my first Android project, I was able to edit a resource file and immediately jump back to my Java code and have the resource be recognized.   Now I have to manually do a build (waiting a minute or so) after editing a resource file before it is recognized on the code side.  Hopefully the plug-in will be improved to cache the aligned libraries, saving that time when the libraries aren’t being changed.


Sparql Droid – A Semantic Technology Application for the Android Platform

June 24th, 2011

The semantic technology concepts that comprise what is generally called the semantic web involve paradigm shifts in the ways that we represent data, organize information and compute results.  Such shifts create opportunities and present challenges.  The opportunities include easier correlation of decentralized information, flexible data relationships and reduced data storage entropy.  The challenges include new data management technology, new syntaxes, and a new separation of data and its relationships.

I am a strong advocate of leveraging semantic technology.  I believe that these new paradigms provide a more flexible basis for our journey to create meaningful, efficient and effective business automation solutions.  However, one challenge that differentiates leveraging semantic technology from more common technology (such as relational databases) is the lack of mature tools supporting a business system infrastructure.

It will take a while for solid solutions to appear.  Support for mainstream capabilities such as reporting, BI, workflow, and application design and development that leverage semantic technology is missing or weak at best.  Again, this is an opportunity and a challenge.  For those who enjoy creating computer software it presents a new world of possibilities.  For those looking to leverage mature solutions in order to advance their business vision it will take investment and patience.

In parallel with the semantic paradigm we have an ever increasing focus on mobile-based solutions. Smart phones and tablet devices, focused on network connectivity as the enabler of value, rather than on-board storage and compute power, are becoming the standard tool for human-system interaction.  As we design new solutions we must keep the mobile-accessible mantra in mind.

As part of my exploration of these two technologies, I’ve started working on a semantic technology mobile application called Sparql Droid.  Built for the Android platform, it is meant to be a tool for exploring and mashing up semantic data sources.  As a small first step I’ve leveraged the Androjena port of the Jena framework and created an application with some basic capabilities.
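As a rough illustration of what leveraging Androjena looks like at the code level (a generic sketch, not Sparql Droid’s actual implementation), loading a remote RDF document into a model and walking its triples takes only a few lines; the URL below is a placeholder.

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;

public class ModelLoadSketch {
    public static void main(String[] args) {
        // Create an empty, in-memory model and populate it from an RDF/XML document
        Model model = ModelFactory.createDefaultModel();
        model.read("http://example.org/data.rdf");  // placeholder URL for an RDF/XML source

        // Walk the triples that were loaded
        StmtIterator statements = model.listStatements();
        while (statements.hasNext()) {
            Statement statement = statements.nextStatement();
            System.out.println(statement.getSubject() + " "
                    + statement.getPredicate() + " " + statement.getObject());
        }
    }
}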


OpenOffice in a Heterogeneous Office Tool Environment

March 4th, 2011

A few months ago I blogged about my new computer and my quest to use only OpenOffice as my document tool suite (How I Spent My Christmas Vacation).  For a little over a month I was able to work effectively, exchanging documents and spreadsheets with coworkers without incident.  However, it all came crashing down.  My goal in this blog entry is to describe what worked and what didn’t.

OpenOffice provides five key office-type software packages: Writer for word processing, Calc for spreadsheets, Impress for presentations, Base for database work and Draw for diagrams.  There is a sixth tool, Math, for creating scientific formulas and equations, which is similar to the equation editor available with MS Word.

As one of my coworkers suggests when providing positive and negative feedback, I’ll use the sandwich approach.  If you’ve not heard of this approach, the idea is to start with some good points, then go through the issues and wrap up with a positive item or two.

On a positive note, the OpenOffice suite is production worthy.  For the two tools that seem to be most commonly used in office settings, word processing and spreadsheets, the Writer and Calc tools have all the features that I was used to using with the Microsoft Office (MS Office) tools.  In fact for the most part I was unaware that I was using a different word processor or spreadsheet. From a usability perspective there is little or no learning curve for an experienced MS Office user to effectively use these OpenOffice tools.

Of key importance to me was the ability to work with others who were using MS Office.  The ability for OpenOffice to open the corresponding MS Office documents worked well at first but then cracks began to show.

OpenOffice Writer was able to work with MS Office documents in both the classic Word “doc” format and the newer Word 2007 and later “docx” format.  However, Writer cannot save to the “docx” format.  If you open a “docx” then the only MS Office format that can be used to save the document is the “doc” format.  At first this was a small annoyance but obviously meant that if a “docx” feature was used it would be lost on the export to “doc”.

Another aggravating issue was confusion when using the “Record Changes” feature, which is analogous to the “Track Changes” features in MS Word.  Although the updates made using MS Word could be seen in Writer, notes created in Word were inconsistently presented in Writer.  The tracked changes were also somewhat difficult to understand when multiple iterations of edits had occurred.  At work we often use track changes as we collaborate on documentation so this feature needs to work well for our team.

I eventually ran into two complete show-stoppers.  In the first case, OpenOffice was unable to display certain images embedded in an MS Word document.  Although some images had previously been somewhat distorted, it turned out that certain types of embedded images wouldn’t display at all.  The second issue involved the Impress (presentation) tool.

I’ve mentioned that Writer and Calc are very mature and robust.  The Impress tool doesn’t seem to be as solid.  As I began working with a team member on a presentation we were delivering in February I discovered that there appears to be little compatibility between MS PowerPoint and Impress. I was unable to work with the PowerPoint presentation using Impress.  The images, animations and text were all completely wrong when opened in Impress.

To be fair, I have created standalone presentations using Impress and the tool has a good feature set and works reliably.  I’ve used it to create and deliver presentations with no issues.  OpenOffice even seems to provide a nicer set of boilerplate presentation templates than the ones that come with MS PowerPoint.

My conclusion after working with OpenOffice now for about 3 months is that it is a completely viable solution if used as the document suite for a company. However, it is not possible to succeed with these tools in a heterogeneous environment where documents must be shared with MS Office users.

I will probably continue to use OpenOffice for personal work.  I’ll also continue to upgrade and try using it with MS Office documents from time to time.  Perhaps someday it will be possible to leverage this suite effectively in a multi-platform situation. Certainly from an ROI perspective it becomes harder and harder to justify the cost of the MS Office suite when such a capable and well-designed open source alternative exists.

Have you tried using alternatives to MS Office in a heterogeneous office tool environment?  Have you had better success than I have?  Any pointers on being able to succeed with such an approach?  Is such an approach even reasonable?  Please feel free to share your thoughts.