
Posts Tagged ‘Information Systems’

Using ARQoid for Android-based SPARQL Query Execution

Thursday, December 1st, 2011

I was recently asked about the SPARQL support in Sparql Droid and whether it could serve as a way for other Android applications to execute SPARQL queries against remote data sources.  It could be used in this way but there is a simpler alternative I’d like to discuss here.

On the Android platform it is actually quite easy to execute SPARQL against remote SPARQL endpoints, RDF data and local models.  The heavy lifting is handled by Androjena’s ARQoid, an Android-centric port of HP’s Jena ARQ engine.

Both engines (the original and the port) do a great job of simplifying the execution of SPARQL queries and consumption of the resulting data.  In this post I’ll go through a simple example of using ARQoid.  Note that all the code being shown here is available for download.  This post is based specifically on the queryRemoteSparqlEndpoint() method in the com.monead.androjena.demo.arqoid.SparqlExamples class.

Setup

To begin, some environment setup needs to be done in order to have a properly configured Android project ready to use ARQoid.

First, obtain the ARQoid JAR and its dependencies.  This is easily accomplished using the download page on the ARQoid Wiki and obtaining the latest ARQoid ZIP file.  Unzip the downloaded archive.  Since I’m discussing an Android application, I’d expect that you have created an Android project and that it contains a libs directory where the JAR files should be placed.

Second, add the JAR files to the classpath for your Android project.  I use the ADT plugin for Eclipse to do Android development.  So to add the JARs to my project I choose the Project menu item, select Properties, choose Build Path, select the Libraries tab, click the Add JARs… button, navigate to the libs directory, select the JAR files and click OK on the open dialogs.

Third, set up a minimal Android project.  The default layout, with a small change to its definition, will work fine.

Overview

Now we are ready to write the code that uses ARQoid to access some data.  For this first blog entry I’ll focus on a trivial query against a SPARQL endpoint.  There would be some small differences if we wanted to query a local model or a remote data set.  Those will be covered in follow-on entries.

Here is a list of the ARQoid classes we will be using for this initial example:

  • com.hp.hpl.jena.query.Query – represents the query being executed
  • com.hp.hpl.jena.query.Syntax – represents the query syntaxes supported by ARQoid
  • com.hp.hpl.jena.query.QueryFactory – creates a Query instance based on supplied parameters such as the query string and syntax definition
  • com.hp.hpl.jena.query.QueryExecution – provides the service to  execute the query
  • com.hp.hpl.jena.query.QueryExecutionFactory – creates a QueryExecution instance based on supplied parameters such as a Query instance and SPARQL endpoint URI
  • com.hp.hpl.jena.query.ResultSet – represents the returned data and metadata associated with the executed query
  • com.hp.hpl.jena.query.QuerySolution – represents one row of data within the ResultSet.

We’ll use these classes to execute a simple SPARQL query that retrieves some data associated with space exploration.  Talis provides an endpoint that we can use to access some interesting space exploration data.  The endpoint is located at http://api.talis.com/stores/space/services/sparql.
The query we will execute is:

SELECT ?dataType ?data
WHERE {
  <http://nasa.dataincubator.org/launch/1961-012> ?dataType ?data.
}

This query will give us a little information about Vostok 1 launched by the USSR in 1961.
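To make the flow concrete, here is a minimal sketch of how those classes fit together, modeled on the queryRemoteSparqlEndpoint() method mentioned above.  The class name, TAG constant and logging calls are my own simplifications rather than the exact code in the downloadable example.

import android.util.Log;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.Syntax;

public class SparqlEndpointDemo {
    private static final String TAG = "SparqlEndpointDemo";

    public void queryRemoteSparqlEndpoint() {
        String queryString = "SELECT ?dataType ?data "
                + "WHERE { <http://nasa.dataincubator.org/launch/1961-012> ?dataType ?data . }";

        // Parse the query text into a Query instance using the standard SPARQL syntax
        Query query = QueryFactory.create(queryString, Syntax.syntaxSPARQL);

        // Prepare an execution against the remote Talis endpoint
        QueryExecution queryExecution = QueryExecutionFactory.sparqlService(
                "http://api.talis.com/stores/space/services/sparql", query);

        try {
            // Run the SELECT query and iterate over the returned rows
            ResultSet results = queryExecution.execSelect();
            while (results.hasNext()) {
                QuerySolution solution = results.nextSolution();
                Log.d(TAG, solution.get("dataType") + " = " + solution.get("data"));
            }
        } finally {
            // Always release the connection and parser resources
            queryExecution.close();
        }
    }
}

Since this performs network I/O, on a real device the method should be run off the UI thread (for instance from an AsyncTask or a background service).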


The Cognitive Corporation™ – Effective BPM Requires Data Analytics

Tuesday, October 25th, 2011

The Cognitive Corporation is a framework introduced in an earlier posting.  The framework is meant to outline a set of general capabilities that work together in order to support a growing and thinking organization.  For this post I will drill into one of the least mature of those capabilities in terms of enterprise solution adoption – Learn.

Business rules, decision engines, BPM, complex event processing (CEP) – these all evoke images of computers making speedy decisions to the benefit of our businesses.  The infrastructure, technologies and software that provide these solutions (SOA, XML schemas, rule engines, workflow engines, etc.) support the decision automation process.  However, they don’t know what decisions to make.

The BPM-related components we acquire provide the how of decision making (send an email, route a claim, suggest an offer).  Learning, supported by data analytics, provides a powerful path to the what and why of automated decisions (send this email to that person because they are at risk of defecting, route this claim to that underwriter because it looks suspicious, suggest this product to that customer because they appear to be buying these types of items).

I’ll start by outlining the high level journey from data to rules and the cyclic nature of that journey.  Data leads to rules, rules beget responses, responses manifest as more data, new data leads to new rules, and so on.  Therefore, the journey does not end with the definition of a set of processes and rules.  This link between updated data and the determination of new processes and rules is the essence of any learning process, providing a key function for the cognitive corporation.
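As a toy illustration of that loop (not drawn from any particular BPM or analytics product), the sketch below derives a claim-routing threshold from historical data, applies it as a rule to a new claim, and feeds the outcome back into the data used for the next learning pass.  The claim amounts and the doubled-mean analytic are entirely hypothetical.

import java.util.ArrayList;
import java.util.List;

public class ClaimRoutingLoop {

    // "Learn": derive a suspicion threshold from observed claim amounts (hypothetical analytic).
    static double deriveThreshold(List<Double> historicalAmounts) {
        double sum = 0;
        for (double amount : historicalAmounts) {
            sum += amount;
        }
        // Flag claims well above the historical average
        return 2 * (sum / historicalAmounts.size());
    }

    // "Decide": the automated routing rule produced by the learning step.
    static String routeClaim(double amount, double threshold) {
        return amount > threshold ? "special-investigations" : "standard-processing";
    }

    public static void main(String[] args) {
        List<Double> history = new ArrayList<Double>();
        history.add(1200.0);
        history.add(950.0);
        history.add(1100.0);

        double threshold = deriveThreshold(history);
        double newClaim = 4000.0;
        System.out.println("Route claim to: " + routeClaim(newClaim, threshold));

        // "Respond and feed back": the new claim becomes input to the next learning pass.
        history.add(newClaim);
    }
}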


Expanding on “Code Reviews Trump Unit Testing, But They Are Better Together”

Tuesday, October 18th, 2011

Michael Delaney, a senior consulting software engineer at Blue Slate, commented on my previous posting.  As I created a reply I realized that I was expanding on my reasoning and it was becoming a bit long.  So, here is my reply as a follow-up posting.  Also, thank you to Michael for helping me think more about this topic.

I understand the desire to rely on unit testing and its ability to find issues and prevent regressions.  I’ll need to write about TDD separately.  Fundamentally, I’m a believer in white box testing.  Black box approaches, like TDD, seem to be of relatively little value to the overall quality and reliability of the code.  In other words, I’d want to invest more effort in white box testing than in black box testing.

I’m somewhat jaded, being concerned with the code’s security, which to me is strongly correlated with its reliability.  That said, I believe that unit testing is much more constrained than formal reviews.  I’m not suggesting that unit tests be skipped, rather that we understand that unit tests can catch certain types of flaws and that those types are narrow compared to what formal reviews can identify.


Code Reviews Trump Unit Testing, But They Are Better Together

Tuesday, October 11th, 2011

Last week I was participating in a formal code review (a.k.a. code inspection) with one of our clients.  We have been working with this client, helping them strengthen their development practices.  Holding formal code reviews is a key component for us.  Part of the formal process we introduced includes reviewing the unit testing results, both the (successful) output report and the code coverage metrics.

At one point we were reviewing some code that had several error handling blocks that were not being covered in the unit tests.  These blocks were, arguably, unlikely or impossible to reach (such as a Java StringReader throwing an IOException).  There was some discussion by the team about the necessity of mocking enough functionality to cover these blocks.
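For readers who have not bumped into this particular corner of Java, here is a hedged sketch of the kind of block we were discussing.  The countLines() method is a made-up example, but it shows how the compiler forces an IOException handler that an in-memory StringReader will never actually exercise.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class UnreachableHandlerExample {

    // Counts the lines in an in-memory string.
    public static int countLines(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        int lines = 0;
        try {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            // Required by the compiler, yet effectively unreachable for an
            // in-memory StringReader; exactly the sort of block a team might
            // debate mocking just to raise the coverage numbers.
            throw new IllegalStateException("Unexpected I/O failure", e);
        }
        return lines;
    }
}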

Although we agreed that some of the more esoteric error conditions weren’t worth the programmer’s time to mock-up, it occurred to me later that we were missing an important point.  What mattered was that we were holding a formal code review and looking at those blocks of code.

Let me take a step back.  In 1986, Capers Jones published a book entitled Programming Productivity.  Although dated, the book contains many excellent points that cause you to think about how to create software in an efficient way.  Here efficiency is not about lines of code per unit of time, but more importantly, lines of correct code per unit of time.  This means taking into account rework due to errors and omissions.

One of the studies presented in the book relates to identifying defects in code.  It is a study whose results seem obvious when we think about them.  However, we don’t always align our software development practices to leverage the study’s lessons and maximize our development efficiency.  Perhaps we believe that the statistics have changed due to language constructs, experience, tooling and so forth.  We’d need studies similar to the ones presented by Capers Jones in order to prove that, though.

Below are a few of the activities from the book’s study of defect detection approaches.  I’ve skipped the low-end and high-end numbers that Capers includes, simply giving the modes (averages), which are a good basis for comparison:

[Figures: Defect Identification Rates data table and graph]


The Cognitive Corporation™ – An Introduction

Monday, September 26th, 2011

Given my role as an enterprise architect, I’ve had the opportunity to work with many different business leaders, each focused on leveraging IT to drive improved efficiencies, lower costs, increase quality, and broaden market share throughout their businesses.  The improvements might involve any subset of data, processes, business rules, infrastructure, software, hardware, etc.  A common thread is that each project seeks to make the corporation smarter through the use of information technology.

As I’ve placed these separate projects into a common context of my own, I’ve concluded that the long term goal of leveraging information technology must be for it to support cognitive processes.  I don’t mean that the computers will think for us, rather that IT solutions must work together to allow a business to learn, corporately.

The individual tools that we utilize each play a part.  However, we tend to utilize them in a manner that focuses on isolated and directed operation rather than incorporating them into an overall learning loop.  In other words, we install tools that we direct without asking them to help us find better directions to give.

Let me start with a definition: similar to thinking beings, a cognitive corporation™ leverages a feedback loop of information and experiences to inform future processes and rules.  Fundamentally, learning is a process and it involves taking known facts and experiences and combining them to create new hypotheses which are tested in order to derive new facts, processes and rules.  Unfortunately, we don’t often leverage our enterprise applications in this way.

We have many tools available to us in the enterprise IT realm.  These include database management systems, business process management environments, rule engines, reporting tools, content management applications, data analytics tools, complex event processing environments, enterprise service buses, and ETL tools.  Individually, these components are used to solve specific, predefined issues with the operation of a business.  However, this is not an optimal way to leverage them.

If we consider that these tools mimic aspects of an intelligent being, then we need to leverage them in a fashion that manifests the cognitive capability in preference to simply deploying a point-solution.  This involves thinking about the tools somewhat differently.


Fuzzing – A Powerful Technique for Software Security Testing

Friday, January 21st, 2011

I was participating in a code review today and was reminded by a senior architect, who started working as an intern for me years ago, of a testing technique I had used with one of his first programs.  He had been assigned to create a basic web application that collected some data from a user and wrote it to a database.  He came into my office, announced it was done and proudly showed it to me.  I walked over to the keyboard, entered a bunch of junk and got a segmentation fault in response.

Although I didn’t have a name for it, that was a standard technique I used when evaluating applications.  After all, the tried and true paths, expected inputs and easy errors will be tested early and often as the developer exercises the application using the basic use cases.  As Boris Beizer said, “The high-probability paths are always tested if only to demonstrate that the system works properly.” (Beizer, Boris. Software Testing Techniques. Boston, MA: Thomson Computer Press, 1990: 76.)

It is unexpected input that is useful when looking to find untested paths through the code.  If someone shows me an application for evaluation, the last thing I need to worry about is using it in an expected fashion; everyone else will do that.  In fact, I default to entering data outside the specification when looking at a new application.  I don’t know that my team always appreciates the approach.  They’d probably like to see the application work at least once while I’m in the room.

These days there is a formal name for this type of testing: fuzzing.  A few years ago I preferred calling it “gorilla testing” since I liked the mental picture of beating on the application. (Remember the American Tourister luggage ad in the 1970s?)  But alas, it appears that fuzzing has become the accepted term.

Fuzzing involves passing input that breaks the expected input “rules”.  Those rules could come from some formal requirements, such as an RFC, or informal requirements, such as the set of parameters accepted by an application.  Fuzzing tools can use formal standards, extracted patterns and even randomly generated inputs to test an application’s resilience against unexpected or illegal input.
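To make the idea tangible, below is a minimal random fuzzer sketch.  The parseAge() method stands in for whatever code is under test, and both it and the naive input-generation strategy are hypothetical simplifications of what real fuzzing tools do.

import java.util.Random;

public class SimpleFuzzer {
    private static final Random RANDOM = new Random();

    // Stand-in for the application code being fuzzed (hypothetical).
    static int parseAge(String input) {
        return Integer.parseInt(input.trim());
    }

    // Generates input that ignores the expected format, including control characters.
    static String randomInput(int maxLength) {
        int length = RANDOM.nextInt(maxLength) + 1;
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append((char) RANDOM.nextInt(0x7F));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 500; attempt++) {
            String input = randomInput(32);
            try {
                parseAge(input);
            } catch (RuntimeException e) {
                // An unhandled exception (here most often a NumberFormatException)
                // flags a path that the expected-input tests never reached.
                System.err.println(e.getClass().getSimpleName() + " for input: " + input);
            }
        }
    }
}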


How I Spent My Christmas Vacation

Wednesday, January 5th, 2011

(or Upgrading to Android and Windows 7)

The holidays are usually a time I can use to catch up on some extra reading or research.  This year I had two major infrastructure changes that occupied my time.  I moved from my Blackberry Storm to an HTC Incredible and from my old Gateway M680 with Windows XP to a Dell Vostro 3700 running Windows 7.  It has been a bumpy couple of weeks getting my virtual life back in order.

Before getting into some of the details of the experiences, I’ll summarize by saying that both upgrades were worth the learning curve and associated frustration.  The Incredible’s hardware and the Android OS are orders-of-magnitude beyond the Storm in terms of usability, reliability, and functionality.  On my computer, Windows 7 (64-bit professional version) provides a clean and efficient environment.  The compatibility with 32-bit applications has worked flawlessly so far.

The phone journey…

I ordered the Incredible with the intention of switching over to it during the week before Christmas.  I would be off from work that week, so any issues with email and calendar wouldn’t pose much risk.  However, Verizon had other plans.  A day after the Incredible arrived they shut off my Storm.  This meant I had to get the Incredible going immediately.  This was during a week that I was traveling to Alabama and Vermont, so I needed my cell phone working reliably.

I was pleasantly surprised at how quickly I was fully operational with the basic services (phone, email and calendar).  Blue Slate uses Google as our hosted email service so its ease of integration with the Android environment isn’t a surprise.  The phone setup process through Verizon has changed since I got my Storm several years ago.  Making on-line changes to my services is now simple.  I quickly expanded my data plan so that I could use the 3G Mobile feature of the Incredible while at the client’s site.  No issues at all!

My main disappointment with the Incredible is its battery life. With my Storm I could go days without recharging.  Now I have to recharge my phone every night.  I’ve gone through the “kill the app” phase and found that process doesn’t really help.  I use WiFi as much as possible since that is supposed to save battery life over using the cell connection to access email and internet services.  I keep the screen dimmed and turn off location services when they are not needed.

On the bright side, the variety of applications, including a nice SSH tool, makes the phone amazingly versatile.  I don’t have to fire up my computer to check on a batch job or fix a basic database problem on our Linux servers.  The GPS services surpass my Magellan’s capabilities, so I have one less device to carry with me on trips.

All in all I’m very pleased with my move to the Incredible.  I probably would have considered the iPhone but really prefer Verizon’s coverage.  This phone should serve me well for my 2-year contract.

The computer journey…

My new Dell arrived several weeks before Christmas.  I put off doing anything with it, knowing that the process of moving my virtual life, installed and configured over the course of 4 years on my trusty Gateway laptop, would be onerous.  I’m glad I waited.  Although the Dell is a great machine, the process of getting products installed (or obtaining newer versions) and getting files and configurations in place took several days.


Semantic Web Summit (East) 2010 Concludes

Thursday, November 18th, 2010

I attended my first semantic web conference this week, the Semantic Web Summit (East) held in Boston.  The focus of the event was how businesses can leverage semantic technologies.  I was interested in what people were actually doing with the technology.  The one and a half days of presentations were informative and diverse.

Our host was Mills Davis, a name that I have encountered frequently during my exploration of the semantic web.  He did a great job of keeping the sessions running on time as well as engaging the audience.  The presentations were generally crisp and clear.  In some cases the speaker presented a product that utilizes semantic concepts, describing its role in the value chain.  In other cases we heard about challenges solved with semantic technologies.

My major takeaways were: 1) semantic technologies work and are being applied to a broad spectrum of problems and 2) the potential business applications of these technologies are vast and ripe for creative minds to explore.  This all bodes well for people delving into semantic technologies since there is an infrastructure of tools and techniques available upon which to build while permitting broad opportunities to benefit from leveraging them.

As a CTO with 20+ years focused on business environments, including application development, enterprise application integration, data warehousing, and business intelligence, I identified most closely with the sessions geared around intra-business and B2B uses of semantic technology.  There were other sessions looking at B2C which were well done but not applicable to the world in which I find myself currently working.

Talks by Dennis Wisnosky and Mike Dunn were particularly focused on the business value that can be achieved through the use of semantic technologies.  Further, they helped to define basic best practices that they apply to such projects.  Dennis in particular gave specific information around his processes and architecture while talking about the enormous value that his team achieved.

Heartening to me was the fact that these best practices, processes and architectures are not significantly different from those used with other enterprise system endeavors.  So we don’t need to retool all our understanding of good project management practices and infrastructure design; we just need to internalize where semantic technology best fits into the technology stack.


CIO, a Role for Two

Monday, October 11th, 2010

Actors often enjoy the challenge of a role that requires two completely different personas to be presented.  Jekyll and Hyde, Peter Pan’s Captain Hook and Mr. Darling as well as The Prince and the Pauper all give an actor the chance to play two different people within the same role.  In the case of CIOs, they are cast in a role that has a similar theme, requiring two very different mindsets.

For the CIO, this duality is described in a variety of ways.  Sometimes the CIO’s job requirements are discussed as internally and externally focused.  In other cases people separate the responsibilities into infrastructure and business.

Regardless of how the aspects are expressed, there is an understanding that the CIO provides leadership in two different realms. One realm is focused on keeping equipment operating, minimizing maintenance costs, achieving SLAs and allowing the business to derive value from IT investments.  The other realm focuses on business strategy and seeks to derive new functionality in support of improved productivity, customer service, profitability and other corporate measures.

By analogy, the first realm keeps the power flowing while the second creates new devices to plug in and do work.

One could argue that a rethinking of corporate structure might help simplify this situation.  After all, we don’t charge the CFO with maintaining the infrastructure around financial systems, including file cabinets, door locks and computer hardware.  Why should a person charged with exploiting computers for the benefit of the corporation also be charged with the maintenance of the computer hardware and software? Couldn’t the latter responsibility be provided by an operations group, similar to the handling of most utilities?


JavaOne 2010 Concludes

Saturday, September 25th, 2010

My last two days at JavaOne 2010 included some interesting sessions as well as spending some time in the pavilion.  I’ll mention a few of the session topics that I found interesting as well as some of the products that I intend to check out.

I attended a session on creating a web architecture focused on high-performance with low-bandwidth.  The speaker was tasked with designing a web-based framework for the government of Ethiopia.  He discussed the challenges that are presented by that country’s infrastructure – consider network speed on the order of 5Kbps between sites.  He also had to work with an IT group that, although educated and intelligent, did not have a lot of depth beyond working with an Oracle database’s features.

His solution allows developers to create fully functional web applications that keep exchanged payloads under 10K.  Although I understand the logic of the approach in this case, I’m not sure the technique would be practical in situations without such severe bandwidth and skill set limitations.

A basic theme during his talk was to keep the data and logic tightly co-located.  In his case it is all located in the database (PL/SQL) but he agreed that it could all be in the application tier (e.g. NoSQL).  I’m not convinced that this is a good approach to creating maintainable high-volume applications.  It could be that the domain of business applications and business verticals in which I often find myself differ from the use cases that are common to developers promoting the removal of tiers from the stack (whether removing the DB server or the mid-tier logic server).

One part of his approach with which I absolutely concur is pushing processing onto the client.  Using the client’s CPU seems like common sense to me.  The work is in balancing that with security and bandwidth.  However, it can be done and I believe we will continue to find more effective ways to leverage all that computing power.

I also enjoyed a presentation on moving data between a data center and the cloud to perform heavy and intermittent processing.  The presenters did a great job of describing their trials and successes with leveraging the cloud to perform computationally expensive processing on transient data (e.g. they copy the data up each time they run the process rather than pay to store their data).  They also provided a lot of interesting information regarding options, advantages and challenges when leveraging the cloud (Amazon EC2 in this case).
