Archive for the ‘Software Development’ Category

Cognitive Corporation™ Innovation Lab Kickoff!

Friday, August 10th, 2012

I am excited to share the news that Blue Slate Solutions has kicked off a formal innovation program, creating a lab environment which will leverage the Cognitive Corporation™ framework and apply it to a suite of processes, tools and techniques.  The lab will use a broad set of enterprise technologies, applying the learning organization concepts implicit in the Cognitive Corporation’s™ feedback loop.

I’ve blogged a couple of times (see references at the end of this blog entry) about the Cognitive Corporation™.  The depiction has changed slightly but the fundamentals of the framework are unchanged.

Cognitive Corporation DepictionThe focus is to create a learning enterprise, where the learning is built into the system integrations and interactions. Enterprises have been investing in these individual components for several years; however they have not truly been integrating them in a way to promote learning.

By “integrating” I mean allowing the system to understand the meaning of the data being passed between them.  Creating a screen in a workflow (BPM) system that presents data from a database to a user is not “integration” in my opinion.  It is simply passing data around.  This prevents the enterprise ecosystem (all the components) from working together and collectively learning.

I liken such connections to my taking a hand-written note in a foreign language, which I don’t understand, and typing the text into an email for someone who does understand the original language.  Sure, the recipient can read it, but I, representing the workflow tool passing the information from database (note) to screen (email) in this case, have no idea what the data means and cannot possibly participate in learning from it.  Integration requires understanding.  Understanding requires defined and agreed-upon semantics.

This is just one of the Cognitive Corporation™ concepts that we will be exploring in the lab environment.  We will also be looking at the value of these technologies within different horizontal and vertical domains.  Given our expertise in healthcare, finance and insurance, our team is well positioned to use the lab to explore the use of learning BPM in many contexts.

(more…)

State Selection Lists on Website Forms – How Hard Are They to Sort?

Thursday, June 14th, 2012

This post certainly falls into the “nitpick” category, but the flaw occurs often enough to be somewhat irritating.  The problem you ask?  Drop-down lists of state names that are not ordered by the state name but instead by the state’s 2-letter postal abbreviation.  Granted, this error pales in comparison to applications containing SQL injection flaws or race conditions exposing personal information, but I’m going to complain none-the-less.

What exactly is the issue?  Well, it turns out that the two letter postal abbreviations (for example AK for Alaska and HI for Hawaii) can’t be used as the key for sorting the state names into alphabetical order.  For the most part it works, however for some states, such as Nevada through New Mexico it breaks.  As a New York resident I get tripped up by this.

Example of Incorrect State SortingThe image shown here is a web form for a college admissions site.  As you can see, Nevada follows New Hampshire, New Jersey and New Mexico but precedes New York.  In reality it should follow Nebraska and precede New Hampshire.  This order is incorrect; it is based on the state abbreviations.  If instead of state names the website were displaying the state abbreviations, the order would be NH, NJ, NM, NV, NY and all would be fine.

The developer(s) of this site are not alone in their mistaken use of the postal abbreviation as the sort key.  I’ve encountered this issue with online shopping sites, reservation systems and survey forms.  I typically do a quick “view source” of the site and invariably they are using the state abbreviation as the actual value being passed to the server.  I’m sure they are using that for sorting as well.

You might think this sort of thing doesn’t matter.  From my point of view it represents a “broken window,” using Andy Hunt’s and Dave Thomas’ language from The Pragmatic Programmer.  Little things count.  Little things left uncorrected form an environment where developers may become more and more sloppy.  After all, if I don’t need to pay attention to my sort key for state, what’s to say I won’t make a similar mistake with country or a product list or any other collection of values that is supposed to be ordered to make access easier?

Please, if you are designing an input form, make sure that sorted information displayed by your widgets is sorted by the display value, not some internal code.  It will make the use of your form easier for users and garner the respect of your fellow developers.

Have you seen this flaw on websites you’ve visited?  Do you have pet peeves with online form designs?  I’d enjoy hearing about them.

Semantic Technology and Business Conference, East 2011 – Reflections

Wednesday, December 7th, 2011

I had the pleasure of attending the Semantic Technology and Business Conference in Washington, DC last week.  I have a strong interest in semantic technology and its capabilities to enhance the way in which we leverage information systems.  There was a good selection of topics discussed by people with a variety of  backgrounds working in different verticals.

To begin the conference I attended the half day “Ontology 101” presented by Elisa Kendall and Deborah McGuinness.  They indicated that this presentation has been given at each semantic technology conference and the interest is still strong.  The implication being that new people continue to want to understand this art.

Their material was very useful and if you are someone looking to get a grounding in ontologies (what are they?  how do you go about creating them?) I recommend attending this session the next time it is offered.  Both leaders clearly have deep experience and expertise in this field.  Also, the discussion was not tied to a technology (e.g. RDF) so it was applicable regardless of underlying implementation details.

I wrapped up the first day with Richard Ordowich who discussed the process of reverse engineering semantics (meaning) from legacy data.  The goal of such projects being to achieve a data harmonization of information across the enterprise.

A point he stressed was that a business really needs to be ready to start such a journey.  This type of work is very hard and very time consuming.  It requires an enterprise wide discipline.  He suggests that before working with a company on such an initiative one should ask for examples of prior enterprise program success (e.g. something like BPM, SDLC).

Fundamentally, a project that seeks to harmonize the meaning of data across an enterprise requires organization readiness to go beyond project execution.  The enterprise must put effective governance in place to operate and maintain the resulting ontologies, taxonomies and metadata.

The full conference kicked off the following day.  One aspect that jumped out for me was that a lot of the presentations dealt with government-related projects.  This could have been a side-effect of the conference being held in Washington, DC but I think it is more indicative that spending in this technology is more heavily weighted to public rather than private industry.

Being government-centric I found any claims of “value” suspect.  A project can be valuable, or show value, without being cost effective.  Commercial businesses have gone bankrupt even though they delivered value to their customers.  More exposure of positive-ROI commercial projects will be important to help accelerate the adoption of these technologies.

Other than the financial aspect, the presentations were incredibly valuable in terms of presenting lessons learned, best practices and in-depth tool discussions.  I’ll highlight a few of the sessions and key thoughts that I believe will assist as we continue to apply semantic technology to business system challenges.

(more…)

Using ARQoid for Android-based SPARQL Query Execution

Thursday, December 1st, 2011

I was recently asked about the SPARQL support in Sparql Droid and whether it could serve as a way for other Android applications to execute SPARQL queries against remote data sources.  It could be used in this way but there is a simpler alternative I’d like to discuss here.

On the Android platform it is actually quite easy to execute SPARQL against remote SPARQL endpoints, RDF data and local models.  The heavy lifting is handled by Androjena’s ARQoid, an Android-centric port of HP’s Jena ARQ engine.

Both engines (the original and the port) do a great job of simplifying the execution of SPARQL queries and consumption of the resulting data.  In this post I’ll go through a simple example of using ARQoid.  Note that all the code being shown here is available for download.  This post is based specifically on the queryRemoteSparqlEndpoint() method in the com.monead.androjena.demo.arqoid.SparqlExamples class.

Setup

To begin, some environment setup needs to be done in order to have a properly configured Android project ready to use ARQoid.

First, obtain the ARQoid JAR and its dependencies.  This is easily accomplished using the download page on the ARQoid Wiki and obtaining the latest ARQoid ZIP file.  Unzip the downloaded archive.   Since I’m discussing an Android application I’d expect that you would have created an Android project and that it contains a libs directory where the JAR files should be placed.

Second, add the JAR files to the classpath for your Android project.  I use the ADT plugin for Eclipse to do Android development.  So to add the JARs to my project I choose the Project menu item, select Properties, choose Build Path, select the Libraries tab, click the Add JARs… button, navigate to the libs directory, select the JAR files and click OK on the open dialogs.

Third, setup a minimal Android project.  The default layout, with a small change to its definition will work fine.

Overview

Now we are ready to write the code that uses ARQoid to access some data.  For this first blog entry I’ll focus on a trivial query against a SPARQL endpoint.  There would be some small differences if we wanted to query a local model or a remote data set.  Those will be covered in follow-on entries.

Here is a list of the ARQoid classes we will be using for this initial example:

  • com.hp.hpl.jena.query.Query – represents the query being executed
  • com.hp.hpl.jena.query.Syntax – represents the query syntaxes supported by ARQoid
  • com.hp.hpl.jena.query.QueryFactory – creates a Query instance based on supplied parameters such as the query string and syntax definition
  • com.hp.hpl.jena.query.QueryExecution – provides the service to  execute the query
  • com.hp.hpl.jena.query.QueryExecutionFactory – creates a QueryExecution instance based on supplied parameters such as a Query instance and SPARQL endpoint URI
  • com.hp.hpl.jena.query.ResultSet – represents the returned data and metadata associated with the executed query
  • com.hp.hpl.jena.query.QuerySolution – represents one row of data within the ResultSet.

We’ll use these classes to execute a simple SPARQL query that retrieves some data associated with space exploration.  Talis provides an endpoint that we can use to access some interesting space exploration data.  The endpoint is located at http://api.talis.com/stores/space/services/sparql.
The query we will execute is:

SELECT ?dataType ?data
WHERE {
  <http://nasa.dataincubator.org/launch/1961-012> ?dataType ?data.
}

This query will give us a little information about Vostok 1 launched by the USSR in 1961.

(more…)

Expanding on “Code Reviews Trumps Unit Testing, But They Are Better Together”

Tuesday, October 18th, 2011

Michael Delaney, a senior consulting software engineer at Blue Slate, commented on my previous posting.  As I created a reply I realized that I was expanding on my reasoning and it was becoming a bit long.  So, here is my reply as a follow-up posting.  Also, thank you to Michael for helping me think more about this topic.

I understand the desire to rely on unit testing and its ability to find issues and prevent regressions.  For TDD, I’ll need to write separately.  Fundamentally I’m a believer in white box testing.   Black box approaches, like TDD, seem to be of relatively little value to the overall quality and reliability of the code.  Meaning, I’d want to invest more effort in white box testing than in black box testing.

I’m somewhat jaded, being concerned with the code’s security, which to me is strongly correlated with its reliability.  That said, I believe that unit testing is much more constrained as compared to formal reviews.  I’m not suggesting that unit tests be skipped, rather that we understand that unit tests can catch certain types of flaws and that those types are narrow as compared to what formal reviews can identify.

(more…)

Code Reviews Trump Unit Testing , But They Are Better Together

Tuesday, October 11th, 2011

Last week I was participating in a formal code review (a.k.a. code inspection) with one of our clients.  We have been working with this client, helping them strengthen their development practices.  Holding formal code reviews is a key component for us.  Part of the formal process we introduced includes reviewing the unit testing results, both the (successful) output report and the code coverage metrics.

At one point we were reviewing some code that had several error handling blocks that were not being covered in the unit tests.  These blocks were, arguably, unlikely or impossible to reach (such as a Java StringReader throwing an IOException).  There was some discussion by the team about the necessity of mocking enough functionality to cover these blocks.

Although we agreed that some of the more esoteric error conditions weren’t worth the programmer’s time to mock-up, it occurred to me later that we were missing an important point.  What mattered was that we were holding a formal code review and looking at those blocks of code.

Let me take a step back.  In 1986, Capers Jones published a book entitled Programming Productivity.  Although dated, the book contains many excellent points that cause you think about how to create software in an efficient way.  Here efficiency is not about lines of code per unit of time, but more importantly, lines of correct code per unit of time.  This means taking into account rework due to errors and omissions.

One if the studies presented in the book relates to identifying defects in code.  It is a study whose results seem obvious when we think about them.  However, we don’t always align our software development practices to leverage the study’s lessons and maximize our development efficiency.  Perhaps we believe that the statistics have changed due to language construct, experience, tooling and so forth.  We’d need similar studies to the ones presented by Capers Jones in order to prove that, though.

Below are a few of the actions from the book’s study of defect detection approaches.  I’ve skipped the low end and high-end numbers that Caper’s includes, simply giving the modes (averages) which are a good basis for comparison:

Defect Identification Rates Data Table
Defect Identification Rates Graph

(more…)

Android Programming Experiences with Sparql Droid

Sunday, July 10th, 2011

As I release my 3rd Alpha-version of Sparql Droid I thought I’d document a few lessons learned and open items as I work with the Android environment.  Some of my constraints are based on targeting smart phones rather than tablets, but the lessons learned around development environments, screen layouts, and memory management are valuable.

I’ll start on the development side.  I use Eclipse and the android development plugin is very helpful. It greatly streamlines the development process.  Principally, it automates the generation of the resources from the source files.  These resources, such as screen layouts and menus, require a conversion step after being edited.  The automation, though, comes at a price.

Taking a step back, Android doesn’t use an Oracle-compliant JVM.  Instead it uses the Dalvik VM.  This difference creates two major ramifications: 1) not all the standard packages are available; and 2) any compiled Java code has to go through a step to “align” it for Dalvik. This alignment process is required for class files you create and for any third-party classes (such as those found in external JAR files).  Going back to item 1, if an external JAR file you use needs a package that isn’t part of Dalvik, you’ll need to recreate it.

The alignment process works pretty fast for small projects.  My first application was a game that used no external libraries.  The time required to compile and align was indistinguishable from typical compile time.  However, with Sparql Droid, which uses several large third-party libraries, the alignment time is significant – on the order of a full minute.

That delay doesn’t sound so bad, unless you consider the Build Automatically feature in Eclipse.  This is a feature that you want to turn off when doing Android development that includes third-party libraries of any significance. Turning off that feature simply adds an extra step to the editing process, a manual build, and slightly reduces the convenience of the environment.

With my first Android project, I was able to edit a resource file and immediately jump back to my Java code and have the resource be recognized.   Now I have to manually do a build (waiting a minute or so) after editing a resource file before it is recognized on the code side.  Hopefully the plug-in will be improved to cache the aligned libraries, saving that time when the libraries aren’t being changed.

(more…)

Sparql Droid – A Semantic Technology Application for the Android Platform

Friday, June 24th, 2011

Sparql Droid logoThe semantic technology concepts that comprise what is generally called the semantic web involve paradigm shifts in the ways that we represent data, organize information and compute results. Such shifts create opportunities and present challenges.  The opportunities include easier correlation of decentralized information, flexible data relationships and reduced data storage entropy.  The challenges include new data management technology, new syntaxes, and a new separation of data and its relationships.

I am a strong advocate of leveraging semantic technology.  I believe that this new paradigms provide a more flexible basis for our journey to create meaningful, efficient and effective business automation solutions. However, one challenge that differentiates leveraging semantic technology from more common technology (such as relational databases) is the lack of mature tools supporting a business system infrastructure.

It will take a while for solid solutions to appear.  Support for mainstream capabilities such as reporting, BI, workflow, application design and development that all leverage semantic technology are missing or weak at best.  Again, this is an opportunity and a challenge.  For those who enjoy creating computer software it presents a new world of possibilities.  For those looking to leverage mature solutions in order to advance their business vision it will take investment and patience.

In parallel with the semantic paradigm we have an ever increasing focus on mobile-based solutions. Smart phones and tablet devices, focused on network connectivity as the enabler of value, rather than on-board storage and compute power, are becoming the standard tool for human-system interaction.  As we design new solutions we must keep the mobile-accessible mantra in mind.

As part of my exploration of these two technologies, I’ve started working on a semantic technology mobile application called Sparql Droid. Built for the Android platform, my goal is a tool for exploring and mashing semantic data sources.  As a small first-step I’ve leveraged the Androjena port of the Jena framework and created an application with some basic capabilities.

(more…)

Domain Testing at the Unit Level, Part 1: An Introduction

Tuesday, February 1st, 2011

It is surprising how many times I still find myself talking to software teams about unit testing.  I’ve written before that the term “unit testing” is not definitive.  “Unit testing” simply means that tests are being defined that run at the unit level of the code (typically methods or functions).  However, the term doesn’t mean that the tests are meaningful, valuable, or quality-focused.

From what I have seen, the term is often used as a synonym for path or branch level unit testing.  Although these are good places to start, such tests do not form a complete unit test suite.  I argue that the pursuit of 100% path or branch coverage and the exclusion of other types of unit testing is a waste of time. It is better for the overall quality of the code if the unit tests achieve 80% branch coverage and include an effective mix of other unit test types, such as domain, fuzz and security tests.

For the moment I’m going to focus on domain testing.  I think this is an area ripe for improvement.  Extending the “ripe” metaphor, I’d say there is significant low-hanging fruit available to development teams which will allow them to quickly experience the benefits of domain testing.

First, for my purposes in this article what is unit-level domain testing?  Unit-level domain testing is the exercising of program code units (methods, functions) using well-chosen values based on the sets of values grouped, often, by Boolean tests in the code. (Note that the well-chosen values are not completely random.  As we will see, they are constrained by the decision points and logic in the code.)

The provided definition is not meant to be mathematically precise or even receive a passing grade on a comp-sci exam.  In future postings I’ll delve into more of the official theory and terminology.  For now I’m focused on the basic purpose and value of domain testing.

I do need to state an assumption and create two baseline definitions in order to proceed:

Assumption: We are dealing only with integer numeric values and simple Boolean tests involving a variable and a constant.

Definitions:

  • Domain - the set of values included (or excluded) by a Boolean test.  For example, the test, “X > 3” has its domain of matching values 4, 5, 6, … and its domain of non-matching values 3, 2, 1, …
  • Boundary – The constant used in a Boolean test forming the point between the included and excluded sets of values.  So for “X > 3” the boundary value is 3.

Now let’s look at some code.  Here is a simple Java method:

public int add(int op1, int op2) {
    return op1 + op2;
}

This involves one domain, the domain of all integers, sort of.  Looking closely there is an issue; the domain of possible inputs (integers) is not necessarily the domain of possible (correct) outputs.

If two large integers were added together they could produce a value longer than a Java 32-bit integer.  So the output domain is the set of values that can be derived by adding any two integers.  In Java we have the constants MIN_VALUE and MAX_VALUE in the java.lang.Integer class.  Using that vernacular, the domain of all output values for this method can be represented as: MIN_VALUE – MIN_VALUE through MAX_VALUE + MAX_VALUE.

Here is another simple method:

public int divide(int dividend, int divisor) {
    return dividend / divisor;
}

Again we seem to have one domain, the set of all integers.  However we all know there is a problem latent in this code.  Would path testing effectively find it?

(more…)

Fuzzing – A Powerful Technique for Software Security Testing

Friday, January 21st, 2011

I was participating in a code review today and was reminded by a senior architect, who started working as an intern for me years ago, of a testing technique I had used with one of his first programs.  He had been assigned to create a basic web application that collected some data from a user and wrote it to a database.  He came into my office, announced it was done and proudly showed it to me.  I walked over to the keyboard, entered a bunch of junk and got a segmentation fault in response.

Although I didn’t have a name for it, that was a standard technique I used when evaluating applications.  After all, the tried and true paths, expected inputs and easy errors will be tested early and often as the developer exercises the application using the basic use cases.  As Boris Beizer said, “The high-probability paths are always tested if only to demonstrate that the system works properly.” (Beizer, Boris. Software Testing Techniques. Boston, MA: Thomson Computer Press, 1990: 76.)

It is unexpected input that is useful when looking to find untested paths through the code. If someone shows me an application for evaluation the last thing I need to worry about is using it in an expected fashion, everyone else will do that.  In fact, I default to entering data outside the specification when looking at a new application.  I don’t know that my team always appreciates the approach.  They’d probably like to see the application work at least once while I’m in the room.

These days there is a formal name for testing of this type, fuzzing.  A few years ago I preferred calling it “gorilla testing” since I liked the mental picture of beating on the application. (Remember the American Tourister luggage ad in the 1970s?)  But alas, it appears that fuzzing has become the accepted term.

Fuzzing involves passing input that breaks the expected input “rules”.  Those rules could come from some formal requirements, such as a RFC, or informal requirements, such as the set of parameters accepted by an application.  Fuzzing tools can use formal standards, extracted patterns and even randomly generated inputs to test an application’s resilience against unexpected or illegal input.

(more…)