
Archive for the ‘Software Development’ Category

Sparql Droid – A Semantic Technology Application for the Android Platform

Friday, June 24th, 2011

The semantic technology concepts that comprise what is generally called the semantic web involve paradigm shifts in the ways that we represent data, organize information and compute results. Such shifts create opportunities and present challenges.  The opportunities include easier correlation of decentralized information, flexible data relationships and reduced data storage entropy.  The challenges include new data management technology, new syntaxes, and a new separation of data and its relationships.

I am a strong advocate of leveraging semantic technology.  I believe that these new paradigms provide a more flexible basis for our journey to create meaningful, efficient and effective business automation solutions. However, one challenge that differentiates leveraging semantic technology from more common technology (such as relational databases) is the lack of mature tools supporting a business system infrastructure.

It will take a while for solid solutions to appear.  Support for mainstream capabilities such as reporting, BI, workflow, and application design and development that leverage semantic technology is missing or weak at best.  Again, this is an opportunity and a challenge.  For those who enjoy creating computer software it presents a new world of possibilities.  For those looking to leverage mature solutions in order to advance their business vision it will take investment and patience.

In parallel with the semantic paradigm we have an ever-increasing focus on mobile-based solutions. Smartphones and tablet devices, focused on network connectivity as the enabler of value rather than on-board storage and compute power, are becoming the standard tools for human-system interaction.  As we design new solutions we must keep the mobile-accessible mantra in mind.

As part of my exploration of these two technologies, I’ve started working on a semantic technology mobile application called Sparql Droid. Built for the Android platform, it is intended to be a tool for exploring and mashing semantic data sources.  As a small first step I’ve leveraged the Androjena port of the Jena framework and created an application with some basic capabilities.

(more…)

Domain Testing at the Unit Level, Part 1: An Introduction

Tuesday, February 1st, 2011

It is surprising how many times I still find myself talking to software teams about unit testing.  I’ve written before that the term “unit testing” is not definitive.  “Unit testing” simply means that tests are being defined that run at the unit level of the code (typically methods or functions).  However, the term doesn’t mean that the tests are meaningful, valuable, or quality-focused.

From what I have seen, the term is often used as a synonym for path or branch level unit testing.  Although these are good places to start, such tests do not form a complete unit test suite.  I argue that the pursuit of 100% path or branch coverage to the exclusion of other types of unit testing is a waste of time. It is better for the overall quality of the code if the unit tests achieve 80% branch coverage and include an effective mix of other unit test types, such as domain, fuzz and security tests.

For the moment I’m going to focus on domain testing.  I think this is an area ripe for improvement.  Extending the “ripe” metaphor, I’d say there is significant low-hanging fruit available to development teams which will allow them to quickly experience the benefits of domain testing.

First, for my purposes in this article, what is unit-level domain testing?  Unit-level domain testing is the exercising of program code units (methods, functions) using well-chosen values drawn from the sets of values that the code’s Boolean tests group together. (Note that the well-chosen values are not completely random.  As we will see, they are constrained by the decision points and logic in the code.)

The provided definition is not meant to be mathematically precise or even receive a passing grade on a comp-sci exam.  In future postings I’ll delve into more of the official theory and terminology.  For now I’m focused on the basic purpose and value of domain testing.

I do need to state an assumption and create two baseline definitions in order to proceed:

Assumption: We are dealing only with integer numeric values and simple Boolean tests involving a variable and a constant.

Definitions:

  • Domain – the set of values included (or excluded) by a Boolean test.  For example, the test “X > 3” has a domain of matching values (4, 5, 6, …) and a domain of non-matching values (3, 2, 1, …).
  • Boundary – the constant used in a Boolean test forming the point between the included and excluded sets of values.  So for “X > 3” the boundary value is 3 (illustrated in the sketch below).
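
To make these definitions concrete, here is a small sketch (illustrative, mine rather than the original post’s) of the values domain testing selects around the boundary of “X > 3”:

public class BoundaryDemo {
    // The Boolean test under discussion; its boundary constant is 3
    static boolean test(int x) {
        return x > 3;
    }

    public static void main(String[] args) {
        System.out.println(test(3)); // boundary value, in the excluded domain: false
        System.out.println(test(4)); // first value in the included domain: true
        System.out.println(test(2)); // a value well inside the excluded domain: false
    }
}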

Now let’s look at some code.  Here is a simple Java method:

public int add(int op1, int op2) {
    return op1 + op2;
}

This involves one domain, the domain of all integers, sort of.  Looking closely, there is an issue: the domain of possible inputs (integers) is not necessarily the domain of possible (correct) outputs.

If two large integers were added together they could produce a value outside the range of a Java 32-bit integer.  So the output domain is the set of values that can be derived by adding any two integers.  In Java we have the constants MIN_VALUE and MAX_VALUE in the java.lang.Integer class.  Using that vernacular, the domain of all output values for this method can be represented as: MIN_VALUE + MIN_VALUE through MAX_VALUE + MAX_VALUE.
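
A quick sketch (mine, not from the article) makes the overflow visible; the sum silently wraps around rather than producing the mathematically correct value:

public class AddOverflowDemo {
    public static int add(int op1, int op2) {
        return op1 + op2;
    }

    public static void main(String[] args) {
        // 2147483647 + 1 wraps to -2147483648
        System.out.println(add(Integer.MAX_VALUE, 1));
        // Widening to long shows the mathematically correct sum for comparison
        System.out.println((long) Integer.MAX_VALUE + 1L);
    }
}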

Here is another simple method:

public int divide(int dividend, int divisor) {
    return dividend / divisor;
}

Again we seem to have one domain, the set of all integers.  However, we all know there is a problem latent in this code.  Would path testing effectively find it?
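
Here is a hedged illustration (not from the original article): a single happy-path test executes the method’s only path, achieving 100% path coverage, yet never probes the boundary of the divisor’s domain.  The boundary value 0 exposes the latent defect:

public class DivideDomainDemo {
    public static int divide(int dividend, int divisor) {
        return dividend / divisor;
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2)); // full path coverage, defect missed
        System.out.println(divide(10, 0)); // boundary value: throws ArithmeticException
    }
}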

(more…)

Fuzzing – A Powerful Technique for Software Security Testing

Friday, January 21st, 2011

I was participating in a code review today and was reminded by a senior architect, who started working as an intern for me years ago, of a testing technique I had used with one of his first programs.  He had been assigned to create a basic web application that collected some data from a user and wrote it to a database.  He came into my office, announced it was done and proudly showed it to me.  I walked over to the keyboard, entered a bunch of junk and got a segmentation fault in response.

Although I didn’t have a name for it, that was a standard technique I used when evaluating applications.  After all, the tried and true paths, expected inputs and easy errors will be tested early and often as the developer exercises the application using the basic use cases.  As Boris Beizer said, “The high-probability paths are always tested if only to demonstrate that the system works properly.” (Beizer, Boris. Software Testing Techniques. Boston, MA: Thomson Computer Press, 1990: 76.)

It is unexpected input that is useful when looking to find untested paths through the code. If someone shows me an application for evaluation, the last thing I need to worry about is using it in an expected fashion; everyone else will do that.  In fact, I default to entering data outside the specification when looking at a new application.  I don’t know that my team always appreciates the approach.  They’d probably like to see the application work at least once while I’m in the room.

These days there is a formal name for this type of testing: fuzzing.  A few years ago I preferred calling it “gorilla testing” since I liked the mental picture of beating on the application. (Remember the American Tourister luggage ad in the 1970s?)  But alas, it appears that fuzzing has become the accepted term.

Fuzzing involves passing input that breaks the expected input “rules”.  Those rules could come from some formal requirements, such as an RFC, or informal requirements, such as the set of parameters accepted by an application.  Fuzzing tools can use formal standards, extracted patterns and even randomly generated inputs to test an application’s resilience against unexpected or illegal input.
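
As a simple illustration of the idea (a naive random fuzzer of my own sketching, not any particular tool), consider a loop that hammers an input-handling method with random bytes; parseRecord is a hypothetical stand-in for the code under test:

import java.util.Random;

public class NaiveFuzzer {
    public static void main(String[] args) {
        Random random = new Random();
        for (int run = 0; run < 10000; run++) {
            byte[] junk = new byte[random.nextInt(256)];
            random.nextBytes(junk);
            String input = new String(junk); // arbitrary, often illegal, input
            try {
                parseRecord(input);
            } catch (RuntimeException e) {
                // Any unexpected failure is a finding worth triaging
                System.out.println("Run " + run + " failed: " + e);
            }
        }
    }

    // Hypothetical stand-in for the application code under test
    static void parseRecord(String input) { /* ... */ }
}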

(more…)

Creating a SPARQL Endpoint Using Joseki

Monday, November 29th, 2010

Being a consumer of semantic data, I thought creating a SPARQL endpoint would be an interesting exercise.  It would require having some data to publish as well as working with a SPARQL library.  For data, I chose a set of mileage information that I have been collecting on my cars for the last 5 years.  For technology, I decided to use the Joseki SPARQL Server, since I was already using Jena.

For those who want to skip the “how” and see the result, the SPARQL endpoint along with sample queries and a link to the ontology and data is at: http://monead.com/semantic/query.html
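
For those who would rather hit an endpoint programmatically, here is a minimal Jena ARQ sketch; the endpoint URL and the mileage property are placeholders, not the site’s actual addresses or ontology terms:

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;

public class EndpointQueryDemo {
    public static void main(String[] args) {
        String endpoint = "http://example.com/sparql"; // placeholder endpoint URL
        String queryString =
            "SELECT ?fillUp ?gallons "
            + "WHERE { ?fillUp <http://example.com/mileage#gallons> ?gallons } "
            + "LIMIT 10";
        QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, queryString);
        try {
            ResultSet results = qe.execSelect();
            ResultSetFormatter.out(System.out, results); // simple text table of bindings
        } finally {
            qe.close();
        }
    }
}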

The first step in this project was to convert my mileage spreadsheets into triples.  I looked briefly for an existing ontology in the automobile domain but didn’t find anything I could use.  I created an ontology that would reflect my approach to recording automobile mileage data.  My data records the miles traveled between fill-ups as well as the number of gallons used.  I also record the car’s claimed MPG and calculate the actual MPG.

The ontology reflects this perspective of calculating the MPG at each fill-up.  This means that the purchase of gas is abstracted to a class with information such as miles traveled, gallons used and date of purchase as attributes.  I abstracted the gas station and location as classes, assuming that over time I might be able to flesh these out (in the spreadsheet I record the name of the station and the town/state).

A trivial Java program converts my spreadsheet (CSV) data into triples matching the ontology.  I then run the ontology and data through Pellet to derive any additional triples from the ontology.  The entire ontology and current data are available at http://monead.com/semantic/data/HybridMileageOntologyAll.Inferenced.xml.

It turns out that the ontology creation and data conversion were the easy parts of this project.  Getting Joseki to work as desired took some time, mostly because I couldn’t find much documentation for deploying it as a servlet rather than using its standalone server feature.  I eventually downloaded the Joseki source in order to understand what was going wrong.  The principal issue is that Joseki doesn’t seem to understand the WAR environment and relative paths (e.g. relative to its own WAR).

I had two major path issues: 1) getting Joseki to find its configuration (joseki-config.ttl); and 2) getting Joseki to find the triple store (in this case a flat file).

(more…)

JavaOne 2010 Concludes

Saturday, September 25th, 2010

My last two days at JavaOne 2010 included some interesting sessions as well as spending some time in the pavilion.  I’ll mention a few of the session topics that I found interesting as well as some of the products that I intend to check out.

I attended a session on creating a web architecture focused on high-performance with low-bandwidth.  The speaker was tasked with designing a web-based framework for the government of Ethiopia.  He discussed the challenges that are presented by that country’s infrastructure – consider network speed on the order of 5Kbps between sites.  He also had to work with an IT group that, although educated and intelligent, did not have a lot of depth beyond working with an Oracle database’s features.

His solution allows developers to create fully functional web applications that keep exchanged payloads under 10K.  Although I understand the logic of the approach in this case, I’m not sure the technique would be practical in situations without such severe bandwidth and skill set limitations.

A basic theme during his talk was to keep the data and logic tightly co-located.  In his case it is all located in the database (PL/SQL) but he agreed that it could all be in the application tier (e.g. NoSQL).  I’m not convinced that this is a good approach to creating maintainable high-volume applications.  It could be that the domain of business applications and business verticals in which I often find myself differs from the use cases that are common to developers promoting the removal of tiers from the stack (whether removing the DB server or the mid-tier logic server).

One part of his approach with which I absolutely concur is to push processing onto the client. The use of the client’s CPU seems common sense to me.  The work lies in balancing that with security and bandwidth.  However, it can be done and I believe we will continue to find more effective ways to leverage all that computing power.

I also enjoyed a presentation on moving data between a data center and the cloud to perform heavy and intermittent processing.  The presenters did a great job of describing their trials and successes with leveraging the cloud to perform computationally expensive processing on transient data (e.g. they copy the data up each time they run the process rather than pay to store their data).  They also provided a lot of interesting information regarding options, advantages and challenges when leveraging the cloud (Amazon EC2 in this case).

(more…)

JavaOne and Oracle’s OpenWorld 2010 Conference, Initial Thoughts

Wednesday, September 22nd, 2010

I’ve been at Oracle’s combined JavaOne and OpenWorld events for two days.  I am here as both an attendee, learning from a variety of experts, and as a speaker.  Of course this is the first JavaOne since Oracle acquired Sun.  I have been to several JavaOne conferences over the years so I was curious how the event might be different.

One of the first changes that I’ve noticed is that, due to the co-location of these two large conferences, the venue is very different than when Sun ran JavaOne as a standalone event.  The time between sessions is a full half hour, probably because you may find yourself going between venues that are several blocks apart.  I used to think that getting from Moscone North to Moscone South took a while.  Now I’m walking from the Moscone Center to a variety of hotels and back again.  Perhaps this is actually a health regimen for programmers!

The new session pre-registration system is interesting. I don’t know if this system has been routine with Oracle’s other conferences but it is new to JavaOne.  Attendees go on-line and pre-register for the sessions they want to attend.  When you show up at the session your badge is scanned.  If you pre-registered, you are allowed in.  If you didn’t pre-register and the session is full, you have to wait outside the room to see if anyone who registered fails to show up.

I think I like the system, with the assumption that they would stop people from entering when the room was full.  At previous conferences it seemed like popular sessions would just be standing room only, but that was probably a violation of fire codes.  The big advantage of this approach is that it reduces the likelihood of your investing the time to walk to the venue only to find out you can’t get in.  As long as you arranged your schedule on-line and you show up on-time, you’re guaranteed a seat.

Enough about new processes.  After all, I came here to co-present a session and to learn from a variety of others.

Paul Evans and I spoke on the topic of web services and their use with a rules engine. Specifically we were using JAX-WS and Drools.  We also threw in jUDDI to show the value of service location decoupling.  The session was well attended (essentially the room was full) and seemed to keep the attendees’ attention.  We had some good follow-up conversations regarding aspects of the presentation that caught people’s interest, which is always rewarding. The source code for the demonstration program is located at http://bit.ly/blueslate-javaone2010.

Since I am a speaker I have access to both JavaOne and OpenWorld sessions.  I took advantage of that by attending several OpenWorld sessions in addition to a bunch of JavaOne talks.

(more…)

Semantic Workbench, Get It In Gear

Tuesday, September 21st, 2010

I received a helpful push from Paul Evans this evening.  He reminded me that the Semantic Workbench SourceForge project (semanticwb.sourceforge.net) is just sitting idle, waiting to be kicked-off.  We talked about the vision around the project, which needs to be clearly and concisely articulated as a mission.  At that point we’ll have a direction to take.

This conversation coincided with my attendance at two semantic-web presentations at Oracle OpenWorld, which I am able to attend since it is co-located with JavaOne.  I’ll write more about my experiences at this year’s JavaOne conference soon.

These semantic-web presentations validated the value of semantic technologies and the need to make them more visible to the IT community.  For my part, this means I need to do more writing and presenting about semantic technologies while creating a renewed vigor around the Semantic Workbench project.

As Paul and I spoke and I tried to define my vision around the project, I realized that I was being too wordy for a mission statement.  The fundamentals of my depiction were also different from the current project overview on SourceForge.  The overview does not describe the truly useful application that I would like to see come out of the project.

Recognizing this disconnect reinforced the need to come up with a more useful and actionable mission.  In the hopes that the project can be of value, I present this mission statement:

The Semantic Workbench strives to provide a complete Java-based GUI and tool set for exploring, testing, and validating common semantic web-based operations.

(more…)

SQL Injection – Why Does Our Profession Continue to Build Applications that Support It?

Monday, August 23rd, 2010

SQL Injection is commonly given as a root cause when news sites report about stolen data. Here are a few recent headlines for articles describing data loss related to SQL injection: Hackers steal customer data by accessing supermarket database [1], Hacker swipes details of 4m Pirate Bay users [2], and Mass Web Attack Hits Wall Street Journal, Jerusalem Post [3]. I understand that SQL injection is prevalent; I just don’t understand why developers continue to write code that offers this avenue to attackers.

From my point of view SQL injection is very well understood and has been for many years. There is no excuse for a programmer to create code that allows for such an attack to succeed. For me this issue falls squarely on the shoulders of people writing applications. If you do not understand the mechanics of SQL injection and don’t know how to effectively prevent it then you shouldn’t be writing software.

The mechanics of SQL injection are very simple. If input from outside an application is incorporated into a SQL statement as literal text, a potential SQL injection vulnerability is created. Specifically, if a parameter value is retrieved from user input and appended into a SQL statement which is then passed on to the RDBMS, the parameter’s value can be set by an attacker to alter the meaning of the original SQL statement.

Note that this attack is not difficult to engineer, complicated to execute or a risk only with web-based applications. There are tools to quickly locate and attack vulnerable applications. Also note that using encrypted channels (e.g. HTTPS) does nothing to prevent this attack. The issue is not related to encrypting the data in transit; rather, it is about keeping the untrusted data away from the backend RDBMS’ interpretation environment.

Here is a simple example of how SQL injection works. Assume we have an application that accepts a last name which will be used to search a database for contact information. The program takes the input, stores it in a variable called lastName, and creates a query:

String sql = "select * from contact_info where lname = '" + lastName + "'";

Now, if an attacker tries the input of: ' or 1=1 or '2'='

It will create a SQL statement of:

select * from contact_info where lname = '' or 1=1 or '2'=''

This is a legal SQL statement and will retrieve all the rows from the contact_info table. This might expose a lot of data or possibly crash the environment (a denial of service attack). In any case, using other SQL keywords, particularly UNION, the attacker can now explore the database, including other tables and schemas.
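
For completeness, here is the standard defense, a parameterized query via JDBC; this sketch is mine rather than code from a specific application. The attacker-supplied value is bound as data and is never parsed as SQL:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SafeContactLookup {
    public static ResultSet findByLastName(Connection conn, String lastName)
            throws SQLException {
        String sql = "select * from contact_info where lname = ?";
        PreparedStatement ps = conn.prepareStatement(sql);
        ps.setString(1, lastName); // bound as a value, never interpreted as SQL
        return ps.executeQuery();
    }
}

With the parameter bound this way, the earlier input of ' or 1=1 or '2'=' would simply be searched for as a (nonexistent) last name rather than rewriting the query.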

(more…)

Creating RDF Triples from a Relational Database

Thursday, August 5th, 2010

In an earlier blog entry I discussed the potential reduction in refactoring effort if our data is represented as RDF triples rather than relational structures.  As a way to give myself easy access to RDF data and to work more with semantic web tool features I have created a program to export relational data to RDF.

The program is really a proof-of-concept.  It takes a SQL query and converts the resulting rows into assertions of triples.  The approach is simple: given a SQL statement and a chosen primary key column (PK) to represent the instance for the exported data, assert triples with the primary key column value as the subject, the column names as the predicates and the non-PK column values as the objects.

Here is a brief sample taken from the documentation accompanying the code.

  • Given a table named people with the following columns and rows:
       id    name    age
       --    ----    ---
       1     Fred    20
       2     Martha  25
  • And a query of:  select id, name, age from people
  • And the primary key column set to: id
  • Then the asserted triples (shown using Turtle and skipping prefixes) will be:
       dsr:PK_1
          a       owl:Thing , dsr:RdbData ;
          rdfs:label "1" ;
          dsr:name "Fred" ;
          dsr:age "20" .

       dsr:PK_2
          a       owl:Thing , dsr:RdbData ;
          rdfs:label "2" ;
          dsr:name "Martha" ;
          dsr:age "25" .

You can see that the approach represents a quick way to convert the data.
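
To make the loop concrete, here is a compact sketch of the approach using JDBC and Jena; the namespace and the class wiring are illustrative rather than the actual proof-of-concept program:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.OWL;
import com.hp.hpl.jena.vocabulary.RDF;
import com.hp.hpl.jena.vocabulary.RDFS;

public class RdbToRdf {
    public static Model export(Connection conn, String sql, String pkColumn)
            throws SQLException {
        String ns = "http://example.com/dsr#"; // placeholder namespace
        Model model = ModelFactory.createDefaultModel();
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(sql);
        ResultSetMetaData meta = rs.getMetaData();
        while (rs.next()) {
            String pk = rs.getString(pkColumn);
            // The primary key value becomes the subject resource
            Resource subject = model.createResource(ns + "PK_" + pk);
            subject.addProperty(RDF.type, OWL.Thing);
            subject.addProperty(RDF.type, model.createResource(ns + "RdbData"));
            subject.addProperty(RDFS.label, pk);
            // Each non-PK column contributes a predicate (column name) and object (value)
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                String col = meta.getColumnName(i);
                if (!col.equalsIgnoreCase(pkColumn)) {
                    subject.addProperty(model.createProperty(ns, col), rs.getString(i));
                }
            }
        }
        return model;
    }
}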

(more…)

Successful Process Automation: A Summary

Monday, July 26th, 2010

InformationWeek Analytics (http://analytics.informationweek.com/index) invited me to write about the subject of process automation.  The article, part of their series covering application architectures, was released in July of this year.  It provided an opportunity for me to articulate the key components that are required to succeed in the automation of business processes.

Both the business and IT are positioned to make or break the use of process automation tools and techniques. The business must redefine its processes and operational rules so that work may be automated.  IT must provide the infrastructure and expertise to leverage the tools of the process automation trade.

Starting with the business there must be clearly defined processes by which work gets done.  Each process must be documented, including the points where decisions are made.  The rules for those decisions must then be documented.  Repetitive, low-value and low-risk decisions are immediate candidates for automation.

A key milestone that must be reached in order to extract sustainable and meaningful value from process automation is Straight Through Processing (STP).  STP requires that work arrive from a third party and be automatically processed, returning a final decision and any necessary output (letter, claim payment, etc.) without a person being involved in handling the work.

Most businesses begin using process automation tools without achieving any significant STP rate.  This is fine as a starting point so long as the business reviews the manual work, identifies groupings of work, focuses on the largest groupings (large may be based on manual effort, cost or simple volume) and looks to automate the decisions surrounding that group of work.  As STP is achieved for some work, the review process continues as more and more types of work are targeted for automation.

The end goal of process automation is to have people involved only in truly exceptional, high-value, high-risk business decisions.  The business benefits by having people attend to items that truly matter rather than dealing with a large amount of background noise that lowers productivity, morale and client satisfaction.

All of this is great in theory but requires an information technology infrastructure that can meet these business objectives.

(more…)