// JSON-LD for Wordpress Home, Articles and Author Pages. Written by Pete Wailes and Richard Baxter. // See: http://builtvisible.com/implementing-json-ld-wordpress/

Posts Tagged ‘programming’

Domain Testing at the Unit Level, Part 1: An Introduction

Tuesday, February 1st, 2011

It is surprising how many times I still find myself talking to software teams about unit testing.  I’ve written before that the term “unit testing” is not definitive.  “Unit testing” simply means that tests are being defined that run at the unit level of the code (typically methods or functions).  However, the term doesn’t mean that the tests are meaningful, valuable, or quality-focused.

From what I have seen, the term is often used as a synonym for path or branch level unit testing.  Although these are good places to start, such tests do not form a complete unit test suite.  I argue that the pursuit of 100% path or branch coverage and the exclusion of other types of unit testing is a waste of time. It is better for the overall quality of the code if the unit tests achieve 80% branch coverage and include an effective mix of other unit test types, such as domain, fuzz and security tests.

For the moment I’m going to focus on domain testing.  I think this is an area ripe for improvement.  Extending the “ripe” metaphor, I’d say there is significant low-hanging fruit available to development teams which will allow them to quickly experience the benefits of domain testing.

First, for my purposes in this article what is unit-level domain testing?  Unit-level domain testing is the exercising of program code units (methods, functions) using well-chosen values based on the sets of values grouped, often, by Boolean tests in the code. (Note that the well-chosen values are not completely random.  As we will see, they are constrained by the decision points and logic in the code.)

The provided definition is not meant to be mathematically precise or even receive a passing grade on a comp-sci exam.  In future postings I’ll delve into more of the official theory and terminology.  For now I’m focused on the basic purpose and value of domain testing.

I do need to state an assumption and create two baseline definitions in order to proceed:

Assumption: We are dealing only with integer numeric values and simple Boolean tests involving a variable and a constant.

Definitions:

  • Domain - the set of values included (or excluded) by a Boolean test.  For example, the test, “X > 3” has its domain of matching values 4, 5, 6, … and its domain of non-matching values 3, 2, 1, …
  • Boundary – The constant used in a Boolean test forming the point between the included and excluded sets of values.  So for “X > 3” the boundary value is 3.

Now let’s look at some code.  Here is a simple Java method:

public int add(int op1, int op2) {
    return op1 + op2;
}

This involves one domain, the domain of all integers, sort of.  Looking closely there is an issue; the domain of possible inputs (integers) is not necessarily the domain of possible (correct) outputs.

If two large integers were added together they could produce a value longer than a Java 32-bit integer.  So the output domain is the set of values that can be derived by adding any two integers.  In Java we have the constants MIN_VALUE and MAX_VALUE in the java.lang.Integer class.  Using that vernacular, the domain of all output values for this method can be represented as: MIN_VALUE – MIN_VALUE through MAX_VALUE + MAX_VALUE.

Here is another simple method:

public int divide(int dividend, int divisor) {
    return dividend / divisor;
}

Again we seem to have one domain, the set of all integers.  However we all know there is a problem latent in this code.  Would path testing effectively find it?

(more…)

Fuzzing – A Powerful Technique for Software Security Testing

Friday, January 21st, 2011

I was participating in a code review today and was reminded by a senior architect, who started working as an intern for me years ago, of a testing technique I had used with one of his first programs.  He had been assigned to create a basic web application that collected some data from a user and wrote it to a database.  He came into my office, announced it was done and proudly showed it to me.  I walked over to the keyboard, entered a bunch of junk and got a segmentation fault in response.

Although I didn’t have a name for it, that was a standard technique I used when evaluating applications.  After all, the tried and true paths, expected inputs and easy errors will be tested early and often as the developer exercises the application using the basic use cases.  As Boris Beizer said, “The high-probability paths are always tested if only to demonstrate that the system works properly.” (Beizer, Boris. Software Testing Techniques. Boston, MA: Thomson Computer Press, 1990: 76.)

It is unexpected input that is useful when looking to find untested paths through the code. If someone shows me an application for evaluation the last thing I need to worry about is using it in an expected fashion, everyone else will do that.  In fact, I default to entering data outside the specification when looking at a new application.  I don’t know that my team always appreciates the approach.  They’d probably like to see the application work at least once while I’m in the room.

These days there is a formal name for testing of this type, fuzzing.  A few years ago I preferred calling it “gorilla testing” since I liked the mental picture of beating on the application. (Remember the American Tourister luggage ad in the 1970s?)  But alas, it appears that fuzzing has become the accepted term.

Fuzzing involves passing input that breaks the expected input “rules”.  Those rules could come from some formal requirements, such as a RFC, or informal requirements, such as the set of parameters accepted by an application.  Fuzzing tools can use formal standards, extracted patterns and even randomly generated inputs to test an application’s resilience against unexpected or illegal input.

(more…)

JavaOne 2010 Concludes

Saturday, September 25th, 2010

My last two days at JavaOne 2010 included some interesting sessions as well as spending some time in the pavilion.  I’ll mention a few of the session topics that I found interesting as well as some of the products that I intend to check out.

I attended a session on creating a web architecture focused on high-performance with low-bandwidth.  The speaker was tasked with designing a web-based framework for the government of Ethiopia.  He discussed the challenges that are presented by that country’s infrastructure – consider network speed on the order of 5Kbps between sites.  He also had to work with an IT group that, although educated and intelligent, did not have a lot of depth beyond working with an Oracle database’s features.

His solution allows developers to create fully functional web applications that keep exchanged payloads under 10K.  Although I understand the logic of the approach in this case, I’m not sure the technique would be practical in situations without such severe bandwidth and skill set limitations.

A basic theme during his talk was to keep the data and logic tightly co-located.  In his case it is all located in the database (PL/SQL) but he agreed that it could all be in the application tier (e.g. NoSQL).  I’m not convinced that this is a good approach to creating maintainable high-volume applications.  It could be that the domain of business applications and business verticals in which I often find myself differ from the use cases that are common to developers promoting the removal of tiers from the stack (whether removing the DB server or the mid-tier logic server).

One part of his approach with which I absolutely concur is to push processing onto the client. The use of the client’s CPU seems common sense to me.  The work is around balancing that with security and bandwidth.  However, it can be done and I believe we will continue to find more effective ways to leverage all that computer power.

I also enjoyed a presentation on moving data between a data center and the cloud to perform heavy and intermittent processing.  The presenters did a great job of describing their trials and successes with leveraging the cloud to perform computationally expensive processing on transient data (e.g. they copy the data up each time they run the process rather than pay to store their data).  They also provided a lot of interesting information regarding options, advantages and challenges when leveraging the cloud (Amazon EC2 in this case).

(more…)

JavaOne and Oracle’s OpenWorld 2010 Conference, Initial Thoughts

Wednesday, September 22nd, 2010

I’ve been at Oracle’s combined JavaOne and OpenWorld events for two days.  I am here as both an attendee, learning from a variety of experts, and as a speaker.  Of course this is the first JavaOne since Oracle acquired Sun.  I have been to several JavaOne conferences over the years so I was curious how the event might be different.

One of the first changes that I’ve noticed is that due to the co-location of these two large conferences the venue is very different than when Sun ran JavaOne as a standalone event.  The time between sessions is a full half hour, probably due to the fact that you may find yourself going between venues that are several blocks apart.  I used to think that having getting from Moscone North the Moscone South took a while.   Now I’m walking from the Moscone center to a variety of hotels and back again.  Perhaps this is actually a health regime for programmers!

The new session pre-registration system is interesting. I don’t know if this system has been routine with Oracle’s other conferences but it is new to JavaOne.  Attendees go on-line and pre-register for the sessions they want to attend.  When you show up at the session your badge is scanned.  If you had registered you are allowed in.  If you didn’t preregister and the session is full you have to wait outside the room to see if anyone who registered fails to show up.

I think I like the system, with the assumption that they would stop people from entering when the room was full.  At previous conferences it seemed like popular sessions would just be standing room only, but that was probably a violation of fire codes.  The big advantage of this approach is that it reduces the likelihood of your investing the time to walk to the venue only to find out you can’t get in.  As long as you arranged your schedule on-line and you show up on-time, you’re guaranteed a seat.

Enough about new processes.  After all, I came here to co-present a session and to learn from a variety of others.

Paul Evans and I spoke on the topic of web services and their use with a rules engine. Specifically we were using JAX-WS and Drools.  We also threw in jUDDI to show the value of service location decoupling.  The session was well attended (essentially the room was full) and seemed to keep the attendees’ attention.  We had some good follow-up conversations regarding aspects of the presentation that caught people’s interest, which is always rewarding. The source code for the demonstration program is located at http://bit.ly/blueslate-javaone2010.

Since I am a speaker I have access to both JavaOne and OpenWorld sessions.  I took advantage of that by attending several OpenWorld sessions in addition to a bunch of JavaOne talks.

(more…)

SQL Injection – Why Does Our Profession Continue to Build Applications that Support It?

Monday, August 23rd, 2010

SQL Injection is commonly given as a  root cause when news sites report about stolen data. Here are a few recent headlines for articles describing data loss related to SQL injection: Hackers steal customer data by accessing supermarket database1, Hacker swipes details of 4m Pirate Bay users2, and Mass Web Attack Hits Wall Street Journal, Jerusalem Post3. I understand that SQL injection is prevalent; I just don’t understand why developers continue to write code that offers this avenue to attackers.

From my point of view SQL injection is very well understood and has been for many years. There is no excuse for a programmer to create code that allows for such an attack to succeed. For me this issue falls squarely on the shoulders of people writing applications. If you do not understand the mechanics of SQL injection and don’t know how to effectively prevent it then you shouldn’t be writing software.

The mechanics of SQL injection are very simple. If input from outside an application is incorporated into a SQL statement as literal text, a potential SQL injection vulnerability is created. Specifically, if a parameter value is retrieved from user input and appended into a SQL statement which is then passed on to the RDBMS, the parameter’s value can be set by an attacker to alter the meaning of the original SQL statement.

Note that this attack is not difficult to engineer, complicated to execute or a risk only with web-based applications. There are tools to quickly locate and attack vulnerable applications. Also note that using encrypted channels (e.g. HTTPS) does nothing to prevent this attack. The issue is not related to encrypting the data in transit, rather, it is about keeping the untrusted data away from the backend RDMBS’ interpretation environment.

Here is a simple example of how SQL injection works. Assume we have an application that accepts a last name which will be used to search a database for contact information. The program takes the input, stores it in a variable called lastName, and creates a query:

String sql = "select * from contact_info where lname = '" + lastName + "'";

Now, if an attacker tries the input of: ‘ or 1=1 or ’2′=’

It will create a SQL statement of:

select * from contact_info where lname = '' or 1=1 or '2'=''

This is a legal SQL statement and will retrieve all the rows from the contact_info table. This might expose a lot of data or possibly crash the environment (a denial of service attack). In any case, using other SQL keywords, particularly UNION, the attacker can now explore the database, including other tables and schemas.

(more…)

Creating RDF Triples from a Relational Database

Thursday, August 5th, 2010

In an earlier blog entry I discussed the potential reduction in refactoring effort if our data is represented as RDF triples rather than relational structures.  As a way to give myself easy access to RDF data and to work more with semantic web tool features I have created a program to export relational data to RDF.

The program is really a proof-of-concept.  It takes a SQL query and converts the resulting rows into assertions of triples.  The approach is simple: given a SQL statement and a chosen primary key column (PK) to represent the instance for the exported data, assert triples with the primary key column value as the subject, the column names as the predicates and the non-PK column values as the objects.

Here is a brief sample taken from the documentation accompanying the code.

  • Given a table named people with the following columns and rows:
       id    name    age
       --    ----    ---
       1     Fred    20
       2     Martha  25
  • And a query of:  select id, name, age from people
  • And the primary key column set to: id
  • Then the asserted triples (shown using Turtle and skipping prefixes) will be:
       dsr:PK_1
          a       owl:Thing , dsr:RdbData ;
          rdfs:label "1" ;
          dsr:name "Fred" ;
          dsr:age "20" .

       dsr:PK_2
          a       owl:Thing , dsr:RdbData ;
          rdfs:label "2" ;
          dsr:name "Martha" ;
          dsr:age "25" .

You can see that the approach represents a quick way to convert the data.

(more…)

My First Semantic Web Program

Saturday, June 5th, 2010

I have create my first slightly interesting, to me anyway, program that uses some semantic web technology.  Of course I’ll look back on this in a year and cringe, but for now it represents my understanding of a small set of features from Jena and Pellet.

The basis for the program is an example program that is described in Hebler, Fischer et al’s book “Semantic Web Programming” (ISBN: 047041801X).  The intent of the program is to load an ontology into three models, each running a different level of reasoner (RDF, RDFS and OWL) and output the resulting assertions (triples).

I made a couple of changes to the book’s sample’s approach.  First I allow any supported input file format to be automatically loaded (you don’t have to tell the program what format is being used).  Second, I report the actual differences between the models rather than just showing all the resulting triples.

As I worked on the code, which is currently housed in one uber-class (that’ll have to be refactored!), I realized that there will be lots of reusable “plumbing” code that comes with this type of work.  Setting up models with various reasoners, loading ontologies, reporting triples, interfacing to triple stores, and so on will become nuisance code to write.

Libraries like Jena help, but they abstract at a low level.  I want a semantic workbench that makes playing with the various libraries and frameworks easy.  To that end I’ve created a Sourceforge project called “Semantic Workbench“.

I intend for the Semantic Workbench to provide a GUI environment for manipulating semantic web technologies. Developers and power users would be able to use such a tool to test ontologies, try various reasoners and validate queries.  Developers could use the workbench’s source code to understand how to utilize frameworks like Jena or reasoner APIs like that of Pellet.

I invite other interested people to join the Sourceforge project. The project’s URL is: http://semanticwb.sourceforge.net/

On the data side, in order to have a rich semantic test data set to utilize, I’ve started an ontology that I hope to grow into an interesting example.  I’m using the insurance industry as its basis.  The rules around insurance and the variety of concepts should provide a rich set of classes, attributes and relationships for modeling.  My first version of this example ontology is included with the sample program.

Finally, I’ve added a semantic web section to my website where I’ll maintain links to useful information I find as well as sample code or files that I think might be of interest to other developers.  I’ve placed the sample program and ontology described earlier in this post on that page along with links to a variety of resources.

My site’s semantic web page’s URL is: http://monead.com/semantic/
The URL for the page describing the sample program is: http://monead.com/semantic/proj_diffinferencing.html

Encapsulation, It Isn’t Just For Your Public Interface

Tuesday, March 23rd, 2010

Encapsulation, one of Object Orientation’s (OO) “Big Three” (or four if you include composition), is the concept most often forgotten when I ask an interview candidate to define the key tenants of OO.  Giving the benefit of the doubt, perhaps it is considered “obvious” and hence not necessarily related to OO design in the person’s mind.  Once I bring it up though, there is usually agreement that it is an important aspect to achieving significant value from OO design.

Classically, encapsulation, also called information hiding, “serves to separate the contractual interface of an abstraction and its implementation.”1 The idea is that the user of the functionality only knows about the public interface (contractual interface) and has no knowledge, nor any ability to tie itself, to the implementation.  The implementation includes both data representation and, effectively, algorithm.

Many times I’ll get an alternate definition; essentially the respondent will define encapsulation as, “as an approach which allows an object’s behavior to be called without the caller having knowledge of how the behavior is implemented.”  This seems very close to the classical definition but misses the key point of “contractual interface.”

I could argue that when any method is called the caller doesn’t have knowledge of how the method is implemented; it just gets a value back.  The missing aspect, the key aspect, is the constraint regarding which methods the caller may call, e.g. the public interface.

What started me on this topic was a recent conversation with a peer regarding read-only objects.  Before I get into specifics, let me baseline the traditional encapsulation approach in a typical object.

(more…)