Dave's Reflections » Software Development

Archive for the ‘Software Development’ Category

MongoDB and Java – Powerful Complementary Platforms

Tuesday, May 31st, 2016

I have found that including MongoDB in the design of Java applications allows me a valuable level of flexibility in meeting client objectives. I have created an initial open source project on GitHub, JavaMongo, with the goal of providing working examples of Java and MongoDB integration. A secondary goal is to include development best practices, such as using testing frameworks and good coding style.

This posting is intended to give a little background on why I find Java and MongoDB to be useful tools in my software development arsenal and then to introduce the JavaMongo project. Future postings will include some videos walking developers through the examples as well as the frameworks being used (like JUnit, Cobertura and Checkstyle)

Background

Java is an ubiquitous platform for creating business applications. It has proven itself across a wide range of use cases from small point-based solutions to large generalized solution stacks. The variety of libraries, frameworks and tools for designing, building, testing and managing Java applications provides significant benefits to companies building solutions using Java. However, an application without ready access to data isn’t particularly useful. As enterprise-scale database options have broadened to include NoSQL, those individuals creating Java-based solutions must be sure to take advantage of new data options in order to benefit from the strengths of such components.

MongoDB is a great NoSQL platform that can be used to provide additional capabilities to your applications. MongoDB is a document store that has proven its reliability, scalability and integrate-ability across numerous small and large-scale applications. Its value and focus complements the way we use relational databases for online transaction-oriented processing (OLTP) and offers advantages over the way we use relational databases for data marts and warehouses.

A point of clarification before proceeding: I’m not here to say that MongoDB is better than some other data product, or, more generally, that document stores are better than relational databases. I find such arguments meaningless without a specific use case or project goal. These technologies are different and have individual strengths and weaknesses in the face of a specific set of project objectives.

I have found that MongoDB plugs in well when I need a place to federate data (structured, semi-structured and unstructured). Given a common platform, it simplifies the work required to build and alter connections between attributes. If you’ve looked at other information about my background you’ll see that I find the use of semantic technology to be incredibly valuable for data federation and classification. MongoDB as a flexible repository plays well with semantics. At the end of this post I’ll give you a small example of that.

JavaMongo Project

The JavaMongo project is intended to provide Java developers with working examples of Java and MongoDB integrations. Over time I expect a variety of common situations to be demonstrated, with associated documentation explaining the use case and the resulting implementation.

In order to have some interesting data to work with, I’m using data sets that my company releases to the public domain. In order to work with the JavaMongo examples you’ll need to import that data into your MongoDB instance. For more information about downloading and importing the sample data, see the discussion on MongoDB Collection of Honeypot Data on my NoSQL topic page .

The initial JavaMongo project contains a basic README file with information on running the example code. Instead of rehashing that information in this post, I’d like to walk through the basic operations being demonstrated in the example code. The main class we’ll explore is BasicStatistics (us.daveread.education.mongo.honeypot.BasicStatistics).

As you know, a Java program starts execution with the main() method. We see that the first step that the BasicStatistics’ main() method takes is to create an instance of the BasicStatictics class.

BasicStatistics Constructor

The constructor code goes through the entire process of connecting to a MongoDB database, accessing a collection and running a query on data in the collection.

First, an instance of MongoClientOptions is created. This class allows us to configure certain client side options related to the connection. I’ll get into more detail with this in future examples. In this case, the program is simply setting the connection timeout to 2000 milliseconds (2 seconds) so that if the instance is not available the program won’t hang for a long time. You wouldn’t make the timeout this short in a production environment but it helps for debugging our local environment by failing fast if something is wrong.

(more…)

Tags: Allegrograph, data, Java, lightweight data federation, MongoDB, programming, semantics
Posted in infuzIT, Java, NoSQL, Semantic Technology, Software Development, Tools and Applications | No Comments »

Initial Time to Build? Vision to Release in Days? Those Aren’t Relevant Measures for Business Agility!

Tuesday, April 15th, 2014

I routinely receive emails, tweets and snail mail from IT vendors that focus on how their solution accelerates the creation of business applications. They will quote executives and technology leaders, citing case studies that compare the time to build an application on their platform versus others. They will make the claim that this speed to release proves that their platform, tool or solution is “better” than the competition. Further, they claim that it will provide similar value for my business’ application needs. The focus of these advertisements is consistently, “how long did it take to initially create some application.”

This speed-to-create metric is pointless for a couple of reasons. First, an experienced developer will be fast when throwing together a solution using his or her preferred tools. Second, an application spends years in maintenance versus the time spent to build its first version.

Build it fast!

Years ago I built applications for GE in C. I was fast. Once I had a good set of libraries, I could build applications for turbine parts catalogs in days. This was before windowing operating systems. There were frameworks from companies like Borland that made it trivial to create an interactive interface. I moved on to Visual Basic and SQLWindows development and was equally fast at creating client-server applications for GE’s field engineering team. I progressed to C++ and created CGI-based web applications. Again, building and deploying applications in days. Java followed, and I created and deployed applications using the leading edge Netscape browser and Java Applets in days and eventually hours for trivial interfaces.

Since 2000 I’ve used BPM and BRM platforms such as PegaRULES, Corticon, Appian and ILOG. I’ve developed applications using frameworks like Struts, JSF, Spring, Hibernate and the list goes on. Through all of this, I’ve lived the euphoria of the initial release and the pain of refactoring for release 2. In my experience not one of these platforms has simplified the refactoring of a weak design without a significant investment of time.

Speed to initial release is not a meaningful measure of a platform’s ability to support business agility. There is little pain in version 1 regardless of the design thought that goes into it. Agility is about versions 2 and beyond. Specifically, we need to understand what planning and practices during prior versions is necessary to promote agility in future versions.

(more…)

Tags: BPM, efficient coding, Information Systems, programming
Posted in BPM, Information Systems, Software Composition, Software Development | No Comments »

Heartbleed – A High-level Look

Saturday, April 12th, 2014

There has been a lot of information flying about on the Internet concerning the Heartbleed vulnerability in the OpenSSL library. Among system administrators and software developers there is a good understanding of exactly what happened, the potential data losses and proper mitigation processes. However, I’ve seen some inaccurate descriptions and discussion in less technical settings.

I thought I would attempt to explain the Heartbleed issue at a high level without focusing on the implementation details. My goal is to help IT and business leaders understand a little bit about how the vulnerability is exploited, why it puts sensitive information at risk and how this relates to their own software development shops.

Heartbleed is a good case study for developers who don’t always worry about data security, feeling that attacks are hard and vulnerabilities are rare. This should serve as a wake-up-call that programs need to be tested in two ways – for use cases and misuse cases. We often focus on use cases, “does the program do what we want it to do?” Less frequently do we test for misuse cases, “does the program do things we don’t want it to do?” We need to do more of the latter.

I’ve created a 10 minute video to walk through Heartbleed. It includes the parable of a “trusting change machine.” The parable is meant to explain the Heartbleed mechanics without requiring that the viewer be an expert in programming or data encryption.

If you have thoughts about ways to clarify concepts like Heartbleed to a wider audience, please feel free to comment. Data security requires cooperation throughout an organization. Effective and accurate communication is vital to achieving that cooperation.

Here are the links mentioned in the video:

SANS Internet Storm Center: https://isc.sans.edu/
SANS Webcast on Heartbleed: http://digital-forensics.sans.org/blog/2014/04/10/heartbleed-links-simulcast-etc
Blue Slate’s Business Security Brief on Heartbleed (Video): http://www.blueslate.net/Dave/BusinessSecurityBrief/20140412-Heartbleed/
OWASP Top 10 Web Vulnerabilities List: https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project

Tags: application security, data security, Information Systems, Internet, mitigation, programming, Security, Testing, vulnerability
Posted in Information Systems, Security, Software Development, Software Security, Testing | No Comments »

Cognitive Corporation™ Innovation Lab Kickoff!

Friday, August 10th, 2012

I am excited to share the news that Blue Slate Solutions has kicked off a formal innovation program, creating a lab environment which will leverage the Cognitive Corporation™ framework and apply it to a suite of processes, tools and techniques. The lab will use a broad set of enterprise technologies, applying the learning organization concepts implicit in the Cognitive Corporation’s™ feedback loop.

I’ve blogged a couple of times (see references at the end of this blog entry) about the Cognitive Corporation™. The depiction has changed slightly but the fundamentals of the framework are unchanged.

The focus is to create a learning enterprise, where the learning is built into the system integrations and interactions. Enterprises have been investing in these individual components for several years; however they have not truly been integrating them in a way to promote learning.

By “integrating” I mean allowing the system to understand the meaning of the data being passed between them. Creating a screen in a workflow (BPM) system that presents data from a database to a user is not “integration” in my opinion. It is simply passing data around. This prevents the enterprise ecosystem (all the components) from working together and collectively learning.

I liken such connections to my taking a hand-written note in a foreign language, which I don’t understand, and typing the text into an email for someone who does understand the original language. Sure, the recipient can read it, but I, representing the workflow tool passing the information from database (note) to screen (email) in this case, have no idea what the data means and cannot possibly participate in learning from it. Integration requires understanding. Understanding requires defined and agreed-upon semantics.

This is just one of the Cognitive Corporation™ concepts that we will be exploring in the lab environment. We will also be looking at the value of these technologies within different horizontal and vertical domains. Given our expertise in healthcare, finance and insurance, our team is well positioned to use the lab to explore the use of learning BPM in many contexts.

(more…)

Tags: BPM, business rules, cognitive corporation, data, enterprise applications, enterprise systems, Information Systems, linkedin, ontology, programming, semantics
Posted in Architecture, BPM, Business Processes, Business Rules, Cognitive Corporation, Data, Data Analytics, Information Systems, Semantic Technology, Software Composition, Software Development, Tools and Applications | No Comments »

State Selection Lists on Website Forms – How Hard Are They to Sort?

Thursday, June 14th, 2012

This post certainly falls into the “nitpick” category, but the flaw occurs often enough to be somewhat irritating. The problem you ask? Drop-down lists of state names that are not ordered by the state name but instead by the state’s 2-letter postal abbreviation. Granted, this error pales in comparison to applications containing SQL injection flaws or race conditions exposing personal information, but I’m going to complain none-the-less.

What exactly is the issue? Well, it turns out that the two letter postal abbreviations (for example AK for Alaska and HI for Hawaii) can’t be used as the key for sorting the state names into alphabetical order. For the most part it works, however for some states, such as Nevada through New Mexico it breaks. As a New York resident I get tripped up by this.

The image shown here is a web form for a college admissions site. As you can see, Nevada follows New Hampshire, New Jersey and New Mexico but precedes New York. In reality it should follow Nebraska and precede New Hampshire. This order is incorrect; it is based on the state abbreviations. If instead of state names the website were displaying the state abbreviations, the order would be NH, NJ, NM, NV, NY and all would be fine.

The developer(s) of this site are not alone in their mistaken use of the postal abbreviation as the sort key. I’ve encountered this issue with online shopping sites, reservation systems and survey forms. I typically do a quick “view source” of the site and invariably they are using the state abbreviation as the actual value being passed to the server. I’m sure they are using that for sorting as well.

You might think this sort of thing doesn’t matter. From my point of view it represents a “broken window,” using Andy Hunt’s and Dave Thomas’ language from The Pragmatic Programmer. Little things count. Little things left uncorrected form an environment where developers may become more and more sloppy. After all, if I don’t need to pay attention to my sort key for state, what’s to say I won’t make a similar mistake with country or a product list or any other collection of values that is supposed to be ordered to make access easier?

Please, if you are designing an input form, make sure that sorted information displayed by your widgets is sorted by the display value, not some internal code. It will make the use of your form easier for users and garner the respect of your fellow developers.

Have you seen this flaw on websites you’ve visited? Do you have pet peeves with online form designs? I’d enjoy hearing about them.

Tags: linkedin, online forms, programming, quality, sorting, user interface
Posted in Quality, Software Development | No Comments »

Semantic Technology and Business Conference, East 2011 – Reflections

Wednesday, December 7th, 2011

I had the pleasure of attending the Semantic Technology and Business Conference in Washington, DC last week. I have a strong interest in semantic technology and its capabilities to enhance the way in which we leverage information systems. There was a good selection of topics discussed by people with a variety of backgrounds working in different verticals.

To begin the conference I attended the half day “Ontology 101” presented by Elisa Kendall and Deborah McGuinness. They indicated that this presentation has been given at each semantic technology conference and the interest is still strong. The implication being that new people continue to want to understand this art.

Their material was very useful and if you are someone looking to get a grounding in ontologies (what are they? how do you go about creating them?) I recommend attending this session the next time it is offered. Both leaders clearly have deep experience and expertise in this field. Also, the discussion was not tied to a technology (e.g. RDF) so it was applicable regardless of underlying implementation details.

I wrapped up the first day with Richard Ordowich who discussed the process of reverse engineering semantics (meaning) from legacy data. The goal of such projects being to achieve a data harmonization of information across the enterprise.

A point he stressed was that a business really needs to be ready to start such a journey. This type of work is very hard and very time consuming. It requires an enterprise wide discipline. He suggests that before working with a company on such an initiative one should ask for examples of prior enterprise program success (e.g. something like BPM, SDLC).

Fundamentally, a project that seeks to harmonize the meaning of data across an enterprise requires organization readiness to go beyond project execution. The enterprise must put effective governance in place to operate and maintain the resulting ontologies, taxonomies and metadata.

The full conference kicked off the following day. One aspect that jumped out for me was that a lot of the presentations dealt with government-related projects. This could have been a side-effect of the conference being held in Washington, DC but I think it is more indicative that spending in this technology is more heavily weighted to public rather than private industry.

Being government-centric I found any claims of “value” suspect. A project can be valuable, or show value, without being cost effective. Commercial businesses have gone bankrupt even though they delivered value to their customers. More exposure of positive-ROI commercial projects will be important to help accelerate the adoption of these technologies.

Other than the financial aspect, the presentations were incredibly valuable in terms of presenting lessons learned, best practices and in-depth tool discussions. I’ll highlight a few of the sessions and key thoughts that I believe will assist as we continue to apply semantic technology to business system challenges.

(more…)

Tags: data, enterprise systems, Information Systems, linkedin, ontology, semantic web, semantics, system integration
Posted in Information Systems, Semantic Technology, Semantic Web, Software Development, Tools and Applications | 2 Comments »

Using ARQoid for Android-based SPARQL Query Execution

Thursday, December 1st, 2011

I was recently asked about the SPARQL support in Sparql Droid and whether it could serve as a way for other Android applications to execute SPARQL queries against remote data sources. It could be used in this way but there is a simpler alternative I’d like to discuss here.

On the Android platform it is actually quite easy to execute SPARQL against remote SPARQL endpoints, RDF data and local models. The heavy lifting is handled by Androjena’s ARQoid, an Android-centric port of HP’s Jena ARQ engine.

Both engines (the original and the port) do a great job of simplifying the execution of SPARQL queries and consumption of the resulting data. In this post I’ll go through a simple example of using ARQoid. Note that all the code being shown here is available for download. This post is based specifically on the queryRemoteSparqlEndpoint() method in the com.monead.androjena.demo.arqoid.SparqlExamples class.

Setup

To begin, some environment setup needs to be done in order to have a properly configured Android project ready to use ARQoid.

First, obtain the ARQoid JAR and its dependencies. This is easily accomplished using the download page on the ARQoid Wiki and obtaining the latest ARQoid ZIP file. Unzip the downloaded archive. Since I’m discussing an Android application I’d expect that you would have created an Android project and that it contains a libs directory where the JAR files should be placed.

Second, add the JAR files to the classpath for your Android project. I use the ADT plugin for Eclipse to do Android development. So to add the JARs to my project I choose the Project menu item, select Properties, choose Build Path, select the Libraries tab, click the Add JARs… button, navigate to the libs directory, select the JAR files and click OK on the open dialogs.

Third, setup a minimal Android project. The default layout, with a small change to its definition will work fine.

Overview

Now we are ready to write the code that uses ARQoid to access some data. For this first blog entry I’ll focus on a trivial query against a SPARQL endpoint. There would be some small differences if we wanted to query a local model or a remote data set. Those will be covered in follow-on entries.

Here is a list of the ARQoid classes we will be using for this initial example:

com.hp.hpl.jena.query.Query – represents the query being executed
com.hp.hpl.jena.query.Syntax – represents the query syntaxes supported by ARQoid
com.hp.hpl.jena.query.QueryFactory – creates a Query instance based on supplied parameters such as the query string and syntax definition
com.hp.hpl.jena.query.QueryExecution – provides the service to execute the query
com.hp.hpl.jena.query.QueryExecutionFactory – creates a QueryExecution instance based on supplied parameters such as a Query instance and SPARQL endpoint URI
com.hp.hpl.jena.query.ResultSet – represents the returned data and metadata associated with the executed query
com.hp.hpl.jena.query.QuerySolution – represents one row of data within the ResultSet.

We’ll use these classes to execute a simple SPARQL query that retrieves some data associated with space exploration. Talis provides an endpoint that we can use to access some interesting space exploration data. The endpoint is located at http://api.talis.com/stores/space/services/sparql.
The query we will execute is:

SELECT ?dataType ?data
WHERE {
  <http://nasa.dataincubator.org/launch/1961-012> ?dataType ?data.
}

This query will give us a little information about Vostok 1 launched by the USSR in 1961.

(more…)

Tags: Information Systems, Java, linkedin, programming, semantic web, semantics
Posted in Java, Semantic Web, Software Development, Tools and Applications | 1 Comment »

Expanding on “Code Reviews Trumps Unit Testing, But They Are Better Together”

Tuesday, October 18th, 2011

Michael Delaney, a senior consulting software engineer at Blue Slate, commented on my previous posting. As I created a reply I realized that I was expanding on my reasoning and it was becoming a bit long. So, here is my reply as a follow-up posting. Also, thank you to Michael for helping me think more about this topic.

I understand the desire to rely on unit testing and its ability to find issues and prevent regressions. For TDD, I’ll need to write separately. Fundamentally I’m a believer in white box testing. Black box approaches, like TDD, seem to be of relatively little value to the overall quality and reliability of the code. Meaning, I’d want to invest more effort in white box testing than in black box testing.

I’m somewhat jaded, being concerned with the code’s security, which to me is strongly correlated with its reliability. That said, I believe that unit testing is much more constrained as compared to formal reviews. I’m not suggesting that unit tests be skipped, rather that we understand that unit tests can catch certain types of flaws and that those types are narrow as compared to what formal reviews can identify.

(more…)

Tags: efficient coding, Information Systems, linkedin, Managing IS Projects, programming, Testing
Posted in Quality, Software Development, Testing | No Comments »

Code Reviews Trump Unit Testing , But They Are Better Together

Tuesday, October 11th, 2011

Last week I was participating in a formal code review (a.k.a. code inspection) with one of our clients. We have been working with this client, helping them strengthen their development practices. Holding formal code reviews is a key component for us. Part of the formal process we introduced includes reviewing the unit testing results, both the (successful) output report and the code coverage metrics.

At one point we were reviewing some code that had several error handling blocks that were not being covered in the unit tests. These blocks were, arguably, unlikely or impossible to reach (such as a Java StringReader throwing an IOException). There was some discussion by the team about the necessity of mocking enough functionality to cover these blocks.

Although we agreed that some of the more esoteric error conditions weren’t worth the programmer’s time to mock-up, it occurred to me later that we were missing an important point. What mattered was that we were holding a formal code review and looking at those blocks of code.

Let me take a step back. In 1986, Capers Jones published a book entitled Programming Productivity. Although dated, the book contains many excellent points that cause you think about how to create software in an efficient way. Here efficiency is not about lines of code per unit of time, but more importantly, lines of correct code per unit of time. This means taking into account rework due to errors and omissions.

One if the studies presented in the book relates to identifying defects in code. It is a study whose results seem obvious when we think about them. However, we don’t always align our software development practices to leverage the study’s lessons and maximize our development efficiency. Perhaps we believe that the statistics have changed due to language construct, experience, tooling and so forth. We’d need similar studies to the ones presented by Capers Jones in order to prove that, though.

Below are a few of the actions from the book’s study of defect detection approaches. I’ve skipped the low end and high-end numbers that Caper’s includes, simply giving the modes (averages) which are a good basis for comparison:

(more…)

Tags: efficient coding, Information Systems, linkedin, Managing IS Projects, programming, Testing
Posted in Quality, Software Development, Testing | No Comments »

Android Programming Experiences with Sparql Droid

Sunday, July 10th, 2011

As I release my 3rd Alpha-version of Sparql Droid I thought I’d document a few lessons learned and open items as I work with the Android environment. Some of my constraints are based on targeting smart phones rather than tablets, but the lessons learned around development environments, screen layouts, and memory management are valuable.

I’ll start on the development side. I use Eclipse and the android development plugin is very helpful. It greatly streamlines the development process. Principally, it automates the generation of the resources from the source files. These resources, such as screen layouts and menus, require a conversion step after being edited. The automation, though, comes at a price.

Taking a step back, Android doesn’t use an Oracle-compliant JVM. Instead it uses the Dalvik VM. This difference creates two major ramifications: 1) not all the standard packages are available; and 2) any compiled Java code has to go through a step to “align” it for Dalvik. This alignment process is required for class files you create and for any third-party classes (such as those found in external JAR files). Going back to item 1, if an external JAR file you use needs a package that isn’t part of Dalvik, you’ll need to recreate it.

The alignment process works pretty fast for small projects. My first application was a game that used no external libraries. The time required to compile and align was indistinguishable from typical compile time. However, with Sparql Droid, which uses several large third-party libraries, the alignment time is significant – on the order of a full minute.

That delay doesn’t sound so bad, unless you consider the Build Automatically feature in Eclipse. This is a feature that you want to turn off when doing Android development that includes third-party libraries of any significance. Turning off that feature simply adds an extra step to the editing process, a manual build, and slightly reduces the convenience of the environment.

With my first Android project, I was able to edit a resource file and immediately jump back to my Java code and have the resource be recognized. Now I have to manually do a build (waiting a minute or so) after editing a resource file before it is recognized on the code side. Hopefully the plug-in will be improved to cache the aligned libraries, saving that time when the libraries aren’t being changed.

(more…)

Tags: Java, linkedin, ontology, programming, semantic web
Posted in Java, Semantic Web, Software Development, Tools and Applications | No Comments »

David S. Read

Dave's Reflections (Blog)

Archive for the ‘Software Development’ Category

MongoDB and Java – Powerful Complementary Platforms

Background

JavaMongo Project

BasicStatistics Constructor

Initial Time to Build? Vision to Release in Days? Those Aren’t Relevant Measures for Business Agility!

Build it fast!

(more…)

Heartbleed – A High-level Look

Cognitive Corporation™ Innovation Lab Kickoff!

State Selection Lists on Website Forms – How Hard Are They to Sort?

Semantic Technology and Business Conference, East 2011 – Reflections

Using ARQoid for Android-based SPARQL Query Execution

Setup

Overview

(more…)

Expanding on “Code Reviews Trumps Unit Testing, But They Are Better Together”

Code Reviews Trump Unit Testing , But They Are Better Together

Android Programming Experiences with Sparql Droid

Pages

Archives

Categories