Posts Tagged ‘efficient coding’

Database Refactoring and RDF Triples

Wednesday, May 12th, 2010

One of the aspects of agile software development that may lead to significant angst is the database.  Unlike refactoring code, the refactoring of the database schema involves a key constraint – state!  A developer may rearrange code to his or her heart’s content with little worry since the program will start with a blank slate when execution begins.  However, the database “remembers.”  If one accepts that each iteration of an agile process produces a production release, then the stored data can’t be deleted as part of the next iteration.

The refactoring of a database becomes less and less trivial as project development continues.  While developers have IDEs to refactor code, change packages, and alter build targets, there are few tools for refactoring databases.

My definition of a database refactoring tool is one that assists the database developer by remembering the database transformation steps and storing them as part of the project – e.g. part of the build process.  This includes both the schema changes and data transformations.  Remember that the entire team will need to reproduce these steps on local copies of the database.  It must be as easy to incorporate a peer’s database schema changes, without losing data, as it is to incorporate the code changes.

These same data-centric complexities exist in waterfall approaches when going from one version to the next.  Whenever the database structure needs to change, a path to migrate the data has to be defined.  That transformation definition must become part of the project’s artifacts so that the data migration for the new version is supported as the program moves between environments (test, QA, load test, integrated test, and production).  Also, the database transformation steps must be automated and reversible!
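To make "automated and reversible" concrete, here is a minimal sketch of how ordered, reversible transformation steps could be stored with a project.  It assumes SQLite through Python's standard sqlite3 module, and the customer and phone tables, the MIGRATIONS list, and the migrate function are hypothetical names invented for illustration; this is not a description of any particular refactoring tool.

```python
import sqlite3

# Hypothetical migration list, kept in version control with the code.
# Each entry is (version, upgrade SQL, rollback SQL); versions are assumed
# to be consecutive and to start at 1.
MIGRATIONS = [
    (1,
     "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)",
     "DROP TABLE customer"),
    (2,
     # One-to-many discovered: copy existing phone numbers into a child table.
     "CREATE TABLE phone (customer_id INTEGER, number TEXT);"
     "INSERT INTO phone SELECT id, phone FROM customer WHERE phone IS NOT NULL",
     "DROP TABLE phone"),
]

def current_version(conn):
    """Read the schema version that SQLite stores in the database file."""
    return conn.execute("PRAGMA user_version").fetchone()[0]

def migrate(conn, target):
    """Apply or roll back steps, in order, until the schema is at `target`."""
    version = current_version(conn)
    while version < target:
        number, upgrade, _ = MIGRATIONS[version]  # index 0 holds version 1
        conn.executescript(upgrade)
        version = number
        conn.execute(f"PRAGMA user_version = {version:d}")
    while version > target:
        number, _, rollback = MIGRATIONS[version - 1]
        conn.executescript(rollback)
        version = number - 1
        conn.execute(f"PRAGMA user_version = {version:d}")
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn, 1)
conn.execute("INSERT INTO customer (name, phone) VALUES ('Ada', '555-0100')")
migrate(conn, 2)   # upgrade: existing data is carried into the new table
migrated_rows = conn.execute("SELECT customer_id, number FROM phone").fetchall()
migrate(conn, 1)   # rollback: the original customer data is still intact
remaining = conn.execute("SELECT name, phone FROM customer").fetchall()
```

Note that the rollback in this sketch is only lossless because the upgrade leaves the original phone column in place.  Any numbers added to the phone table after the upgrade would be lost on rollback, which is exactly the kind of assumption that needs to be documented and vetted.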

That last point, the ability to roll back, is a key part of any rollout plan.  We must be able to back out changes.  It may be that the approach to a rollback is to create a full database backup before implementing the update, but that assumption must be documented and vetted (e.g. the approach of a full backup to support the rollback strategy may not be reasonable in all cases).

This database refactoring issue becomes very tricky when dealing with multiple versions of an application.  The transformation of the database schema and data must be done in a defined order.  As more and more data is stored, the process consumes more storage and processing resources.  This is the ETL side-effect of any system upgrade.  Its impact is simply felt more often (e.g. potentially during each iteration) in an agile project.

As part of exploring semantic technology, I am interested in contrasting this with a database that consists of RDF triples.  The semantic relationships of data do not change as often (if at all) as the relational constructs.  Many times we refactor a relational database as we discover concepts that require one-to-many or many-to-many relationships.
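As a rough illustration of the multiplicity point, the sketch below models facts as plain Python (subject, predicate, object) tuples.  This is an assumption-laden stand-in, not an RDF library: real RDF identifies subjects and predicates with IRIs, and the person:1 identifier, the predicate names, and the objects helper are all hypothetical.

```python
# Hypothetical data: each fact is a (subject, predicate, object) triple.
triples = {
    ("person:1", "name", "Ada"),
    ("person:1", "email", "ada@example.org"),
}

def objects(store, subject, predicate):
    """Return every object recorded for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)

# Discovering that a person can have several email addresses means adding
# one more triple; the shape of the stored data does not change.
triples.add(("person:1", "email", "ada@work.example"))
emails = objects(triples, "person:1", "email")
```

In a relational schema the same discovery would typically mean moving the email column into a child table and migrating the existing rows.  The sketch only shows that the storage shape is unaffected; any code that assumed a single email per person would still need to change.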

Is an RDF triple-based database easier to refactor than a relational database?  Is there something about the use of RDF triples that reduces the likelihood of a multiplicity change leading to a structural change in the data?  If so, using RDF as the data format could be a technique that simplifies the development of applications.  For now, let’s take a high-level look at a refactoring use case.

(more…)

Business Ontologies and Semantic Technologies Class

Sunday, May 9th, 2010

Last week I had the pleasure of attending Semantic Arts’ training class titled “Designing and Building Business Ontologies.”  The course, led by Dave McComb and Simon Robe, provided an excellent introduction to semantic technologies and tools as well as coverage of ontological best practices.  I thoroughly enjoyed the 4-day class and achieved my principal goals in attending, namely to understand the semantic web landscape, including technologies such as RDF, RDFS, OWL, and SPARQL, as well as the current state of tools and products in this space.

Both Dave and Simon have a deep understanding of this subject area.  They also work with clients using this technology so they bring real-world examples of where the technology shines and where it has limitations.  I recommend this class to anyone who is seeking to reach a baseline understanding of semantic technologies and ontology strategies.

Why am I so interested in semantic web technology?  I am convinced that any meaningful advancement in the field of information technology (IT) requires structuring information so that systems can consume it in ways more automated than current data storage and association techniques allow.  Whether wiring together web services or setting up ETL jobs to create data marts, too much IT energy is wasted on repeatedly integrating data sources.  We are essentially wiring related information together by hand because the computer is unable to wire it together autonomously!

(more…)

Design and Build Effort versus Run-time Efficiency

Saturday, October 17th, 2009

I recently overheard a development leader talking with a team of programmers about the trade-off between the speed of developing working code and the effort required to improve the run-time performance of the code.  His opinion was that it was not worth any extra effort to gain a few hundred milliseconds here or there.  I found myself wanting to debate the position but it was not the right venue.

In my opinion a developer should not write inefficient code just because it is easier.  However, a developer must not tune code without evidence that the tuning effort will make a meaningful improvement to the overall efficiency of the application.  Guessing at where the hotspots are in an application usually leads to a lot of wasted effort.
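One way to gather that evidence is to profile before tuning.  The sketch below assumes Python's standard cProfile and pstats modules; the build_report, count_words, and workload functions are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats

def build_report(words):
    """Hypothetical work: join words with repeated string concatenation."""
    result = ""
    for word in words:
        result += word + " "
    return result

def count_words(words):
    """Hypothetical work: a cheap operation on the same input."""
    return len(words)

def workload():
    words = ["word"] * 50_000
    build_report(words)
    count_words(words)

# Measure the workload instead of guessing which function dominates.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report as text, ordered by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The report lists each function with its call count and time, so the decision about where to spend tuning effort rests on measurements from a run of the actual code.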

When I talk about designing and writing efficient code I am really stressing the process of thinking about the macro-level algorithm that is being used.  Considering efficiency (e.g. big-O) and spending some time looking for options that would represent a big-O step change is where design and development performance effort belongs.

For instance, during initial design or coding, it is worth finding an O(log n) alternative to an O(n) solution.  However, spending time searching for a slight improvement in an O(n) algorithm that is still O(n) is likely a waste of time.
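A membership test on sorted data is a familiar example of such a step change.  The sketch below is illustrative only: it assumes the data is already sorted, uses Python's standard bisect module, and the function names are my own.

```python
from bisect import bisect_left

def linear_contains(sorted_items, target):
    """O(n): examine the items one by one until the target is found."""
    for item in sorted_items:
        if item == target:
            return True
    return False

def binary_contains(sorted_items, target):
    """O(log n): binary search; only valid when the input is sorted."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

# One million even numbers, already in sorted order.
data = list(range(0, 2_000_000, 2))
found_linear = linear_contains(data, 1_999_998)
found_binary = binary_contains(data, 1_999_998)
missing_binary = binary_contains(data, 1_999_999)
```

For the last element of the list, the linear version examines all one million items, while the binary search needs roughly twenty comparisons.  Shaving a constant factor off the loop in linear_contains would not change that picture.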

Preemptive tuning is a guessing game; we are guessing how a compiler will optimize our code, when a processor will fetch and cache our executable, and where the actual hotspots will be.  Unfortunately, our guesses are usually wrong.  Perhaps the development team lead was really talking about this situation.

The tuning circumstances change once we have an application that can be tested.  The question becomes how far do we go to address performance hotspots?  In other words, how fast is fast enough?  For me, the balance being sought is between application delivery time and user productivity, and the benefits of tuning can be valuable. (more…)