
MongoDB and Java – Powerful Complementary Platforms

I have found that including MongoDB in the design of Java applications allows me a valuable level of flexibility in meeting client objectives. I have created an initial open source project on GitHub, JavaMongo, with the goal of providing working examples of Java and MongoDB integration. A secondary goal is to include development best practices, such as using testing frameworks and good coding style.

This post is intended to give a little background on why I find Java and MongoDB to be useful tools in my software development arsenal and then to introduce the JavaMongo project. Future posts will include some videos walking developers through the examples as well as the frameworks being used (such as JUnit, Cobertura and Checkstyle).

Background

Java is a ubiquitous platform for creating business applications. It has proven itself across a wide range of use cases, from small point-based solutions to large generalized solution stacks. The variety of libraries, frameworks and tools for designing, building, testing and managing Java applications provides significant benefits to companies building solutions using Java. However, an application without ready access to data isn’t particularly useful. As enterprise-scale database options have broadened to include NoSQL, those creating Java-based solutions should take advantage of the new data options in order to benefit from their strengths.

MongoDB is a great NoSQL platform that can be used to provide additional capabilities to your applications. MongoDB is a document store that has proven its reliability, scalability and ease of integration across numerous small and large-scale applications. Its value and focus complement the way we use relational databases for online transaction processing (OLTP) and offer advantages over the way we use relational databases for data marts and warehouses.

A point of clarification before proceeding: I’m not here to say that MongoDB is better than some other data product, or, more generally, that document stores are better than relational databases. I find such arguments meaningless without a specific use case or project goal. These technologies are different and have individual strengths and weaknesses in the face of a specific set of project objectives.

I have found that MongoDB plugs in well when I need a place to federate data (structured, semi-structured and unstructured). Given a common platform, it simplifies the work required to build and alter connections between attributes. If you’ve looked at other information about my background you’ll see that I find the use of semantic technology to be incredibly valuable for data federation and classification. MongoDB as a flexible repository plays well with semantics. At the end of this post I’ll give you a small example of that.

JavaMongo Project

The JavaMongo project is intended to provide Java developers with working examples of Java and MongoDB integrations. Over time I expect a variety of common situations to be demonstrated, with associated documentation explaining the use case and the resulting implementation.

In order to have some interesting data to work with, I’m using data sets that my company releases to the public domain. In order to work with the JavaMongo examples you’ll need to import that data into your MongoDB instance. For more information about downloading and importing the sample data, see the discussion on MongoDB Collection of Honeypot Data on my NoSQL topic page.

The initial JavaMongo project contains a basic README file with information on running the example code. Instead of rehashing that information in this post, I’d like to walk through the basic operations being demonstrated in the example code. The main class we’ll explore is BasicStatistics (us.daveread.education.mongo.honeypot.BasicStatistics).

As you know, a Java program starts execution with the main() method. The first step in BasicStatistics’ main() method is to create an instance of the BasicStatistics class.

BasicStatistics Constructor

The constructor code goes through the entire process of connecting to a MongoDB database, accessing a collection and running a query on data in the collection.

First, an instance of MongoClientOptions is created. This class allows us to configure certain client-side options related to the connection. I’ll get into more detail on this in future examples. In this case, the program simply sets the connection timeout to 2000 milliseconds (2 seconds) so that the program won’t hang for a long time if the instance is not available. You wouldn’t make the timeout this short in a production environment, but failing fast helps when debugging a local environment.

Next, an instance of the MongoClient class is created and assigned to an instance variable called mongoClient. MongoClient instances represent connections to a MongoDB server. There are several MongoClient constructors. In this case a ServerAddress instance and the above created MongoClientOptions instance are being provided. As you can see, the ServerAddress needs the IP address (or DNS name) and the port number for the MongoDB instance we are connecting to.

Now that we have a connection to the MongoDB server, we need to access a specific database. The MongoClient class supplies the getDatabase() method for this purpose. We supply the name of the database as a parameter. The method returns an object of type MongoDatabase, which the example program stores in the instance variable named mongoDatabase.

In MongoDB, databases contain collections, each of which contains our documents (data). In order to access a specific collection we use the getCollection() method of the MongoDatabase class. We pass the collection name to the method and it returns an instance of MongoCollection. A MongoCollection instance is simply the representation of a connection to a specific collection. Using this instance we can access the documents within the collection. Note that in the example code we define the generic type for the collection to be <Document>, since MongoDB collections are collections of documents.
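The connection steps described so far can be sketched roughly as follows. This is a minimal illustration against the 3.x driver API discussed in this post, not a copy of the project’s constructor. The host, port, database name, collection name and the class name ConnectionSketch are all placeholders I’ve assumed for the example.

```java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

final class ConnectionSketch {
    // Assumed values for illustration only; substitute your own.
    static final String HOST = "localhost";
    static final int PORT = 27017;
    static final String DATABASE = "exampledb";
    static final String COLLECTION = "honeypot";

    /** Builds a client whose connection timeout is 2000 ms. */
    static MongoClient connect() {
        MongoClientOptions options = MongoClientOptions.builder()
                .connectTimeout(2000) // milliseconds
                .build();
        return new MongoClient(new ServerAddress(HOST, PORT), options);
    }

    /** Looks up the database, then the collection of documents within it. */
    static MongoCollection<Document> honeypot(MongoClient client) {
        MongoDatabase database = client.getDatabase(DATABASE);
        return database.getCollection(COLLECTION);
    }

    public static void main(String[] args) {
        MongoClient client = connect();
        try {
            MongoCollection<Document> collection = honeypot(client);
            // Obtaining the handles does not yet run a query against the server.
            System.out.println("Using " + collection.getNamespace());
        } finally {
            client.close();
        }
    }
}
```

Keeping the client creation and the collection lookup in separate methods is just one way to organize this; the example project does all of it inside the constructor.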

Once we have a MongoCollection instance, we can use several methods to query the documents. MongoDB has powerful query and aggregation capabilities. We’ll show off a few examples in the BasicStatistics class just to give you a flavor.

To wrap up discussing the operations in the constructor, we see that the program obtains the count of documents using the MongoCollection class’ count() method. Finally the constructor makes a call to the computeAttackCountryCount() method.

computeAttackCountryCount() Method

This method uses the mongoDatabase instance attribute, assigned in the constructor, to access the honeypot collection and create a HashMap of all the country codes found in the data. This method shows one of the ways we can work with results from a MongoDB query.

To execute the query we call the find() method on the MongoCollection, passing a query. The query is a Document defined with the filtering and projection options we want to use. The find() method returns a FindIterable instance. We’ll look at FindIterable more closely shortly; for now, note that one of the methods we can call on it is into().

The into() method places all of the results in a Java collection (we use an ArrayList in the example). This means that all of the data being returned by MongoDB is placed in memory. This approach makes sense when we know the resulting data will fit. It allows the MongoDB driver to get the data, place it in the collection and close up the query. Since we are just getting country codes we don’t anticipate the result being too large to temporarily house in memory.
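As a rough sketch of the find() and into() combination, the snippet below builds a filter and a projection as Document instances and copies the results into an ArrayList. The field name country_code and the class name FindIntoSketch are hypothetical; check the sample data and the project source for the real names and query.

```java
import com.mongodb.client.MongoCollection;
import java.util.ArrayList;
import java.util.List;
import org.bson.Document;

final class FindIntoSketch {
    // Hypothetical field name, used only for illustration.
    static final String COUNTRY_FIELD = "country_code";

    /** Filter: keep only documents that actually contain the country field. */
    static Document countryFilter() {
        return new Document(COUNTRY_FIELD, new Document("$exists", true));
    }

    /** Projection: return just the country field and drop _id. */
    static Document countryProjection() {
        return new Document(COUNTRY_FIELD, 1).append("_id", 0);
    }

    /** Runs the query and places every matching document in memory. */
    static List<Document> loadCountryCodes(MongoCollection<Document> collection) {
        return collection.find(countryFilter())
                .projection(countryProjection())
                .into(new ArrayList<Document>());
    }
}
```

Projecting down to a single field keeps each returned document small, which is part of why holding the whole result in a list is reasonable here.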

Hammer versus Aggregation

The last examples in the BasicStatistics class are focused on the concept of aggregating or summarizing data. We often want to summarize information and certainly the popularity of various Map-Reduce approaches highlights this. In terms of leveraging MongoDB, we have a powerful aggregation framework available to us. In order to give a little insight into the use of the framework I’ve included two pairs of methods.

Each pair of methods is focused on aggregating the example data in a specific way. The two solutions within the pair approach the problem differently. One queries the raw data from MongoDB and performs the aggregation using Java code. I call this the hammer approach. The other shows how the same aggregation may be achieved by calling MongoDB’s aggregation framework from Java.

My point isn’t that you can’t do this type of thing in Java; rather, it often makes sense to push responsibility for specific operations to the application tier that is most adept at handling them. In this case I’d argue that the Java code is better for user interactions and integration and MongoDB’s aggregation framework is better for the summarization operations.

Note that in the JavaDoc for the methods that are using the aggregation framework, the query as you would execute it in the Mongo shell is provided. This is to allow you to see how the query translates into the query document definition in the code.

Country Breakdown

In the first aggregation example, let’s figure out how many attacks were from computers in each country.

countryBreakdownCoded() Method

To perform the aggregation in Java code we pull the raw data from MongoDB. This method shows another way we can work with MongoDB results. Since there could be a significant amount of data returned we don’t want to attempt to load it all into memory. Instead we execute the query using the find() method without the into() method. This returns a FindIterable instance to us. Note again that since we are working with documents, the generic type for FindIterable is <Document>. We can then iterate (using Java’s enhanced for loop in this case) through the results.

Our method then uses a Map to manage the countries and counts.
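The counting step of the hammer approach might look something like the sketch below. It is an illustration rather than the project’s code: the class name CountryTally, the field name country_code and the "unknown" bucket are my own assumptions. Because org.bson.Document implements Map<String, Object>, a FindIterable<Document> can be passed to tally() directly; the main() method uses plain maps as stand-in data so the sketch runs without a database.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CountryTally {
    /**
     * Walks the results one document at a time and counts how often each
     * value of the given field occurs. Only the counts are held in memory.
     */
    static Map<String, Integer> tally(Iterable<? extends Map<String, Object>> docs,
                                      String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(field);
            // Documents missing the field are grouped under "unknown".
            String key = (value == null) ? "unknown" : value.toString();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in rows; the field name is hypothetical.
        List<Map<String, Object>> rows = Arrays.asList(
                Collections.<String, Object>singletonMap("country_code", "US"),
                Collections.<String, Object>singletonMap("country_code", "CN"),
                Collections.<String, Object>singletonMap("country_code", "US"));
        System.out.println(tally(rows, "country_code")); // prints {CN=1, US=2}
    }
}
```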

countryBreakdownAggregation() Method

This method uses MongoDB’s aggregation framework to do the work of summarizing the data. Looking at the setup for the query, we create a Document containing the query as usual, but the documents are using the aggregation framework’s syntax. We’ll get into detail with the aggregation framework in the future. Basically we define a pipeline of steps that we want MongoDB to follow and then we retrieve the resulting (summarized) data. The defined operations allow us to aggregate and summarize data in different ways.

Once we have defined the pipeline of aggregation steps, we use the aggregate() method (defined on the MongoCollection class) to run the defined pipeline. This returns an AggregateIterable, similar to a FindIterable. We can then iterate through the results and report the aggregated data.
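To make the pipeline idea concrete, here is a minimal sketch of a group-and-count aggregation. It is not the project’s exact pipeline: the field name country_code, the added $sort stage and the class name CountryAggregationSketch are assumptions for illustration. The $group and $sum operators and the aggregate() method are part of MongoDB and its Java driver.

```java
import com.mongodb.client.AggregateIterable;
import com.mongodb.client.MongoCollection;
import java.util.Arrays;
import java.util.List;
import org.bson.Document;

final class CountryAggregationSketch {
    // Hypothetical field name, used only for illustration.
    static final String COUNTRY_FIELD = "country_code";

    /**
     * Roughly equivalent shell query (same assumed names):
     * db.honeypot.aggregate([
     *   { $group: { _id: "$country_code", count: { $sum: 1 } } },
     *   { $sort: { count: -1 } }
     * ])
     */
    static List<Document> pipeline() {
        // Stage 1: one output document per country, with a running count.
        Document group = new Document("$group",
                new Document("_id", "$" + COUNTRY_FIELD)
                        .append("count", new Document("$sum", 1)));
        // Stage 2: largest counts first.
        Document sort = new Document("$sort", new Document("count", -1));
        return Arrays.asList(group, sort);
    }

    /** Runs the pipeline on the server and prints each summarized row. */
    static void report(MongoCollection<Document> collection) {
        AggregateIterable<Document> results = collection.aggregate(pipeline());
        for (Document row : results) {
            System.out.println(row.get("_id") + ": " + row.get("count"));
        }
    }
}
```

Only the summarized rows travel back to the Java code, one per country, instead of every raw document.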

In terms of complexity, this approach requires much less Java code. It also has the advantage of being easier to control through configuration, since we could design it to allow modification of the aggregation pipeline without impacting the Java code that executes the pipeline and retrieves the results.

Honeypot Breakdown

The other pair of methods performs a similar task, summarizing by honeypot. Take a look at those to see how using MongoDB’s aggregation framework simplifies the Java code.

Wrap-up

My goal with JavaMongo is to reduce the learning curve for Java developers who want to leverage MongoDB. To that end I welcome suggestions and collaboration on the project. See my contact page or participate on the GitHub project directly.

Semantic Teaser

I mentioned earlier that MongoDB is also useful when working with semantic technology and data federation. As a trivial example of this, the sample data set includes a small set of triples for use with AllegroGraph (an RDF graph store). AllegroGraph can be used to execute queries against both the graph store and MongoDB’s document store. This is described in the associated README file located with the RDF data. I’ll delve into this separately.
