David S. Read

NoSQL

NoSQL platforms such as MongoDB, Cassandra and Neo4j give us an amazing amount of flexibility when looking for an optimal platform for working with data. Instead of having to squeeze our data into a relational model, we can pick a data representation that fits the dynamics of our data and use cases. This flexibility is a double-edged sword. When relational models were our only option, we got really good at optimizing those models. Now we need an effective understanding of a variety of paradigms and platforms in order to determine the optimal choice for our business operations or analytic needs.

My primary goal for creating these web pages and examples focused on NoSQL is to help people understand the technology. Being able to try out a tool with real data is often a valuable step in assessing a product's dynamics.

My initial foray will be around MongoDB. I worked through the developer certification for MongoDB and have found it to be a powerful document store in development and production environments.

MongoDB Java Demo Code

A Java Application for Utilizing Data in MongoDB

Leveraging MongoDB's Java Driver library, I have begun building a Java demonstration application intended to show developers the basic steps for interacting with a MongoDB server. My goal with the application is to provide a hands-on platform that developers may install, run and modify as a way to learn about MongoDB and its use within a Java program. The code is intended to show good practices, both in using MongoDB and in Java coding. Therefore, tools such as Cobertura, Apache Ant and Checkstyle are also used within the project.

The project is housed on GitHub at: https://github.com/DaveRead/JavaMongo

Note that this demonstration code requires a sample collection of honeypot data to be installed. See the section entitled MongoDB Collection of Honeypot Data below.

MongoDB Collection of Honeypot Data

The initial MongoDB examples I'll be releasing will leverage a MongoDB collection that contains honeypot data (described below) collected in early 2016. I chose this data for three reasons: I have a background in data security, there are some interesting patterns to be found in such data, and the domain differs from much of the sample data I've seen, which focuses on financial services or healthcare.

The data is available for download at these links:

When you uncompress the archive, you will see a readme.txt file with basic information, a license.txt file with the Affero GPL license information and a honeypot.data.readme.txt file with in-depth documentation on importing and interpreting the data.

Honeypot Description

For those interested in a little background on this data, here are a few paragraphs from the honeypot.data.readme.txt file included in the archived data file.

A honeypot is a server running specialized software applications known as sensors. Each sensor is designed to identify specific types of access attempts to the server. The server has no business-related software installed. There are no links to the server from other systems and no users would have any reason to connect to the server. Therefore, any attempt to access the honeypot server is regarded as an attack, since the access attempt cannot have a legitimate purpose.

The server is placed on a network (inside or outside a corporate firewall) and the various sensors report connection attempts when they occur. Typically a central server is used to collect and aggregate the information from the various honeypot servers and sensors.

The purpose of running honeypots is to collect information related to how attackers attempt to break into servers. Used on an internal network, a honeypot can help to identify infected computer systems or rogue workers.

All of the data collected was obtained using the servers and sensor configurations supplied by the Modern Honey Network (MHN). This set of open projects greatly simplifies the process of setting up and operating a honeypot. More information on the MHN can be found at: https://threatstream.github.io/mhn/
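
As a rough illustration of the collect-and-aggregate step described above, here is a minimal sketch in plain Java that counts reported connection attempts per sensor. It does not use MongoDB or MHN, and it is not taken from the JavaMongo project. The class name HoneypotAggregator, the Event fields (sensor, sourceIp, destPort) and the sensor names are all hypothetical; the actual field names in the data set are documented in honeypot.data.readme.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HoneypotAggregator {

    /** One reported connection attempt. Field names are illustrative assumptions. */
    static final class Event {
        final String sensor;
        final String sourceIp;
        final int destPort;

        Event(String sensor, String sourceIp, int destPort) {
            this.sensor = sensor;
            this.sourceIp = sourceIp;
            this.destPort = destPort;
        }
    }

    /** Counts events per sensor name; a TreeMap keeps the keys sorted. */
    static Map<String, Integer> countBySensor(List<Event> events) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Event e : events) {
            counts.merge(e.sensor, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample events using documentation-only IP addresses.
        List<Event> events = Arrays.asList(
            new Event("sensor-a", "203.0.113.5", 22),
            new Event("sensor-b", "198.51.100.7", 445),
            new Event("sensor-a", "203.0.113.9", 22));

        // Prints {sensor-a=2, sensor-b=1}
        System.out.println(countBySensor(events));
    }
}
```

Because every event on a honeypot is treated as an attack, a simple count like this is already meaningful. Against the real collection, the same grouping would more likely be expressed as a query on the MongoDB server than as a loop in application code.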

Blog

I will create periodic blog entries describing my exploration of NoSQL concepts. These entries can be found by following the link below.

NoSQL Blog Entries

DigitalOcean Cloud Platform

As an aside, I have been using DigitalOcean's Cloud Platform to set up servers in data centers around the world, running production applications and honeypots. Their pricing structure is very affordable and their dashboard operations make setup, access and teardown of servers quick and easy. I highly recommend them if you need on-the-fly Infrastructure as a Service (IaaS). Note that the supplied link is associated with my ID. If you use that link and end up using their services, we both benefit from a discount.