The initial MongoDB examples I'll be releasing will use a MongoDB collection that contains
honeypot data (described below) collected in early 2016. I chose this data for three reasons: I have
a background in data security, such data contains some interesting patterns, and the domain differs
from much of the sample data I've seen, which tends to focus on financial services or healthcare.
The data is available for download at these links:
When you uncompress the archive, you will see a
readme.txt file with basic information, a
license.txt file with the Affero GPL license information, and a
honeypot.data.readme.txt file with in-depth documentation on importing and interpreting the data.
Honeypot Description
For those interested in a little background on this data, here are a few paragraphs from the honeypot.data.readme.txt file included in the archive.
A honeypot is a server running specialized software applications known as sensors. Each sensor
is designed to identify specific types of access attempts to the server. The
server has no business-related software installed. There are no links to the
server from other systems and no users would have any reason to connect to
the server. Therefore, any attempt to access the honeypot server is regarded
as an attack, since the access attempt cannot have a legitimate purpose.
The server is placed on a network (inside or outside a corporate firewall) and
the various sensors report connection attempts when they occur. Typically
a central server is used to collect and aggregate the information from
the various honeypot servers and sensors.
The purpose of running honeypots is to collect information related to how
attackers attempt to break into servers. Used on an internal network, a honeypot
can help to identify infected computer systems or rogue workers.
All of the data collected was obtained using the servers and sensor configurations
supplied by the Modern Honey Network (MHN). This set of open projects greatly
simplifies the process of setting up and operating a honeypot. More information
on the MHN can be found at: https://threatstream.github.io/mhn/
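To make the central-collection idea from the excerpt a bit more concrete, here is a minimal Python sketch of how reports from several sensors might be tallied by source address. It is only an illustration: the field names (sensor, source_ip, dest_port), the sensor names, and the sample events are all made up for this example and are not taken from the data set or from MHN. The actual document layout is described in honeypot.data.readme.txt.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_attempts_by_source(events: Iterable[Mapping[str, str]]) -> Counter:
    """Tally connection attempts per source address across all sensors."""
    counts: Counter = Counter()
    for event in events:
        # "source_ip" is a hypothetical field name, not taken from the data set.
        counts[event["source_ip"]] += 1
    return counts


# Made-up reports from two hypothetical sensors. The addresses come from
# the ranges reserved for documentation use (RFC 5737).
sample_events = [
    {"sensor": "sensor-a", "source_ip": "192.0.2.10", "dest_port": "22"},
    {"sensor": "sensor-b", "source_ip": "192.0.2.10", "dest_port": "23"},
    {"sensor": "sensor-a", "source_ip": "198.51.100.7", "dest_port": "22"},
]

attempts = count_attempts_by_source(sample_events)

# The two most active source addresses, busiest first.
top_sources = attempts.most_common(2)
```

Since every connection to a honeypot is treated as an attack, a simple per-source count like this is often a reasonable first look at who is probing the server most often.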