// JSON-LD for Wordpress Home, Articles and Author Pages. Written by Pete Wailes and Richard Baxter. // See: http://builtvisible.com/implementing-json-ld-wordpress/

Archive for the ‘NoSQL’ Category

MongoDB and Java – Powerful Complementary Platforms

Tuesday, May 31st, 2016

I have found that including MongoDB in the design of Java applications allows me a valuable level of flexibility in meeting client objectives. I have created an initial open source project on GitHub, JavaMongo, with the goal of providing working examples of Java and MongoDB integration. A secondary goal is to include development best practices, such as using testing frameworks and good coding style.

This posting is intended to give a little background on why I find Java and MongoDB to be useful tools in my software development arsenal and then to introduce the JavaMongo project. Future postings will include some videos walking developers through the examples as well as the frameworks being used (like JUnit, Cobertura and Checkstyle)

Background

Java is an ubiquitous platform for creating business applications. It has proven itself across a wide range of use cases from small point-based solutions to large generalized solution stacks. The variety of libraries, frameworks and tools for designing, building, testing and managing Java applications provides significant benefits to companies building solutions using Java. However, an application without ready access to data isn’t particularly useful. As enterprise-scale database options have broadened to include NoSQL, those individuals creating Java-based solutions must be sure to take advantage of new data options in order to benefit from the strengths of such components.

MongoDB is a great NoSQL platform that can be used to provide additional capabilities to your applications. MongoDB is a document store that has proven its reliability, scalability and integrate-ability across numerous small and large-scale applications. Its value and focus complements the way we use relational databases for online transaction-oriented processing (OLTP) and offers advantages over the way we use relational databases for data marts and warehouses.

A point of clarification before proceeding: I’m not here to say that MongoDB is better than some other data product, or, more generally, that document stores are better than relational databases. I find such arguments meaningless without a specific use case or project goal. These technologies are different and have individual strengths and weaknesses in the face of a specific set of project objectives.

I have found that MongoDB plugs in well when I need a place to federate data (structured, semi-structured and unstructured). Given a common platform, it simplifies the work required to build and alter connections between attributes. If you’ve looked at other information about my background you’ll see that I find the use of semantic technology to be incredibly valuable for data federation and classification. MongoDB as a flexible repository plays well with semantics. At the end of this post I’ll give you a small example of that.

JavaMongo Project

The JavaMongo project is intended to provide Java developers with working examples of Java and MongoDB integrations. Over time I expect a variety of common situations to be demonstrated, with associated documentation explaining the use case and the resulting implementation.

In order to have some interesting data to work with, I’m using data sets that my company releases to the public domain. In order to work with the JavaMongo examples you’ll need to import that data into your MongoDB instance. For more information about downloading and importing the sample data, see the discussion on MongoDB Collection of Honeypot Data on my NoSQL topic page.

The initial JavaMongo project contains a basic README file with information on running the example code. Instead of rehashing that information in this post, I’d like to walk through the basic operations being demonstrated in the example code. The main class we’ll explore is BasicStatistics (us.daveread.education.mongo.honeypot.BasicStatistics).

As you know, a Java program starts execution with the main() method. We see that the first step that the BasicStatistics’ main() method takes is to create an instance of the BasicStatictics class.

BasicStatistics Constructor

The constructor code goes through the entire process of connecting to a MongoDB database, accessing a collection and running a query on data in the collection.

First, an instance of MongoClientOptions is created. This class allows us to configure certain client side options related to the connection. I’ll get into more detail with this in future examples. In this case, the program is simply setting the connection timeout to 2000 milliseconds (2 seconds) so that if the instance is not available the program won’t hang for a long time. You wouldn’t make the timeout this short in a production environment but it helps for debugging our local environment by failing fast if something is wrong.

(more…)

Accountable Care Organizations, Data Federation and CMS’ Updated Final Rule for the Medicare Shared Savings Program

Monday, June 8th, 2015

CMS LogoCMS has published a final rule (http://federalregister.gov/a/2015-14005) focused on changes to the Medicare Shared Savings Program (MSSP) which impacts Accountable Care Organizations (ACO) significantly. There are a variety of interesting changes being made to the program. For this discussion I’m looking at CMS’ continual drive toward data use and integration as a basis for improving quality of care, gaining efficiency and cutting costs in health care. One way this drive is manifested in the new rule regards an ACO’s plans as related to “enabling technologies,” which is an umbrella term for leveraging electronic data.

As background, Subpart B (425.100 to 425.114) of the MSSP describes ACO eligibility requirements. Two of the changes in this section clearly underscore the importance of electronic data and data integration to the fundamental operation of an ACO. Specifically, looking at page 127, the following updates are being made to section 425.112(b)(4) (emphasis mine):

Therefore, we proposed to add a new requirement to the eligibility requirements under § 425.112(b)(4)(ii)(C) which would require an ACO to describe in its application how it will encourage and promote the use of enabling technologies for improving care coordination for beneficiaries. Such enabling technologies and services may include electronic health records and other health IT tools (such as population health management and data aggregation and analytic tools), telehealth services (including remote patient monitoring), health information exchange services, or other electronic tools to engage patients in their care.

It goes on to add:

Finally, we proposed to add a provision under § 425.112(b)(4)(ii)(E) to require that an ACO define and submit major milestones or performance targets it will use in each performance year to assess the progress of its ACO participants in implementing the elements required under § 425.112(b)(4). For instance, providers would be required to submit milestones and targets such as: projected dates for implementation of an electronic quality reporting infrastructure for participants;

It is clear from the first change that an ACO must have a documented plan in place for continually expanding its use of electronic data and providing data visibility and integration between itself and its beneficiaries and providers. This is a tall order. The number of different systems and data formats along with myriad reporting and analytic platforms makes a traditional integration approach tedious at best and a significant business risk at worst.

The second change, keeping CMS apprised of the progress of data-centric projects, is clearly intended to keep the attention on these data publishing and integration projects. It won’t be enough to have a well-articulated plan, the ACO must be able to demonstrate progress on a regular basis.

(more…)

Impetus for Our Semantics and NoSQL Workshop at the 2015 SmartData Conference

Friday, May 15th, 2015

I'm Speaking at the 2015 SmartData ConferenceI’m looking forward to being one of the presenters for infuzIT’s hands-on data integration and analysis workshop at this year’s SmartData Conference in San Jose. Giving people the opportunity to see the amazing power of semantics combined with NoSQL to quickly integrate and analyze data makes my day.

My background includes significant work with data, both as an application developer and data warehouse architect. The acceleration of data-centric hardware and software capabilities over the past 10 years now supports a very different paradigm for exploring, reporting and analyzing data. Processes and procedures for creating a data warehouse or mart, the accepted rules of the road for creating integrated data repositories, are no longer clear cut. The data federation debate is no longer Inmon or Kimball.

A significant shift in data integration revolves around the required lifespan of the integrated data. This lifespan has two key aspects whose evolution now allows us to rethink our approach to data federation. This permits us to be much more agile when bringing heterogeneous data sources together. The two aspects are reflected in these design questions: 1) what data, if any, will be rehosted; and 2) what relationships will be supported within the integrated data?

Rehosting Data

In a traditional data warehouse the data must be rehosted. The new repository is the target where transformed data (cleaned-up, standardized) exists. The queries that will be retrieving data from multiple sources are really pulling data from a single source that has been populated from multiple sources. It represents a heavyweight process, driven by Extract-Transform-Load (ETL) scripts and requiring space to host redundant information.

Relationships Between Data Elements

The target warehouse schema determines what relationships are defined between the data elements being combined. Getting this “right” requires careful planning and coordination between the various groups that will use the warehouse. Given the significant effort, represented as cost, organizations tend to design data warehouses to support broad constituencies as a way to amortize the investment across departments and projects.

Paradigm Shift

Semantics and NoSQL allow us to reduce the effort of integrating data by orders of magnitude. They support a completely different mindset for bringing data together. Instead of carefully designing a model that works well in the general sense (reducing the value in specific cases) we have environments that allow us to experiment, adjust and focus on each case.

Below are several drivers which allow us to approach data federation differently using semantics and NoSQL.

(more…)