Post-relational databases are almost certainly here to stay, and as noted by Matt Asay, there is good reason to believe that the open-source variety will likely play a predominant role. Notice, however, that I used the plural term: databases. As in, more than one. Even though I am part of a post-relational database company, I am persuaded that several types of these databases will make it into the mainstream for years to come, and I believe that they will settle into three large, primary categories:
- Distributed Transactional Backbones (e.g. Cassandra)
- Document Databases (e.g. MongoDB)
- Analytical Data Warehouses (e.g. Hadoop)
(Because there are presently several changing players in each category, I only chose one example for each by way of illustration.)
To be sure, there are many other types of post-relational database showering the landscape. I applaud Matt Aslett at the 451 Group for somehow keeping both his sanity and his eyesight after wrestling with the problem of representing the exploding database landscape over the past several years. Here is his powerful graphic, 2014 edition:
I must admit that even though I have been around databases for 23 years, and I currently work for a database company, if I look at that graphic for too long, it makes my head hurt. How do we expect users (for whom the database is only a single piece of the architecture) to find their way through that maze in real life? The answer is: they cannot. At least not for very long, because large markets ultimately settle on a few of the best technologies after a disruptive explosion. Therefore, my contention is that consolidation must eventually occur.
To that end, a few months ago the author of that graphic wrote an insightful blog post that sheds light on one way in which this consolidation will occur: adding functionality to the core of the most popular new databases on the market in order to deliver “multi-model” capabilities. Commenting on a recent acquisition by DataStax, and a new release of MongoDB, he writes:
“While each of these announcements could be considered significant in its own right[,] in combination they suggest a new stage in the evolution of NoSQL and a clear signal that the future of NoSQL will be driven by database products that support multiple data models.”
A post from the DataStax blog captures the essence of the problem driving this trend, irrespective of which solution someone ultimately chooses:
“Because [underlying applications require more than one NoSQL data model format], architects oftentimes have to shard an application and use different NoSQL providers to meet the multi-model requirements of the underlying system. This increases complexity, cost, the IT staff’s learning curve and slows the application’s time to market.” [emphasis mine]
The ecosystem simply cannot handle too much technical diversity. It will narrow to a few solutions that have smartly consolidated functionality into coherent platforms. My belief is that the multi-model platforms will arise in the three big categories I listed above: Transactional Backbones, Document Stores, and Analytical Data Warehouses.
Giving advice on which platform to choose in each category would be self-serving, so I will leave that to one side for this post and simply direct you to the DataStax website if you wish to explore the rationale behind our solution. Obviously, you should do the same with other companies in order to get a well-rounded picture of what will best meet your needs.
Regardless of your ultimate choice, the biggest thing I wish to convey is that the term “NoSQL” is not a narrow one. It comprises many offerings that solve different types of problems. Choosing which best meets your needs is not a decision you should take lightly, because in so doing you are making an infrastructure decision that will be with you for years to come. Start by defining your problem requirements carefully, then go from there.
During your evaluations and learning, you will find yourself returning to those four key terms that I introduced in the beginning of this series:
- Schema
- Normalization
- Third Normal Form (3NF)
- Data Model
You may be surprised to learn that they are still relevant, but in ways you may not expect. For example, let’s look at each in light of the database I know best: Cassandra.
Far from irrelevant, the Data Model will make or break your application performance (a surprise to many who think NoSQL = No Data Model). However, the modeling philosophy is so different from your 3NF relational models that relational minds will initially recoil at the techniques.
Trading away Normalization is perfectly reasonable (and will become desirable) in exchange for ridiculously high performance gains.
The design of, and adherence to, an inflexible relational Schema (that was so comfortable and necessary in our past) actually inhibits application flexibility in a way that can bring your development cycles to an unacceptable crawl.
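To make the modeling contrast a little more concrete, here is a minimal sketch in plain Python of the trade between normalization and read performance. No database or driver is involved: dictionaries stand in for tables, and the names `users`, `orders`, and `orders_by_user` are hypothetical. It is only meant to illustrate the general idea of duplicating data so that each query becomes a single key lookup, not how Cassandra itself stores or serves rows.

```python
# Normalized (3NF-style) layout: each fact is stored once,
# and reads have to join the pieces together at query time.
users = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Grace", "city": "New York"},
}
orders = [
    {"order_id": 100, "user_id": 1, "item": "keyboard"},
    {"order_id": 101, "user_id": 2, "item": "monitor"},
    {"order_id": 102, "user_id": 1, "item": "mouse"},
]


def orders_for_user_normalized(user_id):
    """Scan all orders and join each match to its user row at read time."""
    user = users[user_id]
    return [
        {"name": user["name"], "city": user["city"], "item": o["item"]}
        for o in orders
        if o["user_id"] == user_id
    ]


def build_orders_by_user():
    """Denormalize at write time: copy the user fields into every order row
    and group the rows under the key the query will ask for."""
    table = {}
    for o in orders:
        user = users[o["user_id"]]
        row = {"name": user["name"], "city": user["city"], "item": o["item"]}
        table.setdefault(o["user_id"], []).append(row)
    return table


# Hypothetical "query table", built once for the question
# "what has this user ordered?"
orders_by_user = build_orders_by_user()


def orders_for_user_denormalized(user_id):
    """One key lookup and no join: the read path the table was designed for."""
    return orders_by_user.get(user_id, [])
```

Both functions return the same rows for a given user. The second one pays for its cheaper read with duplicated user fields, which would all have to be rewritten if a user's city changed; that is the kind of cost a relational mind initially recoils at.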
If you come from a relational database background, it will take a little time to adjust to the new techniques and design principles. However, once there, I can assure you that your architects, developers, and sysadmins will experience a freedom and power that have not been felt since, well, since the relational model first came on the scene those many decades ago. That is why the changes afoot today so closely parallel the past that I have described in this series. The shift from IMS to relational was anything but simple, and the reasons were as much sociological as technical, if not more so. The same is true today; but just as true will be the ultimate acceptance of these “NoSQL” technologies into the mainstream for years to come.
Explore the possibilities. Enjoy the ride!
Previous in this series – Part 3: Digital Eats The World