What We Mean By NoSQL – Part 1: A Historical Foundation

For those in the database industry, I believe the term “NoSQL” is unhelpful at best, and downright confusing at worst [1]; but with so much other information to learn in this new era, spending time trying to campaign for a name change seems to be a misplaced priority.  Therefore, given its place in our modern database lexicon, perhaps the best we can do is try to classify it a little better so that people can at least have clearer conversations in this murky water.  And for that, we need a brief historical journey to get us on solid footing for future discussions.

I am certainly not the first to take this approach of walking through the past to the present.  A simple search for “NoSQL history” will give you enough reading for years to come.  That fact might be part of the point I am trying to make about understanding this market.  We are in a paradigm shift, and a huge one.  As such, there will be myriad perspectives on its impact, which is what we would expect from a disruption that is large and far reaching.  Any material change to a well-established status quo at the macro level will produce a tremendous diversification of micro stories.  What I will attempt to do here, with my micro story, is select the bits of history that I think are most relevant to someone like me who comes from a deep, traditional relational database management system (RDBMS) background.  And for people with that history, a great place to begin is remembering why and how we stored data the way we did.

For the most part, we have been storing data the same way for roughly 35 years.  Ask database historians (a nice title for old geeks like me?) why, and the reply would be something along the lines of: “Because Codd told us to!”  Codd is Edgar F. Codd, the IBM employee who invented the relational model in 1970.  IBM was slow to adopt his designs, and whatever other reasons they may have had for the delay, the fact that they were building a juggernaut business around a different kind of database called IMS surely played no small part in the decision.  NASA was using IMS to help get people to the moon!  Who needed anything more?

Over time, and with a lot of controversy and debate in the industry, Codd’s relational model became the dominant database design.  By the time I began my development career in 1992, the 25-year run of IMS was coming to a close and relational models were what every new developer wanted to use and learn.  That learning came with some core concepts that are still (surprisingly, to some) relevant to understanding NoSQL today.

For our purposes, I will focus on what I believe to be four of the most relevant terms for understanding NoSQL [2]:

Term 1: Normalization.  Eliminating redundant data was one of the foundational principles for storing data in an era when storage was limited and expensive.  To normalize data was to eliminate redundancy.  For example, you didn’t store the text “Finance” alongside every employee’s name who worked in the Finance department.  You stored “Finance” one time only, and found other ways to reference it that did not use nearly as much space as repeating the text string every time.
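To make the “Finance” example concrete, here is a minimal Python sketch of the idea.  The employee names, the second department, and the `normalize` helper are all invented for illustration; the point is only that the department text is stored once and each employee row carries a small reference to it instead.

```python
# Hypothetical sample data, invented for illustration: the department
# name is repeated as text on every employee row.
denormalized = [
    ("Alice", "Finance"),
    ("Bob", "Finance"),
    ("Carol", "Engineering"),
    ("Dave", "Finance"),
]


def normalize(rows):
    """Split (employee, department_name) rows into two small tables."""
    ids_by_name = {}  # department name -> small integer id
    employees = []    # (employee name, department id)
    for name, dept in rows:
        # Assign the next id the first time a department name is seen.
        dept_id = ids_by_name.setdefault(dept, len(ids_by_name) + 1)
        employees.append((name, dept_id))
    # Invert the lookup so each department name is stored exactly once.
    dept_table = {dept_id: dept for dept, dept_id in ids_by_name.items()}
    return dept_table, employees


dept_table, employees = normalize(denormalized)
print(dept_table)
print(employees)
```

In this sketch “Finance” appears once in `dept_table`, no matter how many employees work there.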

Term 2: Third Normal Form (3NF).  Codd took the process of normalization and not only eliminated redundant data, but also brilliantly introduced rules that would prevent data anomalies.  For example, in a valid 3NF database, you would never end up with a department name spelled “Finance” in one place and “Finence” in another.  When data was stored in 3NF, it was said to be truly “normalized,” and it came with other benefits as well.  For example, I could guarantee that every single employee I entered into my database was associated with a valid department.  The database rules would enforce that for me, if I designed it according to 3NF.  It is hard to overstate how amazingly cool and powerful this was to new developers like myself.
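The “valid department” guarantee can be sketched with Python's built-in `sqlite3` module.  The table and column names and the `try_insert` helper are assumptions made up for this example, and the mechanism doing the enforcing here is a foreign key constraint, which is how a normalized design typically expresses that rule.  Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: each department name is stored once, and every
# employee row must point at an existing department.
conn.execute(
    "CREATE TABLE departments ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department_id INTEGER NOT NULL REFERENCES departments(id))"
)
conn.execute("INSERT INTO departments (id, name) VALUES (1, 'Finance')")


def try_insert(name, department_id):
    """Return True if the database accepts the employee, False if it refuses."""
    try:
        conn.execute(
            "INSERT INTO employee (name, department_id) VALUES (?, ?)",
            (name, department_id),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when department_id does not match any departments row.
        return False


print(try_insert("Alice", 1))   # department 1 exists
print(try_insert("Eve", 99))    # department 99 does not exist
```

Because the name lives in a single `departments` row, there is also only one place where “Finance” could ever be misspelled or corrected.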

Term 3:  Schema.  Once the rules for normalizing data using 3NF were established, the database needed a way to understand how to create the structures that would hold the data.  That is the role of the schema, which is a formal description of these structures written in a language the database understands.  For relational databases, the schema defines how tables are built and how they relate to one another.  For example, the “Departments” table has a relationship to the “Employee” table, and the schema tells the database the exact nature of that relationship and how to build it inside the database management system.
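As a rough illustration, the sketch below writes a small schema in SQL and hands it to SQLite through Python's built-in `sqlite3` module, then asks the database to report back what it built.  The column names are assumptions chosen for the example; only the “Departments” and “Employee” table names come from the text above.

```python
import sqlite3

# A hypothetical schema: it describes two tables and how they relate.
SCHEMA = """
CREATE TABLE departments (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES departments(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The database now knows which tables exist...
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# ...and how they relate.  Each foreign_key_list row is
# (id, seq, table, from, to, on_update, on_delete, match).
relationships = [
    (row[3], row[2], row[4])
    for row in conn.execute("PRAGMA foreign_key_list(employee)")
]

print(tables)
print(relationships)
```

The `relationships` list reads as “this column of `employee` refers to that column of that table,” which is the relationship the schema described.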

Term 4: Data Model.  Schema languages are quite technical and, like any language, are confusing to those who do not know the language.  Fortunately, there was a better way to communicate the nature of a schema to non-technical people.  The data model used conventions that made it easy to “see” (both visually, using boxes and lines, and conceptually) the nature of the data relationships.  It was an invaluable tool for us developers to communicate with our business owners (users of the system) during the design phase of a project.

For 30+ years the techniques defined by those four terms revolutionized how we interacted with our data.  They influenced everything about the way we built our applications, which made the relational database the proverbial “king of the jungle” in the datacenter.  What could be more important than the data itself?  Nothing.  In fact, everything in the datacenter was built to allow for the flow of, and access to, data.  The name “datacenter” says it all.

With the relational database firmly in the role of primacy, the four terms above became so closely associated with relational databases that scarcely anyone knew, or cared, that they also apply to data in general, including data stored in non-relational forms.

Until, that is, the whole world was introduced to global accessibility.

Next in the series – Part 2: The Rise Of The Web

[1] For those who care about the origin of the term “NoSQL”, near as I can tell, it was coined by Carlo Strozzi in 1998 with the release of his “Strozzi NoSQL” RDBMS.  Current (circa 2015) usage of the term does not appear related in any meaningful way to Strozzi’s database.  Some claim Eric Evans repurposed the term for its modern usage, but later thought it might be a good idea to go with Emil Eifrem’s suggestion to read it as “Not Only SQL.”

[2] The correlation of these terms to the NoSQL world will be brought full circle in part 4 of this series.  

 
