Four Enterprise Arguments Against NoSQL
Dave Callaghan, Modern Data Innovator at Sparks Ignite
I avoid using the term NoSQL because it’s an overloaded term that generates unnecessary negative emotion. The name is now usually read as Not Only SQL, which is certainly more inclusive and helpful. NoSQL was originally just a Twitter hashtag that Eric Evans (then at Rackspace, not the author of Domain-Driven Design) suggested for a San Francisco meetup in June 2009 to talk about Bigtable and Dynamo. It was never meant to be an attack on relational databases. But just because it wasn’t meant as an attack doesn’t mean that there was no reason to have a discussion.
Dr. Codd’s A Relational Model of Data for Large Shared Data Banks was published in 1970. The success of relational databases has shown how powerful n-ary relations, a normal form for database relations, and the concept of a universal data sublanguage can be when compared to the tree structures of the hierarchical model that IMS implemented. Reasonably enough, relational databases did not replace every existing mainframe application, nor were they a guarantee against any future innovation in the space, because that’s not how science works. Dr. Codd would probably have agreed with me, since he co-authored Providing OLAP (On-line Analytical Processing) to User-analysts.
So let’s agree that NoSQL is not some revolutionary cry to dismantle all relational databases and understand there may be room to discuss technical opportunities where Not Only SQL could potentially add business value. Specifically, there are four basic enterprise arguments against NoSQL I would like to address:
- No ACID equals No Go
- SQL is mandatory
- NoSQL means NoStandards
- NoSQL is for Startups
No ACID equals No Go
Critique: Mission-critical data must be Atomic, Consistent, Isolated and Durable.
Response: Of course it should.
Customer billing information needs Consistency and Availability (CA) and should therefore be stored in an RDBMS. Customer behavior patterns from your web site and call center logs, however, need not be treated in the same manner. No one in the NoSQL community is arguing for replacing the RDBMS, just for additional options.
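To make the distinction concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for an enterprise RDBMS. The table names, the charge_customer and record_click helpers, and the in-memory event log are all hypothetical and chosen only for illustration. The billing write must be all-or-nothing; the behavior event is simply appended and no transaction is needed.

```python
import json
import sqlite3

def charge_customer(conn, customer_id, amount_cents):
    """Record a charge and update the balance atomically: both change or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO charges (customer_id, amount_cents) VALUES (?, ?)",
            (customer_id, amount_cents),
        )
        cur = conn.execute(
            "UPDATE balances SET owed_cents = owed_cents + ? WHERE customer_id = ?",
            (amount_cents, customer_id),
        )
        if cur.rowcount != 1:
            raise ValueError(f"unknown customer {customer_id}")

def record_click(event_log, event):
    """Append a schemaless behavior event (hypothetical stand-in for a log or document store)."""
    event_log.append(json.dumps(event))

# Hypothetical schema, held in memory purely for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (customer_id INTEGER PRIMARY KEY, owed_cents INTEGER NOT NULL)")
conn.execute("CREATE TABLE charges (customer_id INTEGER NOT NULL, amount_cents INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES (1, 0)")
conn.commit()

charge_customer(conn, 1, 2500)
try:
    charge_customer(conn, 99, 1000)  # unknown customer: the inserted charge is rolled back too
except ValueError:
    pass
```

The failed second call leaves no orphaned charge row behind, which is the guarantee billing data needs. The event log makes no such promise, and for clickstream-style data that is usually an acceptable trade.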
SQL is mandatory
Critique: SQL is a common language among business users and developers.
Response: This is a great point, and that’s why so much effort has been put into providing a SQL front-end to NoSQL solutions. Consider applications such as Apache Hive and Cloudera Impala for Hadoop, DataStax CQL for Cassandra, Pentaho’s MongoDB driver and Apache Phoenix for HBase. Unfortunately, NoSQL is a name that has stuck, and it has drawn a line in the sand that is not actually there.
Keep in mind that while SQL is extremely important, perhaps even table stakes, you are limiting yourself unnecessarily if you conflate analytics with a structured query posed by a SME. Machine learning, predictive analytics and a host of other fantastic tools can be added to your toolbox when you look beyond SQL. Remember, you’ve likely already made this leap once before with MDX. Your data scientists are likely far more familiar with R, SAS and Weka than with SQL.
NoSQL means NoStandards
Critique: Large enterprises may have thousands of databases. These need accepted standards.
Response: I will address the weaknesses that are being worked on in this space, while cautioning that this critique typically falls apart on its own. Standards and practices are a more nuanced discussion now that we are dealing with unstructured, semistructured, sensor, mobile, social and communication data.
Some of us remember something similar being leveled against the upstart relational model by the mainframe people. Ask yourself if your enterprise has a consistent set of standards that apply to all databases across the enterprise. Just for fun, try to find out how many different ways customer addresses are stored. Do any of your call center records have PII stored in the memo field? Is your data warehouse a Single Source of Truth plus some other stuff? Are there departmental de facto systems of record that fall outside of the Data Stewards’ Center of Excellence?
Also, how do you define ‘database’? Most likely not as ‘some entity that contains valuable business data’. That would then include emails, spreadsheets, documents, call logs, images, log files (internet and device), and even external sources such as social media.
I am aware that saying, “you, too!” is not actually an argument. What I would like to point out is that databases do not spring up fully formed and compliant. They are the result of architectural design and implementation discipline. If you apply the same discipline to developing your NoSQL solutions, then you can have standards. For Hadoop, I would strongly recommend implementing Apache Falcon to capture metadata on the data entering your data lake. That data lake should, of course, be secured by Apache Ranger, which manages authorization, audit and data protection in Hadoop. Keep an eye on Hortonworks’ Data Governance Initiative and the Apache Atlas project for the future of data governance in Hadoop. These tools are nowhere near as mature as their much older RDBMS counterparts, which is obviously an architectural consideration. The good news is that the security and governance space is well defined because of all the work that has been done over the years.
NoSQL is for Startups
Critique: Startups can use NoSQL because they are too new to have data structures. Established companies have established data structures.
Response: This is true to a point. When you are a startup, most, but not all, of your use cases are edge cases. A startup’s billing system needs are probably similar to those of an established company, but we’ll leave that aside. The fact of the matter is that the business landscape has changed, and not all of these changes can be managed by the traditional corporate OLTP system. The explosion not only of the internet but, to an even greater degree, of mobile devices has presented opportunities that few industries can ignore.
At scale and with new data sources, it has become apparent that one size does not fit all when it comes to data management. Again. If you find yourself implementing antipatterns in your data to try to make your existing enterprise systems work with new data sources (e.g., sharding, denormalizing, dropping indexes, etc.), you may want to consider whether or not these are areas that could benefit from new thinking. Don’t let an old hashtag keep you from leveraging new opportunities.
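As an illustration of the sharding antipattern mentioned above, here is a minimal Python sketch of what hand-rolled sharding over a relational store tends to look like. It again uses the standard-library sqlite3 module as a stand-in; SHARD_COUNT, the events table, and the helper names are hypothetical. The point is that routing and cross-shard aggregation end up in application code rather than in the database.

```python
import sqlite3
import zlib

SHARD_COUNT = 4  # hypothetical; changing it later means re-homing existing rows

def shard_for(customer_id: str) -> int:
    """Pick a shard with a stable hash (the built-in hash() of a str varies per process)."""
    return zlib.crc32(customer_id.encode("utf-8")) % SHARD_COUNT

# One independent database per shard, in memory purely for the example.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("CREATE TABLE events (customer_id TEXT, kind TEXT)")

def insert_event(customer_id: str, kind: str) -> None:
    """Route the write to the shard that owns this customer."""
    db = shards[shard_for(customer_id)]
    with db:  # commit on success, roll back on error
        db.execute("INSERT INTO events VALUES (?, ?)", (customer_id, kind))

def count_events(kind: str) -> int:
    """A cross-shard query: the application, not the database, fans out and merges."""
    total = 0
    for db in shards:
        row = db.execute("SELECT COUNT(*) FROM events WHERE kind = ?", (kind,)).fetchone()
        total += row[0]
    return total
```

Every query that spans customers now needs its own fan-out and merge logic, and joins or transactions across shards are no longer available from the database. Stores designed around partitioning handle this routing internally, which is the kind of new thinking worth evaluating.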