Sunday, October 11, 2015

The NoSQL Big Data Face Palm

In practice, I have found that NoSQL databases are largely used to solve Big Data problems where there was an architect who (pick one or more):
  • Thought a NoSQL technology could provide reliable, believable and accessible data that everyone in the corporation can rely on
  • Wanted to try a new technology and did not foresee the technical debt that would accrue over time where strong schema enforcement is not baked into the database solution
  • Was more interested in a quick win with a promising technology than a sustainable, long-term architecture that required significant database design and forethought
  • Failed to understand when to use OLTP versus OLAP, how data should be modeled, and how it should flow from a live system to a reporting/analytics system
  • Did not understand how to use database sharding to achieve high performance with large distributed data stores
  • Thought designing for ease of programming at the cost of sparse data storage was a best practice
  • Thought, "Disk storage is cheap," but failed to take I/O into account
  • Thought, "We have so much data, we won't consider backup or disaster recovery"
  • Failed to account for the churn that comes with adopting a new technology
  • Does this a lot: (face palm)


I am not saying that technologies like Cassandra, Riak, Hadoop, MongoDB, etc., should have no place in any corporate portfolio.

(There are many use cases where the capability of storing unpredictable and/or amorphous data is a necessity, but often there will be a relational database that contains the metadata needed to make sense of the NoSQL data.)
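To make that pairing concrete, here is a minimal sketch, not a recommended production design. It assumes a plain Python dict standing in for the NoSQL store and an in-memory SQLite table standing in for the relational metadata database. The table name `document_catalog`, its columns, and the sample documents are all hypothetical, invented only for illustration.

```python
import json
import sqlite3

# Stand-in for a NoSQL store: opaque JSON blobs keyed by document id.
# (Hypothetical sample data; a real deployment would use an actual document store.)
document_store = {
    "doc-1": json.dumps({"temp": 21.5, "unit": "C"}),
    "doc-2": json.dumps({"customer": "Acme", "tags": ["priority"]}),
}

# Relational catalog that records what each amorphous document actually is.
catalog = sqlite3.connect(":memory:")
catalog.execute(
    """
    CREATE TABLE document_catalog (
        doc_id    TEXT PRIMARY KEY,
        doc_type  TEXT NOT NULL,
        source    TEXT NOT NULL,
        loaded_on TEXT NOT NULL
    )
    """
)
catalog.executemany(
    "INSERT INTO document_catalog VALUES (?, ?, ?, ?)",
    [
        ("doc-1", "sensor_reading", "plant_a", "2015-10-01"),
        ("doc-2", "customer_note", "crm", "2015-10-02"),
    ],
)


def load_documents(doc_type):
    """Use the relational metadata to find and decode documents of one type."""
    rows = catalog.execute(
        "SELECT doc_id FROM document_catalog WHERE doc_type = ? ORDER BY doc_id",
        (doc_type,),
    ).fetchall()
    return [json.loads(document_store[doc_id]) for (doc_id,) in rows]


sensor_docs = load_documents("sensor_reading")
```

The point of the sketch is that the question "which of these blobs are sensor readings?" is answered by the relational side, not by scanning and guessing at the shape of every document.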

I am saying that implementations and deployments of those technologies have caused a lot of data integrity issues, and that adopting them deserves careful thought beforehand.
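As a rough illustration of the kind of integrity issue I mean, the sketch below contrasts a schema-enforcing table with a schemaless collection. It assumes in-memory SQLite for the relational side and a plain Python list as a stand-in for a store with no schema enforcement. The `orders` table and the sample records are hypothetical.

```python
import sqlite3

# Relational side: the schema itself rejects malformed rows.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    )
    """
)

# Stand-in for a schemaless collection: it accepts whatever it is given.
schemaless_orders = []


def try_insert(order):
    """Store the order in both places; report whether the relational insert succeeded."""
    schemaless_orders.append(order)  # always accepted, valid or not
    try:
        db.execute(
            "INSERT INTO orders (customer, quantity) VALUES (:customer, :quantity)",
            order,
        )
        return True
    except sqlite3.IntegrityError:
        # NOT NULL or CHECK constraint violated: the bad row never lands.
        return False


good = try_insert({"customer": "Acme", "quantity": 3})
bad = try_insert({"customer": None, "quantity": -1})
```

After both calls the relational table holds only the valid order, while the schemaless list holds both, and every downstream consumer now has to discover and handle the bad record on its own.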


This work is licensed under the Creative Commons Attribution 3.0 Unported License.
