The Forces Shaping NoSQL – Past And Present

Database conversations had gotten pretty boring – until the conversation began to center on the data itself. Before this relatively recent development, database comparisons and talk of disruption were confined to esoteric IT circles. Why? Because over the years, relational databases had become the de facto standard. Products from Oracle, Microsoft and IBM, along with open source alternatives, looked more or less the same, with debate limited to the fine points of availability, licensing, environment support and minor features. All of them were built on the same rigid relational schema.

As our systems evolved, the increasing volume, variety and velocity of data forced the emergence of open source approaches like Hadoop and NoSQL. These new approaches dramatically changed the conversation around databases – and data. Data became interesting and mysterious, yielding insights only to those who could harness it. And databases were no longer predictably relational.

This fundamental shift came from the realization that data is diverse – and that it should be allowed to remain that way. It is no longer just about capturing the transaction and squeezing it into uniform rows and columns; it’s also about capturing the events leading up to the transaction. In this new world of high-volume, high-variety, and high-velocity data, relational databases and SQL queries are too slow to accommodate a significant portion of the data that makes up the big data deluge. And so the database world changed.

This is where NoSQL comes in

NoSQL databases, known for out-of-the-box thinking, encompass a wide variety of data models: graph, columnar, document, key-value and many more. They tackle head-on the need for different approaches that simplify and accelerate specific tasks such as serving ads, enabling social interactions, detecting fraud or delivering fine-grained personalized experiences. These are big mountains to climb, and such workloads can be extremely difficult – if not impossible – to scale in the relational paradigm. Beyond the never-ending discovery of new processing scenarios, two other forces continue to fundamentally shape NoSQL and the big data conversation:

  • Hardware evolution. The continued movement toward greater network capacity and cheaper computing has made distributed, network-connected systems the most cost-effective choice for a large variety of computing scenarios. Similar advances in memory have made RAM and SSD the preferred choices for real-time processing.
  • Data lifecycle evolution. While the fundamentals of the data lifecycle have stayed the same, each lifecycle stage must now support a growing number of interaction types – at increasing speeds. What counts as acceptable real-time response has gone from seconds to under a millisecond in the last few years.

Hardware Evolution

The mainframe era relied on expensive shared compute that was both proprietary and cost-prohibitive. As servers evolved, the client-server model introduced scale-up. Scaling up within this model was cheap to start with, but it became expensive as compute needs grew. In the end, the scale-up ceiling simply was not high enough for big data. More recently, clusters of individually unreliable commodity hardware have emerged as both cheap and highly available, because networks have become fast enough to make redundancy across machines practical. In fact, protecting data through replication across nodes can be a better safeguard against hardware faults than depending on any one machine. And an even greater revolution, Flash memory, has allowed memory and disk to converge.
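The replication argument can be made concrete with some rough arithmetic. The sketch below assumes that each node fails independently and with the same probability over some period, which real clusters only approximate. The function name and the failure rates are hypothetical, chosen purely for illustration.

```python
def loss_probability(node_failure_prob: float, replicas: int) -> float:
    """Probability that every replica of a record is lost.

    Assumes node failures are independent and equally likely,
    which real clusters only approximate (shared racks, power, etc.).
    """
    if not 0.0 <= node_failure_prob <= 1.0:
        raise ValueError("node_failure_prob must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return node_failure_prob ** replicas


# Hypothetical numbers: a commodity node with a 5% chance of failing
# in a given period, versus a pricier server with a 1% chance.
single_reliable_server = loss_probability(0.01, replicas=1)
three_commodity_nodes = loss_probability(0.05, replicas=3)

print(f"one reliable server:   {single_reliable_server:.6f}")
print(f"three commodity nodes: {three_commodity_nodes:.6f}")
```

Under these assumed numbers, three copies on less reliable nodes are less likely to lose the data than one copy on a more reliable server. Correlated failures would weaken that result, so treat it as intuition rather than a sizing rule.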

Data Lifecycle Evolution

Data lifecycles, while still fundamentally the same, have grown considerably more intense. We now reuse data many more times between birth and archival, because new algorithms allow machines to learn from and process that data to build smarter systems. Increasingly, NoSQL is powering the many stages of today’s bustling data lifecycle.

Stage One: Birth/Real-Time Processing

Data is born when a user directly interacts with an application, when changes are perceived by devices, or when events are generated by systems. The intensity of these activities – their volume and velocity – often dictates the type of system needed to handle data ingest. Responsive interactive applications, shorter latencies and higher throughput are the new competitive advantage. The way to increase engagement is through faster and more precise personalized experiences. Sub-millisecond latencies and in-memory processing are needed to capture and serve data in this context at the speed users have come to expect.

Stage Two: Batch/Analytics Processing

Batch processing typically applies to data that is used to generate reporting aggregates, such as sales by region or sales by day. Relational models worked well at this lifecycle stage for smaller data sets that were still meaningful for measuring performance hours or days after the events occurred. However, if real-time reporting is required, you must query the operational stores directly, rather than moving the data first, to find out how an advertising campaign, for example, is affecting sales this very second. Batch analytics are also increasingly used for machine learning and other artificial intelligence techniques that generate models, which in turn implement business rules in real time. In some cases, past transactions are a good indicator of the future, but for many real-time events, what happened moments ago is more relevant to personalizing the experience. To accommodate insights from both the past and the “now,” machine learning is evolving to occur in real time, so that its predictions can draw on valuable data from across the entire timeline.
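To make the contrast concrete, here is a minimal sketch of the two styles: a batch pass that scans the full data set to build an aggregate, and a running total that is updated as each event arrives. The sample events, the `batch_totals` function and the `RunningTotals` class are hypothetical, and the in-memory dictionaries merely stand in for whatever store a real system would use.

```python
from collections import defaultdict

# Hypothetical sales events: (region, day, amount).
SALES = [
    ("east", "day-1", 120.0),
    ("west", "day-1", 80.0),
    ("east", "day-2", 50.0),
    ("west", "day-2", 70.0),
]


def batch_totals(events, key_index):
    """Batch style: scan the full data set and sum amounts by one field."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key_index]] += event[2]
    return dict(totals)


class RunningTotals:
    """Real-time style: update the aggregate as each event arrives."""

    def __init__(self):
        self.by_region = defaultdict(float)

    def record(self, region, day, amount):
        # The day is accepted but unused; this sketch tracks regions only.
        self.by_region[region] += amount


# Batch reports, computed after the fact over all stored events.
sales_by_region = batch_totals(SALES, key_index=0)
sales_by_day = batch_totals(SALES, key_index=1)

# Live totals, kept current with every incoming event.
live = RunningTotals()
for region, day, amount in SALES:
    live.record(region, day, amount)

print(sales_by_region)
print(sales_by_day)
print(dict(live.by_region))
```

Both approaches arrive at the same regional totals. The difference is when the answer is available: the batch report exists only after a full scan, while the running total can be read at any moment.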

Stage Three: Storage/Archival

When it comes to storage and archival, data processing needs are not typically driven by high-speed or high-volume requirements, as is increasingly the case with the other data lifecycle stages. Archived records may be needed to settle disputes or meet legal requirements, so they are accessed infrequently. Data consistency and accuracy are paramount in this stage, and the data model chosen should guarantee both.

So what does the future hold for databases?

I have watched the database market evolve from hierarchical to relational to NoSQL, and I have worked directly on many database engines over my career, including Illustra, Informix, Microsoft SQL Azure, Couchbase and, currently, Redis at Redis Labs. There is no doubt in my mind that the future of databases lies in real-time data access. Take a look at the applications you use on a daily basis. Would you use them if they were even a couple of seconds slower to react to your requests? As NoSQL evolves, its challenge lies in accommodating the new types of data and processing needs that continually present themselves, without becoming as slow as relational databases. The market belongs to the databases that can rapidly adapt to new data processing scenarios while still delivering uncompromising performance and simplicity.