In-Memory Big Data: The New Imperative for Real-Time Decisions

An enterprise’s vitality is predicated on its ability to ingest, manage and synthesize vast amounts of data, derive insights from it, and ultimately make decisions at an ever-increasing pace. ‘Data-driven enterprise’ is no longer a marketing slogan; it is a corporate imperative that, if ignored, will lead to the decline of even the most successful companies.

During this era of digital transformation, the data traversing the enterprise is its most valuable asset. The proliferation of the Internet of Things (IoT), the viral nature of social media, and user mobility have exploded the volume and velocity of data that must be handled on a regular basis. The analytical tools available today make it possible to extract information that can impact the business in real time. Furthermore, as the cloud becomes a necessary part of every company’s operating model, enterprise data flows are growing dramatically more complex.

In line with the fast-growing volume and velocity of data touching the enterprise, the data domains of processing, integration, management and analytics are evolving at a frantic pace. This is not just a technological evolution, but one that must be married with business and economic models to be successful and worthy of mass adoption.

The industry is trying to capture the essence of today’s data landscape with terms like ‘Big Data’ and ‘Fast Data,’ which have little value in describing real execution challenges and solutions. ‘Hybrid’ is a more useful word for qualifying the current situation:

  • Data lives on-premises and in the cloud, and that cloud can be public or private.
  • It can be structured or unstructured, and it is either at rest or in motion.
  • Its values can be cold or hot, and its processing is categorized as OLTP or OLAP.
  • Analytics need to run on historical data as well as on live streaming data.
  • It is kept both on disk and in-memory.

While batch processing of disk-based data has been done for a very long time, real-time data processing is the area that requires important technological advancements.

Growing demand for real-time decision making

New use cases for timely, data-driven decisions are emerging across all industries. Insurance companies are starting to offer ‘micro-insurance’ packages for travel based on a consumer’s trip duration and locations. E-commerce companies are customizing prices and promotions for goods and services based on customer profiles, purchase history, time of day and even weather conditions. Manufacturers are scheduling preventive equipment maintenance based on metrics that indicate deteriorating tooling efficiency. These examples showcase the power of real-time insights and decisions: they can help businesses establish new revenue streams, provide enhanced user experiences for better customer engagement, reduce operational costs and mitigate risk.

Here’s how some forward-looking companies are dealing with massive amounts of data, enabling immediate responses that are based on user interactions:

  • Twitch (a social platform for gamers): This service delivers personalized advertising based on chat history, which can be a powerful revenue driver. Twitch serves 100 million community members, with 2+ million concurrent visitors watching and talking about video games from 1.7+ million broadcasters, and chats that often scale to 400,000+ users in a single room.
  • Twitter’s Periscope (a live video streaming platform): Periscope lets millions of users auto-play video broadcasts in their Twitter profiles, timelines and individual tweets. Customized delivery of these video streams is a core element of the service, and requires almost instantaneous data capture, insight generation and decision response.
  • Grindr (a location-based mobile app): Delivering up-to-the-minute, location-based contextual updates is a critical part of Grindr’s competitive advantage, yet managing that at scale is a significant challenge. The service has 2 million daily users, with over one million concurrent users, 900 million API calls, 85 million chats, 300,000 profile-picture uploads and 10,000 geospatial lookups.
  • HotelTonight (a mobile app for last-minute hotel bookings): This service’s users can explore vacancies and rates, and check into a hotel in a matter of seconds or minutes. Delivering this unique user experience requires real-time location-based processing and synchronization of back-end systems and information.

To deliver capabilities like these, businesses need to combine immediate dynamics with historical analysis of the information. This combination provides context and guidance for making real-time decisions. In-memory computing supports real-time processing of high-velocity data and also speeds up the processing of data at rest. Technologies such as in-memory databases, event streaming platforms, in-memory analytics and high-performance messaging infrastructures are seeing enormous growth that corresponds with the business need for insights from a deeper and wider assessment of data.
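To make the ‘high-performance messaging’ idea concrete, here is a minimal sketch (not from the original article) of real-time event delivery using Redis publish/subscribe from Python; the channel name and event fields are hypothetical.

    import json
    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379)

    # Producer side: push high-velocity events through an in-memory channel.
    def publish_event(user_id, action):
        event = {"user": user_id, "action": action}
        r.publish("events:clickstream", json.dumps(event))

    # Consumer side: react to each event as it arrives, with no disk I/O
    # on the hot path.
    def consume_events():
        pubsub = r.pubsub()
        pubsub.subscribe("events:clickstream")
        for message in pubsub.listen():
            if message["type"] == "message":
                print("real-time decision input:", json.loads(message["data"]))

Because both sides operate entirely in memory, the latency between an event being published and a decision being triggered is typically sub-millisecond.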

Complementary database and analytics technologies

New database and analytics frameworks are emerging from both academia and commercial vendors in response to the vast variety of data challenges faced across industries. Enterprises are utilizing Hadoop to perform deep data analytics, digesting massive volumes of historical data from a multitude of sources to discover patterns and insights and to develop predictive models. This analytics process can be sped up significantly when Hadoop is paired with an in-memory database like Redis. Recent performance testing with the Redis Labs Enterprise Cluster (RLEC) used as cache in conjunction with HBase demonstrated the results shown in Figure 1 below.

Figure 1: Data analytics performance with varying shares of the HBase data set cached in RLEC

These tests show that overall data analytics performance was accelerated by over 500% when 75% of the data was cached in RLEC. Other data allocations yielded varying results.
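The Hadoop-plus-cache pairing behind these numbers typically follows a cache-aside pattern: reads are served from RAM when possible and fall back to the disk-based store on a miss. Below is a minimal sketch using the redis-py client; the `hbase_get` helper stands in for a real HBase lookup (e.g. via a library such as happybase) and is an assumption, not the benchmark code.

    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379)
    CACHE_TTL = 3600  # seconds; the right TTL is workload-dependent

    def hbase_get(row_key):
        """Placeholder for a real HBase lookup."""
        raise NotImplementedError

    def get_row(row_key):
        # Serve from memory when possible; on a miss, read from HBase
        # and populate the cache for subsequent reads.
        cached = r.get(row_key)
        if cached is not None:
            return cached
        value = hbase_get(row_key)
        r.setex(row_key, CACHE_TTL, value)
        return value

With 75% of the hot data resident in the cache, most reads never touch disk, which is what drives the acceleration shown above.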

While Hadoop is good for handling large data volumes, it does not have the capacity to ingest data at high velocity for real-time analytics. Apache Spark is fast becoming the preferred engine for fast data processing. Here again, pairing Redis with Spark delivers dramatically improved performance. In use cases such as IoT sensor data, massive volumes of time-series data need to be ingested and analyzed. In such cases, Redis can be used as a serving layer or an internal accelerator for Spark processing.

Figure 2: Spark query execution time for time-series data with Redis, native Spark, Tachyon and HDFS

Figure 2 above shows results for running Spark queries on time-series data, in which Redis was 100 times faster than HDFS and over 45 times faster than native Spark and Tachyon.
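Part of that gap comes from keeping time-series data in memory in query-friendly structures. Here is a minimal sketch of the serving-layer idea, storing readings in a Redis sorted set scored by timestamp so that time-window queries become cheap range scans; the key naming and helper functions are illustrative assumptions, not the benchmark code behind Figure 2.

    import time
    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379)

    def record_reading(sensor_id, value, ts=None):
        # Score each reading by its timestamp so window queries run as
        # O(log N) range lookups instead of file scans.
        ts = ts if ts is not None else time.time()
        r.zadd(f"sensor:{sensor_id}", {f"{ts}:{value}": ts})

    def readings_between(sensor_id, start_ts, end_ts):
        # Spark workers can pull exactly the time window they need from
        # memory rather than scanning files on HDFS.
        return r.zrangebyscore(f"sensor:{sensor_id}", start_ts, end_ts)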

Making the economics work

Figure 3 shows the cost of various memory technologies for 1GB of capacity, along with the associated read/write performance. While 1GB of RAM costs $9, SATA SSDs cost $0.40 and non-volatile PCIe-compatible (NVMe) memory cards cost $1 for the same capacity.

Figure 3: Cost per 1GB and read/write latency (in microseconds) by memory technology

The selection of a particular memory technology depends on whether its performance falls within an acceptable range for the use case. As memory innovation continues, new alternatives to RAM are narrowing the performance gaps, and database technologies are adapting in tandem, making it possible to combine the new and the old to deliver revolutionary performance/cost profiles. For example, with Redis on Flash, an operator can configure the RAM:Flash ratio for the best cost-performance tradeoff through an intuitive user interface.

Figure 4: User interface for managing RAM:Flash data allocation

Recent benchmark testing (see Figure 5 below) with RLEC on Samsung’s next-generation NVMe Flash cards shows 2 million ops/sec at sub-millisecond latency. This test used a single Xeon-based Dell server, with 80% of the data in Flash and 20% in RAM. This type of memory configuration could lower overall costs by up to 70%.

Figure 5: Throughput and latency results for RLEC on Samsung’s NVMe cards
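The ‘up to 70%’ savings estimate follows directly from the Figure 3 price points. A quick sketch of the blended-cost arithmetic for the 80% Flash / 20% RAM split used in this test:

    # Per-GB prices from Figure 3
    RAM_COST_PER_GB = 9.00
    FLASH_COST_PER_GB = 1.00  # NVMe card

    ram_fraction, flash_fraction = 0.20, 0.80

    blended = ram_fraction * RAM_COST_PER_GB + flash_fraction * FLASH_COST_PER_GB
    savings = 1 - blended / RAM_COST_PER_GB

    print(f"blended cost: ${blended:.2f}/GB")     # $2.60/GB
    print(f"savings vs. all-RAM: {savings:.0%}")  # 71%, in line with 'up to 70%'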

In-memory Big Data management is fundamental to achieving today’s business imperatives

The need for timely insights drives a company’s analytics strategy, which in turn defines how frequently data must be accessed and should inform its data storage and database strategy. Database technology should be selected based on the needs of the use case, weighing parameters such as performance, overall cost and simplicity.

An in-memory database platform like RLEC, in combination with Big Data technologies such as Spark and Hadoop, is ideal for the technical and economic requirements of most real-time use cases. The latest developments around Redis on Flash make in-memory analysis of large datasets both cost-effective and fast. Short response times and instantaneous decision-making enable new business models, make companies more competitive, help reduce operational expenses and, in many cases, revitalize the enterprise.

5 Comments

  1. With Spark’s latest 2.0 release and Project Tungsten, which focuses on memory optimization, Spark has been very successful at in-memory processing. Also, Spark’s ability to handle near-real-time processing with Spark Streaming makes it one of the first choices for people interested in carrying out real-time analytics on very large data sets. As suggested, Redis can further complement that in-memory management and speed up processing to an even greater extent.

  2. Perhaps the number one imperative in the business world today is to capture all data that is relevant to the organization, from all available sources, and put it to work to support business objectives. Analytics is the science of examining raw data in order to discover meaningful patterns and draw conclusions from it; the term also describes the software and methods used to understand data. Organizations generate and collect data to gain insights into the behavior of customers and competitors, and into their own financial and operational performance, and then leverage those insights to make more accurate predictions and smarter decisions. Hadoop is undoubtedly the preferred choice for such a requirement due to its key characteristics: it is reliable, flexible, economical and scalable. With the advancement of the different technologies for analyzing big data, there are many schools of thought about which Hadoop data analysis technology should be used when, and which is most efficient.
    A well-executed big data analysis provides the possibility to uncover hidden markets, discover unfulfilled customer demands and cost-reduction opportunities, and drive game-changing improvements in everything from telecommunication efficiency and surgical or medical treatments to social media campaigns and related digital marketing promotions.

  3. In-memory Big Data is definitely the future of analytics. It will make a huge noise in analytics with its processing speed and efficiency under heavy workloads. Since there have been many developments in semiconductor devices that can hold large amounts of data and are cost-effective, this new idea can now be implemented with greater feasibility.
    Implementing this concept will help the many companies that require instant, real-time data processing. To achieve that, one could build a machine capable of holding huge amounts of data, running the Hadoop software framework, and connected to a Redis server. Such a machine could be developed into a supercomputer dedicated to processing large chunks of data, and could also serve as a back end for remote computers to use and deploy its functionality. With this, a company’s R&D can prosper, since it enables faster calculations and will not require too much capital to implement, as the cost of the semiconductor devices needed for processing and memory storage is falling. This would be a one-time investment, since the concept has the potential to go far in the coming years and won’t be obsolete soon; the machine can also be updated with new technological developments that improve processing speeds and memory management.

  4. Nice article. With increasing data, it is also worth noting that the technologies that hold data (semiconductor technologies) have been developing at quite a fast pace, and the cost of producing these semiconductor chips is declining; running clusters is also getting cheaper. Hence flash-memory computations (which are faster than disk computations) are possible today, and Spark proves how fast they can get. Spark has a couple of interesting records under its belt, like the fastest sorts of 100TB and 1PB of data at much lower cost; it completed a sort of 100TB of data in just 23 minutes on the public cloud! It is also easier to use, provides access to machine learning libraries for fast iterative processes, and can be generalized and combined to provide a unified solution for many data processing needs. It’s also interesting to note that a new version of Spark is released almost every 3 months, keeping it on par with the newest trends. And one last thing: it is completely open source and free. So ‘In-Memory Big Data’ is definitely the future.

    • Yes, agreed. Further, the combination of Spark and Redis delivers even higher performance for large analytics workloads.
