In-Memory Big Data: The New Imperative for Real-Time Decisions

An enterprise’s vitality is predicated on its ability to ingest, manage and synthesize vast amounts of data, and to derive insights and make decisions from it, at an ever-increasing pace. ‘Data-driven enterprise’ is no longer a marketing slogan; it is a corporate imperative that, if ignored, will lead to the decline of even the most successful companies.

During this era of digital transformation, the data traversing the enterprise is its most valuable asset. The proliferation of the Internet of Things (IoT), the viral nature of social media, and user mobility have exploded the volume and velocity of data that must be handled on a regular basis. The analytical tools available today make it possible to extract information that can impact the business in real time. Furthermore, as the cloud becomes a necessary part of every company’s operating model, enterprise data flows are growing exponentially more complex.

In line with the fast-growing volume and velocity of data touching the enterprise, the domains of data processing, integration, management and analytics are evolving at a frantic pace. This is not just a technological evolution; to succeed and earn mass adoption, it must be married with business and economic models.

The industry is trying to capture the essence of today’s data landscape with terms like ‘Big Data’ and ‘Fast Data,’ which have little value in describing real execution challenges and solutions. ‘Hybrid’ is a more useful word for qualifying the current situation. We have data that lives on-premises and in the cloud; it can be in public or private clouds; it can be structured or unstructured; it is either at rest or in motion; its value can be hot or cold; its processing is categorized as OLTP or OLAP; analytics need to run on historical data as well as on live streaming data; and it is kept both on disk and in memory. While batch processing of disk-based data has been done for a very long time, real-time data processing is the area that demands significant technological advancement.

Growing demand for real-time decision making

New use cases for timely, data-driven decisions are emerging across all industries. Insurance companies are starting to offer ‘micro-insurance’ packages for travel based on a consumer’s trip duration and locations. E-commerce companies are customizing prices and promotions for goods and services based on customer profiles, purchase history, time of day and even weather conditions. The manufacturing sector is performing preventive equipment maintenance based on metrics that indicate deteriorating tooling efficiency. These examples showcase the power of real-time insights and decisions: they can help businesses establish new revenue streams, provide enhanced user experiences for better customer engagement, reduce operational costs and mitigate risk.

Here’s how some forward-looking companies are dealing with massive amounts of data, enabling immediate responses that are based on user interactions:

  • Twitch (a social platform for gamers): This service delivers personalized advertising based on chat history, which can be a powerful revenue driver. Twitch serves 100 million community members, with 2+ million concurrent visitors watching and talking about video games from 1.7+ million broadcasters, and chats that often scale up to 400,000+ users in a single room.
  • Twitter’s Periscope (a live video streaming platform): Periscope lets millions of users auto-play video broadcasts in their Twitter profiles, timelines and individual tweets. Customized delivery of these video streams is a core element of the service, and requires almost instantaneous data capture, insight generation and decision response.
  • Grindr (a location-based mobile app): Delivering up-to-the-minute, location-based contextual updates is a critical part of Grindr’s competitive advantage, yet managing that at scale is a significant challenge. The service handles 2 million daily users, over one million concurrent users, 900 million API calls, 85 million chats, 300,000 profile picture uploads and 10,000 geospatial lookups.
  • HotelTonight (a mobile app for last-minute hotel bookings): This service’s users can explore vacancies and rates, and check into a hotel in a matter of seconds or minutes. Delivering this unique user experience requires real-time, location-based processing and synchronization of back-end systems and information.

In order to deliver capabilities like these, businesses need the ability to incorporate immediate dynamics along with historical analysis of the information. This combination provides context and guidance for making real-time decisions. In-memory computing supports real-time processing of high-velocity data and also speeds up the processing of data at rest. Technologies such as in-memory databases, event streaming platforms, in-memory analytics and high-performance messaging infrastructures are seeing enormous growth that corresponds with the business need for insights from a deeper and wider assessment of data.
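As a minimal sketch of that combination, the snippet below uses the Python redis client to fan a live event out to subscribers while folding it into historical aggregates held in the same in-memory store. The channel and key names are hypothetical illustrations, not a prescribed schema.

```python
# A sketch of handling one event in real time while maintaining history.
# All channel/key names are hypothetical.
import json

import redis

r = redis.Redis(host="localhost", port=6379)


def handle_event(event: dict) -> None:
    """React to a live event and fold it into historical state in one pass."""
    payload = json.dumps(event)
    r.publish("events:live", payload)             # real-time fan-out to subscribers
    r.hincrby("stats:by_type", event["type"], 1)  # rolling aggregate for analytics
    r.lpush("events:recent", payload)             # short history window for context
    r.ltrim("events:recent", 0, 999)              # cap the window at 1,000 entries


handle_event({"type": "page_view", "user": "u123"})
```

Because each step is a sub-millisecond in-memory operation, the same store can serve both the live stream and the historical context a decision needs.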

Complementary database and analytics technologies

New database and analytics frameworks are emerging from both academia and commercial vendors in response to the vast variety of data challenges faced across industries. Enterprises are utilizing Hadoop to perform deep data analytics, digesting massive volumes of historical data from a multitude of sources to discover patterns and insights and to develop predictive models. This analytics process can be sped up significantly when Hadoop is paired with an in-memory database like Redis. Recent performance testing with the Redis Labs Enterprise Cluster (RLEC) used as a cache in conjunction with HBase demonstrated the results shown in Figure 1 below.


Figure 1: Performance with varying levels of HBase workloads running with RLEC

These tests show that overall data analytics performance was accelerated by over 500% when 75% of the data was cached in RLEC. Other data allocations yielded varying results.
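As an illustration of the caching pattern behind these numbers, here is a minimal cache-aside sketch in Python. It assumes the redis-py and happybase clients, and the host, table and key names are hypothetical; the benchmark itself ran against a clustered RLEC deployment, but the access pattern is the same.

```python
# Cache-aside: serve hot rows from Redis, fall back to HBase on a miss.
import json

import happybase  # HBase client over Thrift
import redis

r = redis.Redis(host="localhost", port=6379)
hbase = happybase.Connection("hbase-host")  # hypothetical HBase Thrift host
table = hbase.table("events")               # hypothetical table name

CACHE_TTL = 300  # seconds; bounds staleness for rows that change


def get_row(row_key: bytes) -> dict:
    """Return a row, preferring the in-memory cache over HBase."""
    cached = r.get(row_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the read never touches HBase
    row = {k.decode(): v.decode() for k, v in table.row(row_key).items()}
    r.set(row_key, json.dumps(row), ex=CACHE_TTL)  # populate for the next read
    return row
```

On a hit the read never leaves memory, which is what drives the acceleration above; the more of the working set that fits in the cache, the larger the speedup.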

While Hadoop is good for handling large data volumes, it does not have the capacity to ingest data at high velocity for real-time analytics. Apache Spark is fast becoming the preferred engine for fast data processing needs. Here again, pairing Redis with Spark delivers dramatically improved performance. In use cases such as IoT sensor data, massive volumes of time-series data need to be ingested and analyzed. In such cases, Redis can be used as a serving layer or an internal accelerator for Spark processing.


Figure 2: Spark query execution time for time-series data with Redis, native Spark, Tachyon and HDFS

Figure 2 above shows results for running Spark queries on time-series data, in which Redis was 100 times faster than HDFS and over 45 times faster than native Spark and Tachyon.
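The benchmark above used the spark-redis connector, but the data model underneath is simple enough to show directly: each time series lives in a Redis sorted set with the timestamp as the score, so a time-range query becomes a ZRANGEBYSCORE call instead of a scan. A minimal sketch with redis-py and hypothetical key names:

```python
# Time-series in a sorted set: score = timestamp, so range reads are indexed.
import time
from typing import Optional

import redis

r = redis.Redis(host="localhost", port=6379)

KEY = "sensor:42:temperature"  # one sorted set per series (hypothetical name)


def record(value: float, ts: Optional[float] = None) -> None:
    """Append one reading; the timestamp doubles as the sort key."""
    ts = ts if ts is not None else time.time()
    # Members must be unique, so embed the timestamp alongside the value.
    r.zadd(KEY, {f"{ts}:{value}": ts})


def window(start: float, end: float) -> list:
    """Fetch every reading in [start, end] without scanning the whole series."""
    return r.zrangebyscore(KEY, start, end)


record(21.5)
record(21.7)
print(window(time.time() - 3600, time.time()))  # readings from the last hour
```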

Making the economics work

Figure 3 shows the cost of various memory technologies per 1GB, along with the associated read/write performance. While 1GB of RAM costs $9, SATA SSDs cost $0.40 and non-volatile PCIe-compatible memory cards cost $1 for the same capacity.


Figure 3: Cost per 1GB and read/write latency in microseconds, by memory technology

The selection of a particular memory technology depends on whether its performance reaches an acceptable range for the use case. As memory innovation continues, new alternatives to RAM are narrowing the performance gap. Database technologies are adapting in tandem, making it possible to combine the new and the old to deliver revolutionary performance/cost profiles. For example, with Redis on Flash, an operator can configure the RAM:Flash ratio for the best cost-performance tradeoff through an intuitive user interface.


Figure 4: User interface for managing RAM:Flash data allocation

Recent benchmark testing (see Figure 5 below) with RLEC on Samsung’s next-generation NVMe Flash cards shows performance of 2 million ops/sec at sub-millisecond latency. This test used a single Xeon-based Dell server, with 80% of the data in Flash and 20% in RAM. This type of memory configuration could lower overall costs by up to 70%.


Figure 5: Throughput and latency results for RLEC on Samsung’s NVMe cards
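The arithmetic behind that cost claim can be checked directly against the per-GB prices in Figure 3; a quick sketch:

```python
# Back-of-the-envelope check of the ~70% savings claim, using the per-GB
# prices from Figure 3 ($9 RAM, $1 NVMe Flash) and the benchmark's 80/20
# Flash-to-RAM data split.
RAM_PER_GB = 9.00    # USD, from Figure 3
NVME_PER_GB = 1.00   # USD, from Figure 3

flash_fraction = 0.80  # share of the dataset kept on Flash
blended = (1 - flash_fraction) * RAM_PER_GB + flash_fraction * NVME_PER_GB
savings = 1 - blended / RAM_PER_GB

print(f"blended cost: ${blended:.2f}/GB")       # $2.60/GB vs. $9.00 all-RAM
print(f"memory cost reduction: {savings:.0%}")  # ~71%
```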

In-memory Big Data management is fundamental to achieving today’s business imperatives

The need for timely insights drives a company’s analytics strategy, which in turn defines how frequently data must be accessed, and should also shape its data storage and database strategy. Database technology should be selected based on the needs of the use case, weighing parameters such as performance, overall cost and simplicity.

An in-memory database platform like RLEC, in combination with Big Data technologies such as Spark and Hadoop, meets the technical and economic requirements of most real-time use cases. The latest developments around Redis on Flash make in-memory analysis of large datasets both cost-effective and fast. Short response times and instantaneous decision making enable new business models, keep companies competitive, help reduce operational expenses and, in many cases, revitalize the enterprise.