
BDaaS – Where Cloud And Big Data Meet

Big Data as a Service

Traditional databases and existing scalable architectures have not been able to meet the requirements of “large volume, high velocity, and wide-variety data,” which in business terms is called “Big Data”. Businesses investigating Big Data regularly find that they lack the capacity to process and store it adequately. This shows up either as an inability to use existing Big Data sets to the fullest or as an inability to extend their current data strategy with additional data.

Organizations are increasing their data by almost 40% annually, which at a compound rate means doubling their infrastructure capacity roughly every two years. Cloud computing offers a promising and flexible way to address the challenge of growing data sets, so businesses are turning to Big Data as a Service to bridge the processing and storage gap.
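As a quick check on what a 40% annual growth rate implies for capacity planning, the compound-growth arithmetic can be sketched as follows. The function names and the starting volume of 100 TB are illustrative assumptions, not figures from any vendor.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double at a constant compound growth rate."""
    if annual_growth_rate <= 0:
        raise ValueError("annual_growth_rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_volume(initial: float, annual_growth_rate: float, years: int) -> float:
    """Volume after `years` of compound growth from `initial`."""
    return initial * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    rate = 0.40          # 40% growth per year, as cited in the text
    start_tb = 100.0     # hypothetical starting volume in terabytes
    print(f"Doubling time: {doubling_time_years(rate):.2f} years")
    for year in range(5):
        print(f"Year {year}: {projected_volume(start_tb, rate, year):.0f} TB")
```

At this rate, storage needs double in a little over two years and nearly quadruple in four, which is the gap that on-demand cloud capacity is meant to absorb.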

For example, leading organizations such as Netflix and Dropbox use cloud-based Big Data storage services such as Amazon’s Simple Storage Service (S3) to store data in object form. Another good example of BDaaS is the Analytics for Twitter service provided by IBM. This service gives companies that have a Twitter account access to data on Twitter’s 280 million active users, a data set that is petabytes in size. IBM’s service helps businesses analyze user tweets and put them to use. When a business takes a BDaaS offering from a provider, the provider assumes total responsibility for data protection, analysis, and prediction, and also stores the data on its own servers.

BDaaS combines two things: a highly functional service-oriented data architecture and the rapid growth and promise of cloud virtualization. The major types of BDaaS currently in use are Core BDaaS, Performance BDaaS, and Feature BDaaS.

Core BDaaS is generic and implements a minimal platform built from components such as Hadoop, YARN, HDFS, and Hive. It has been around for some time and is used by companies for irregular workloads.

Performance BDaaS includes basic infrastructure and other software and hardware services to optimize business performance for specific purposes. This service helps to scale up computing capacity and perform virtualization at predictable costs. In a nutshell, it is a database ecosystem with optimized infrastructure (IaaS and PaaS).

The third path of integration for BDaaS is upwards, adding features beyond the common Hadoop ecosystem offerings. Feature-driven BDaaS focuses on productivity and abstraction to get users started with Big Data quickly. These offerings include web and programming interfaces as well as database adapters, which push technologies like Hadoop into the background and extend the offering into the SaaS layer. In fact, Hadoop clusters are started, scaled, and even stopped transparently according to the load on the system. In short, it is a database ecosystem with features for productivity and exchangeable infrastructure.
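The idea of starting, scaling, and stopping clusters according to load can be illustrated with a minimal sketch. Everything here is a hypothetical simplification: the function name, the tasks-per-node figure, and the node limits are assumptions for illustration and do not describe how any particular BDaaS provider sizes its clusters.

```python
import math


def target_nodes(
    pending_tasks: int,
    tasks_per_node: int = 10,   # assumed capacity of one worker node
    min_nodes: int = 0,         # 0 lets an idle cluster stop entirely
    max_nodes: int = 50,        # assumed upper limit on cluster size
) -> int:
    """Return how many worker nodes a cluster should run for the current load."""
    if pending_tasks < 0:
        raise ValueError("pending_tasks must not be negative")
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Clamp the requirement between the configured minimum and maximum.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for load in (0, 25, 120, 10_000):
        print(f"{load} pending tasks -> {target_nodes(load)} nodes")
```

With a minimum of zero, an idle system needs no nodes at all, which corresponds to the cluster being stopped; a real service would also account for startup delays and running jobs before scaling down.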

Like Core BDaaS, the feature approach uses IaaS to provide computing and storage, though with a significant difference: independence from any single cloud provider allows a Feature BDaaS to treat computing and storage as a fully scalable and, more importantly, exchangeable commodity, like electricity or water.

With BDaaS, organizations can get a service set up using Microsoft Azure or Qubole, which provides the required infrastructure without the need for expensive hires with specialized skill sets. The service providers take care of the maintenance and upkeep of these technologies, so service consumers can focus on using the services to address their business requirements.

Qubole supports Amazon’s and Google’s IaaS and has customers such as Quora, Pinterest, and Saavn. Qubole offers pay-as-you-go pricing or prepaid packages, which is ideal for variable, unpredictable, or exploratory workloads. Others in the same domain include Platfora, Splice Machine, and Cask, with customers such as Disney, Cisco, and Citi; they provide Hadoop as a service with additional features such as Hive, Spark, and Cascading.

The advantages of using BDaaS are a cost reduction of 50 to 60%, flexibility and scalability, a user-friendly approach, faster data processing, and access to continuous support for maintenance. This way, users can focus on their core business rather than on operational issues. BDaaS also offers the option of batch processing or stream processing, depending on the requirement. The cost savings come not only from infrastructure but also from staffing, because the skills needed to maintain this infrastructure are in short supply. If an organization needs a distributed data storage system such as HDFS, it has to hire infrastructure specialists with those skill sets, and maintaining new Big Data infrastructure in-house therefore requires additional expenditure on expensive hires.
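The difference between batch and stream processing mentioned above can be shown with a small sketch that computes the same average both ways. The function names and sample readings are hypothetical; real BDaaS workloads would run on engines such as Hive or Spark rather than plain Python.

```python
from typing import Iterable, Iterator, List


def batch_average(values: List[float]) -> float:
    """Batch style: the whole data set is available before processing starts."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated average as each new value arrives."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    readings = [2.0, 4.0, 6.0]
    print("Batch result:", batch_average(readings))
    for running in streaming_average(readings):
        print("Running average:", running)
```

The batch version gives one answer after all the data has been collected, while the streaming version gives a usable answer after every record, at the cost of keeping running state.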

Thus, be it financial, retail, media, politics, private or the public sector, BDaaS can upgrade business efficiency through data processing software and analytics. 

Many players have entered this domain and are trying to exploit the available opportunities. However, they currently face competition from on-premise data storage and processing on the Hadoop platform, which many big organizations still prefer for its security and reliability advantages. Apart from this, they face potential competition from public cloud vendors offering their own Hadoop products. Rackspace has recently entered this space, and big players like AWS and Azure could do so in the future.

Cloud computing is increasingly seen by organizations as a viable way to reduce costs and increase implementation efficiency. Vendors are increasingly launching cloud-enabled versions of their traditional Big Data technologies. This trend indicates that most of the Big Data technologies we see in the industry today will be cloud-enabled in the coming years. Consolidation among cloud service providers indicates that the technologies they implement will become increasingly capable, perhaps spanning most of the layers of the BDaaS stack. As such technologies emerge, they will increasingly encompass functions of the higher layers, effectively providing an “analytics package” over the cloud.

To stay in the game, these players have to stay ahead of the technology, and this is where integrated BDaaS comes into play. No provider currently offers this, although BlueData is trying to make a name for itself in this area. Such a service would combine Feature and Performance BDaaS, which could maximize performance while providing full support to business experts.