
The Rapid Intertwining Of Machine Learning And Big Data

Machine learning is rapidly integrating into the fabric of new and existing application infrastructures. Machine learning can identify data patterns rapidly and effectively, and in combination with human intuition, can deliver the kinds of predictive outcomes that help companies better understand and take full advantage of their data sets.

This approach is quickly becoming the new normal, impacting every aspect of a company’s operations across multiple industries. In a survey of executives at 168 companies with over $500 million in sales, 76% said they were targeting higher growth through machine learning. A Harvard Business School study with the retailer Rue La La estimated that machine learning techniques would improve revenue by over 9%, with roughly 90% confidence. And a healthcare company used machine learning to identify predictors of avoidable hospitalizations for diabetes, a big deal not just for its positive impact on patients’ health, but because the disease cost Americans $245 billion in 2012.

The advent of big data has only accelerated the relevance of machine learning because of the three Vs of these data sets – volume, variety and velocity. Companies that built processing platforms to handle large and diverse data sets now need to quickly answer the question, “So what does the data mean?” Machine learning proves to be an invaluable asset in this effort. With much larger data sets to inform machine-learning algorithms, software companies are using these capabilities to differentiate themselves in a wide variety of markets.

  • Exabeam uses machine learning to identify anomalous behavior patterns inside a company’s network that might be predictors of insider attacks.
  • Sumo Logic uses machine learning to identify patterns in log data that shed light on anomalies in how infrastructure is performing.
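The core idea behind this kind of anomaly detection can be illustrated with a deliberately simple sketch: flag data points that deviate sharply from an established baseline. This toy z-score approach is a stand-in for the far richer behavioral models products like these actually use, and the data and function names here are hypothetical.

```python
from statistics import mean, stdev

def flag_anomalies(counts, threshold=3.0):
    """Return indices of values more than `threshold` standard
    deviations from the mean of the series.

    `counts` might be, say, a user's daily file-access counts; a
    sudden spike relative to their own baseline is worth a look.
    """
    mu, sigma = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts) if abs(c - mu) > threshold * sigma]

# Steady daily activity, then a sudden spike on the last day.
daily_accesses = [12, 15, 11, 14, 13, 12, 16, 140]
print(flag_anomalies(daily_accesses, threshold=2.0))  # flags index 7
```

Real systems model each user or host against its own history (and its peers’) rather than a single global distribution, but the principle is the same: learn what “normal” looks like, then surface the outliers for a human to investigate.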

Big data is cutting a swath through every functional part of an organization, and machine learning is moving in lockstep with where these data sets are generated and processed. As a result, we’re now witnessing how machine learning use cases have evolved from the “front line” to the “back line,” helping companies focus on infrastructure domains further removed from customer-facing applications but still highly relevant to a business’s overall operations. Take storage, for example. At the recent Flash Memory Summit in California, a research paper highlighted how machine learning can dramatically increase the life cycle of flash storage, which typically deteriorates with use. Machine learning will also impact data management processes around backup and recovery. The ability to predict RTOs (recovery time objectives) and RPOs (recovery point objectives) when dealing with very large data sets is highly relevant given that the cost of downtime and data loss exceeds $1.7 trillion a year.

However, machine learning has to be handled like any other highly touted ‘new’ capability: companies must take the necessary precautions to ensure that neither they nor their customers become disillusioned by the technology’s actual impact.

  • Human validation: machine learning is often best used in conjunction with human judgment, both to make sense of the predictions and to work with the customers affected by the data to understand what changes need to be made.
  • Choice of algorithms: machine learning can be deployed using a wide variety of underlying algorithms. Choosing the right one(s) is instrumental to getting optimal results for a particular use case, and a keen understanding of why an algorithm was chosen is critical to the success of any machine learning project.
  • Minimizing human bias: humans create the algorithms, so it’s important to build teams with a diversity of data science backgrounds so that your algorithms are not all cut from the same cloth.
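To make the algorithm-choice point concrete, here is a minimal sketch, using made-up data and no ML libraries, of how even a simple held-out validation split can show that one model suits a use case far better than another:

```python
from statistics import mean

def fit_linear(xs, ys):
    # Ordinary least squares for y = a*x + b (single feature).
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def holdout_mae(model_fit, xs, ys, split=0.75):
    # Train on the first portion, score mean absolute error on the rest.
    k = int(len(xs) * split)
    predict = model_fit(xs[:k], ys[:k])
    return mean(abs(predict(x) - y) for x, y in zip(xs[k:], ys[k:]))

# A baseline "algorithm" that ignores the feature entirely.
baseline = lambda xs, ys: (lambda x: mean(ys))

# Hypothetical data with a clear linear trend.
xs = list(range(12))
ys = [2 * x + 1 for x in xs]

for name, fit in [("baseline", baseline), ("linear", fit_linear)]:
    print(name, holdout_mae(fit, xs, ys))
```

On this data the linear model’s held-out error drops to zero while the baseline’s does not; on data without such a trend, the ranking could easily reverse. The point is that the comparison itself, not any one algorithm, is what belongs in every project.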

The coming years will see a rapid intertwining of big data technologies and processes with machine learning techniques, enabling the more than 35 zettabytes of data expected to be generated in 2020 to deliver useful and impactful business outcomes. If machine learning isn’t impacting your organization today, you can bet that it will be in short order.