Streamlining Hadoop Data for handy visualization

In the last data revolution, that of data warehouses, organizations captured enormous amounts of data and are now struggling to handle it or extract value from it. Hadoop has been a major enabler in helping these organizations derive value from this data. But Hadoop is still quite complicated, and its results are hard to interpret because they arrive in a raw form that is not friendly to the eye. Getting the data into a desirable form and then visualizing it requires many layers on top of Hadoop, which leads to design complications, dependency on expensive resources, and restrictions on scalability and readiness. Without Hadoop, good and striking visualization can be accomplished, but only on a limited data set, restricted by factors such as the local system’s memory and processing capabilities. As of now, people have to trade off among visualization quality, data size and speed.

There have been a variety of efforts toward solving this problem. One of them is by Arcadia Data, which provides native visualization capability on Hadoop data, with functionality such as drag and drop of files and interactive data selection. This is intended to give users the best of all worlds: better visualization, scalability and speed. Tableau connects directly to Hive, enabling users to extract data directly, though not natively, from the Hadoop ecosystem and then use it for analysis and visualization.

Platfora, by contrast, has created one comprehensive layer over Hadoop to enable end-to-end data processing for analytics. AtScale also makes Hadoop data ready for use by exposing it in a format directly consumable by any BI tool of choice. It optimizes queries in real time, enabling quick execution over the complete data set. JethroData introduces a full indexing methodology in which selected data is fully indexed, so as to give superior execution times for queries on complete data sets.
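To see why indexing can shorten query times, here is a minimal Python sketch of the general idea. It is only an illustration and not how JethroData or any other product mentioned here actually works; the data, the column names and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data set standing in for rows stored in Hadoop.
rows = [
    {"region": "east", "sales": 120},
    {"region": "west", "sales": 80},
    {"region": "east", "sales": 45},
    {"region": "north", "sales": 60},
]


def full_scan(rows, column, value):
    """Check every row: the cost grows with the size of the data set."""
    return [i for i, row in enumerate(rows) if row[column] == value]


def build_index(rows, column):
    """Map each distinct value of a column to the positions of its rows."""
    index = defaultdict(list)
    for i, row in enumerate(rows):
        index[row[column]].append(i)
    return index


def indexed_lookup(index, value):
    """Answer the same question from the index without reading other rows."""
    return index.get(value, [])


# The index is built once, then reused by every query on that column.
region_index = build_index(rows, "region")
east_rows = indexed_lookup(region_index, "east")
east_total = sum(rows[i]["sales"] for i in east_rows)
```

The scan and the indexed lookup return the same row positions. The difference is that the index pays its cost once, up front, so a dashboard that fires many filters at the same column does not have to re-read the whole data set each time.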

These accomplishments will, hopefully, bring big data closer to the wider base of its intended audience. Which solution should be adopted in a specific case is a separate discussion that will revolve around the lower-level details of the problem. But whichever suits best, it will mitigate the challenge of ‘change’ that today’s businesses face continually. Besides making things easier, these developments will create room for a new range of products that offer out-of-the-box, domain-specific functionality using Hadoop ecosystems as their information base. Such products can focus specifically on generic business challenges that need comprehensive, large data sets to solve.