The Role of Hadoop in Digital Transformations and Managing the IoT

Posted on: Sep 14, 2016

The digital transformation underway at Under Armour is erasing any stale stereotypes that athletes and techies don’t mix. While hardcore runners sporting the company’s latest microthread singlet can’t see Hadoop, Apache Hive, Apache Spark, or Presto, these technologies are teaming up to track some serious mileage.

Under Armour is working on a “connected fitness” vision that connects body, apparel, activity level, and health. By combining the data from all these sources into an app, consumers will gain a better understanding of their health and fitness, and Under Armour will be able to identify and respond to customer needs more quickly with personalized services and products. The company stores and analyzes data about food and nutrition, recipes, workout activities, music, sleep patterns, purchase histories, and more.

Compiling, storing, and analyzing these types of structured and unstructured data at this scale would have been nearly impossible a decade ago. Today, companies can use Hadoop to merge their data from business applications, business analytics, web logs, the Internet of Things (IoT), and many other sources to deliver context-relevant insights. When companies collect data from all sources to augment the core of their business, they often realize real-time business insights that give them a competitive edge.

Over the years, companies have invested significant amounts of time and money in untangling data schemas and making data consistent. The end goal was always more visibility and business insight, along with access to a greater portion of their own valuable business data as well as customer and partner data. Within enterprises, 60 to 73 percent of data is never used for business intelligence, analytics, or applications. A Hadoop-based data lake integrated with business systems has the potential to reduce that share significantly and give companies access to valuable big data signals.

Hadoop Unfazed by Size or Schema

Hadoop doesn't need to enforce a schema to store data, and it can store and process very large sets of structured and unstructured data. For enterprises, the unstructured data is especially intriguing: images, video, audio, and social media are taking over the digital universe and greatly outpacing the growth of structured data. When business systems are integrated with Hadoop data lakes, companies gain a 360-degree view of what is happening in the business.
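
As a rough sketch of what this schema-on-read approach can look like in practice (not taken from the article, and with illustrative HDFS paths and field names), PySpark can load semi-structured JSON from a Hadoop data lake without any schema being declared up front:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

    # Read semi-structured JSON events straight from the data lake; no table
    # schema is declared up front, Spark infers the structure at read time.
    events = spark.read.json("hdfs:///datalake/raw/fitness_events/")
    events.printSchema()  # shows the inferred fields, including nested ones

    # Persist as Parquet so downstream SQL engines can query it cheaply.
    events.write.mode("overwrite").parquet("hdfs:///datalake/curated/fitness_events/")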

Companies using Hadoop are storing data across thousands of nodes, and they can process that data efficiently with SQL-based or MapReduce-style distributed compute frameworks. The Apache Software Foundation's open-source projects have also opened the door to integrating multiple emerging data-processing frameworks, so all types of data can be analyzed and mined for business insights.
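
The SQL-on-Hadoop pattern mentioned above might look something like the following sketch, where an engine such as Spark SQL (Hive or Presto would be similar) runs an ordinary query that the cluster distributes across its nodes; the table and column names are made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-on-hadoop-sketch").getOrCreate()

    # Register the curated Parquet data as a temporary view (names are illustrative).
    spark.read.parquet("hdfs:///datalake/curated/fitness_events/") \
        .createOrReplaceTempView("fitness_events")

    # An ordinary SQL query; the engine distributes the scan and aggregation
    # across the cluster's nodes.
    daily_activity = spark.sql("""
        SELECT user_id, to_date(event_time) AS day, SUM(distance_km) AS total_km
        FROM fitness_events
        GROUP BY user_id, to_date(event_time)
    """)
    daily_activity.show(10)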

At Under Armour, an analytics data warehouse, a SQL-based big data processing engine, and a machine-learning engine work together to provide business and user insights, personalized recommendations, search enhancements, and data access, but the data innovations won't end there.

Merging Contextual and Business Data

Several Apache projects built around Hadoop provide a flexible framework that can be integrated with machine-learning tools and deep-learning libraries, enabling digital image detection and recognition. A huge library of product images, for example, can be processed quickly, or individuals in crowds can be identified automatically.
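
A minimal sketch of how such batch image processing might be wired up over a Hadoop data lake, assuming Spark 3's binaryFile source and a placeholder classify function standing in for a real pretrained deep-learning model:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("image-batch-sketch").getOrCreate()

    # Load raw image bytes stored in the data lake (paths are illustrative).
    images = spark.read.format("binaryFile").load("hdfs:///datalake/raw/product_images/*.jpg")

    def classify(content):
        # Placeholder: in practice this would invoke a pretrained model loaded
        # on each executor and return the predicted label for the image bytes.
        return "unknown"

    classify_udf = udf(classify, StringType())

    labeled = images.select("path", classify_udf("content").alias("label"))
    labeled.write.mode("overwrite").parquet("hdfs:///datalake/curated/image_labels/")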

Retailers can boost sales by relating pictures of their products to a customer's past shopping preferences. In addition, they can merge sentiment analytics and track every step of a customer journey, including competitive offerings and prospect behavior. By tracking social media, website activity, and call-center data after a product or service launch, companies can easily see which products are succeeding based on consumer posts and customer feedback. Understanding why people make purchases, and why they don't, has always drained out of the customer journey like water through a sieve. The ability to iteratively discover relevant big data signals lets organizations track the contextual information around consumer behavior.

Hadoop is also helping manufacturers improve real-time quality assurance on the production line. Manufacturers can capture images of goods as they are being assembled and, using image recognition, automatically inspect each product to see whether it meets factory standards. In many cases, automated image recognition is more accurate than human review.

Companies are also relying on a new breed of distributed computing technologies to merge business data from ERP, HR, finance, sales, and inventory with operational information such as equipment and installation maintenance. Turkish Airlines, for example, has started a program that ties its flight operations to equipment maintenance, procurement, and parts purchasing, and lines up crews to make repairs. Automating these processes ensures that repairs happen before customers experience a service interruption.

Handling the IoT with Hadoop

Maintenance and quality assurance are among the many areas where organizations will want to apply the IoT, and having Hadoop is a real advantage. CenterPoint Energy in Houston relies on Hadoop to reduce the cost of storing the data it collects from more than 2.3 million customers. CenterPoint gathers energy-usage readings from smart meters every 15 minutes, on the order of 220 million readings a day across the fleet, and the company is processing more than 5 billion records. As the IoT becomes more prevalent, storage and computing requirements will only increase.

Looking ahead, the IoT will produce vast amounts of data as billions of sensors transmit information many times a day. In many IoT scenarios, much of that data will be the same information. Say a sensor in a washing machine tracks vibration, temperature, and time of use. For months, the transmitted data will hardly change, but when readings move beyond acceptable thresholds, big data solutions must efficiently store and process the time-series data and deliver those abnormal signals in real time, so that appropriate responses can be initiated to prevent equipment failure and service interruptions.
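
One hedged sketch of that pattern, assuming the washing-machine telemetry arrives on a Kafka topic (the broker, topic, field names, and thresholds are all illustrative, and the Spark Kafka connector is assumed to be on the classpath), uses Spark Structured Streaming to surface only the abnormal readings in real time:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("iot-threshold-sketch").getOrCreate()

    # Assumed message layout for the washing-machine telemetry.
    schema = StructType([
        StructField("device_id", StringType()),
        StructField("event_time", TimestampType()),
        StructField("vibration", DoubleType()),
        StructField("temperature", DoubleType()),
    ])

    # Readings arrive on a Kafka topic (broker and topic names are illustrative).
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "washer-telemetry")
           .load())

    readings = raw.select(from_json(col("value").cast("string"), schema).alias("r")).select("r.*")

    # Flag only the readings that cross acceptable thresholds; everything else
    # can be archived cheaply for later time-series analysis.
    alerts = readings.filter((col("vibration") > 5.0) | (col("temperature") > 80.0))

    query = alerts.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()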