Fortune 500 companies using Hadoop

When we think of Wall Street, we typically think of firms like Morgan Stanley, JPMC, and UBS. Ever wondered how they process big data, and what tools they use to do so?

Morgan Stanley: When faced with the problem of identifying the triggers that initiated market events, Morgan Stanley looked at vast quantities of web and database logs, loaded the log data into Hadoop, and ran time-based correlations on it. The resulting system provides a way to go back in time and identify which application or user initiated the transaction that culminated in the market event of interest.
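
To make the time-based correlation idea concrete, here is a minimal Hadoop Streaming sketch in Python. The tab-separated log layout (epoch_seconds, application, user, action), the file names, and the 60-second window are all assumptions for illustration; Morgan Stanley's actual schema and jobs are not public.

```python
#!/usr/bin/env python
# mapper.py -- buckets each log event into a 60-second window so that
# Hadoop's shuffle phase groups everything that happened around the
# same moment onto one reducer. (Hypothetical log format:
# "epoch_seconds<TAB>application<TAB>user<TAB>action".)
import sys

WINDOW = 60  # seconds per correlation bucket

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        continue  # skip malformed records
    ts, app, user, action = fields[:4]
    try:
        bucket = int(float(ts)) // WINDOW * WINDOW
    except ValueError:
        continue
    print("%d\t%s\t%s\t%s" % (bucket, app, user, action))
```

```python
#!/usr/bin/env python
# reducer.py -- counts events per (window, application), so an analyst
# can see which application was unusually active just before a market
# event of interest.
import sys
from collections import Counter

counts = Counter()
for line in sys.stdin:
    bucket, app = line.rstrip("\n").split("\t")[:2]
    counts[(bucket, app)] += 1

for (bucket, app), n in sorted(counts.items()):
    print("%s\t%s\t%d" % (bucket, app, n))
```

A pair of scripts like this would be launched with the standard Hadoop Streaming jar, e.g. `hadoop jar hadoop-streaming.jar -input logs/ -output windows/ -mapper mapper.py -reducer reducer.py`.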

JPMC: As a financial services company with over 150 PB of online data, 30,000 databases, and 3.5 billion logins to user accounts, JPMC was faced with the task of reducing fraud, managing IT risk, and mining data for customer insights. To do this they turned to Hadoop, which gave them a single platform to store all the data, making it easier to query for insights.
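
One fraud-style query that such a single platform enables is flagging accounts with an abnormal number of logins. The sketch below is purely illustrative: the "account_id<TAB>date" record layout and the threshold of 50 are invented, not JPMC's actual schema or rules.

```python
#!/usr/bin/env python
# reducer.py -- counts login events per (account, day) and flags heavy
# hitters. Assumes mapper output of "account_id<TAB>date", one line per
# login, arriving sorted by key as Hadoop Streaming guarantees.
import sys

THRESHOLD = 50  # flag accounts with more than 50 logins in one day
current_key, count = None, 0

def emit(key, n):
    if n > THRESHOLD:
        print("%s\tSUSPICIOUS\t%d" % (key, n))

for line in sys.stdin:
    key = line.rstrip("\n")
    if key != current_key:
        if current_key is not None:
            emit(current_key, count)
        current_key, count = key, 0
    count += 1

if current_key is not None:
    emit(current_key, count)
```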

If we consider the world of Web 2.0 companies, auction house eBay and travel site Orbitz may come to mind.

eBay: As a massive online presence, eBay has over 97 million active buyers and sellers and over 200 million items for sale in more than 50,000 categories. This translates into 10 TB or more of incoming data per day. When tasked with finding a way to solve real-time problems by crunching predictive models, eBay turned to Hadoop and built a 500-node cluster using Sun servers running Linux. As time went by, the need arose for a better real-time search engine for the auction site. This is now being built using Hadoop and HBase (for more details, I recommend searching for the project code-named "Cassini").
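
At the heart of any search engine built this way is an inverted index mapping terms to the items that contain them. Here is a toy Hadoop Streaming version in Python; the "item_id<TAB>title" input format is an assumption, and Cassini's real pipeline (which serves its index out of HBase) is far more elaborate.

```python
#!/usr/bin/env python
# mapper.py -- emits one (term, item_id) pair per distinct word in a
# listing title, assuming input lines of "item_id<TAB>title".
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t", 1)
    if len(parts) != 2:
        continue
    item_id, title = parts
    for term in set(title.lower().split()):
        print("%s\t%s" % (term, item_id))
```

```python
#!/usr/bin/env python
# reducer.py -- concatenates the item ids for each term into a posting
# list; a serving layer (HBase, in eBay's case) would store these rows.
import sys

current_term, postings = None, []
for line in sys.stdin:
    term, item_id = line.rstrip("\n").split("\t")
    if term != current_term:
        if current_term is not None:
            print("%s\t%s" % (current_term, ",".join(postings)))
        current_term, postings = term, []
    postings.append(item_id)

if current_term is not None:
    print("%s\t%s" % (current_term, ",".join(postings)))
```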

Orbitz: Orbitz, the online travel vendor, had a need to determine metrics like "how long does it take for a user to download a page?" When Orbitz developers needed to understand why production systems had issues, they had to mine huge volumes of production log data. The solution they implemented uses a combination of Hadoop and Hive to process weblogs, which are then further processed using scripts written in R (an open source statistical package that supports visualization) to derive useful metrics related to hotel bookings and user ratings. Hadoop ended up complementing their existing data warehouse systems instead of replacing them.
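
The page-download-time metric is a natural fit for a simple aggregation job. Below is a minimal sketch of the reducer side, assuming the mapper has already extracted "page_url<TAB>milliseconds" pairs from the weblogs (the actual Orbitz log fields are not shown here); its output is exactly the kind of per-page summary that would then be handed to R for analysis and plotting.

```python
#!/usr/bin/env python
# reducer.py -- computes the average download time per page, assuming
# mapper output of "page_url<TAB>milliseconds" sorted by page_url.
import sys

current_page, total, n = None, 0, 0

def emit(page, total, n):
    # page, average milliseconds, number of samples
    print("%s\t%.1f\t%d" % (page, total / float(n), n))

for line in sys.stdin:
    page, ms = line.rstrip("\n").split("\t")
    if page != current_page:
        if current_page is not None:
            emit(current_page, total, n)
        current_page, total, n = page, 0, 0
    total += int(ms)
    n += 1

if current_page is not None:
    emit(current_page, total, n)
```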

In the brick-and-mortar world of retail there are companies like Walmart, Kmart, Target, and Sears. Consider how Sears uses Hadoop for big data:

Sears: When faced with the need to evaluate the results of various marketing campaigns, among other things, Sears appears to have entered the Hadoop world in a big way, with a 300-node Hadoop cluster storing and processing 2 PB of data. More recently they have begun using Hadoop to set pricing based on variables like the availability of a product in a store, what a competitor would charge for a similar product, and the economic conditions in that area. In addition, their big data system allows them to send customized coupons to consumers by location; for instance, if you are in New York and hit by Hurricane Sandy, it is useful to receive Sears coupons for generators, bleach, and other survival supplies. In an industry dominated by e-tailers like Amazon, the ability to set and change prices dynamically and to court loyalty-program consumers with customized offers are some of the many ways that brick-and-mortar firms like Sears are trying to stay relevant.
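
To give a flavor of how those pricing variables might combine, here is a deliberately simplified Python sketch. Every input, weight, and bound in it is invented for illustration; Sears' actual pricing model is proprietary and certainly far richer.

```python
# A toy dynamic-pricing rule using the three variables named above:
# in-store availability, a competitor's price, and local conditions.

def dynamic_price(base_price, units_in_stock, competitor_price,
                  local_demand_index):
    """Adjust a base price; local_demand_index ~1.0 means average
    demand, >1.0 means elevated demand (e.g. storm preparation)."""
    price = base_price
    if units_in_stock < 10:            # scarce stock supports a premium
        price *= 1.05
    elif units_in_stock > 100:         # overstock argues for a markdown
        price *= 0.93
    if competitor_price is not None:   # stay close to the competition
        price = min(price, competitor_price * 1.02)
    price *= local_demand_index
    return round(max(price, base_price * 0.7), 2)  # floor at 30% off

# Example: a generator before a storm -- low stock, elevated demand.
print(dynamic_price(499.00, units_in_stock=6,
                    competitor_price=509.99, local_demand_index=1.10))
```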

In conclusion, the takeaways would be that Hadoop complements your existing data warehouse and provides an open source, scalable way to store petabytes of log data, but you still need to build a solution tailored to your specific needs, whether they be sentiment analysis, customer satisfaction metrics, or judging the effectiveness of your marketing campaigns. Hadoop is a tool, but like any good tool it does not offer a panacea, nor can it be used in isolation.
