Have you ever handled, or even seen, a massive amount of data? There is one open source software framework that handles massive (big) data every day. Yes! That is Hadoop, the elephant.
Hadoop is an open source framework for handling massive amounts of data. It processes petabytes of data per day.
The Hadoop framework was introduced in 2006 by Doug Cutting, who worked at Yahoo! at that time.
He implemented Google's MapReduce paper, and he named the project Hadoop, after his son's toy elephant.
Hadoop uses the MapReduce technique.
The work is distributed across multiple machines in the cluster. The main difference between grid computing and Hadoop is that in grid computing the processes are always running and the data is moved to the process.
But in Hadoop the data is already distributed, and the processing is started on the machines that hold the data.
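To make the map-shuffle-reduce idea concrete, here is a plain-Python sketch of a word count, the classic MapReduce example. This only simulates what the framework does on a single machine; it is not Hadoop's actual Java API, and the function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

documents = ["big data big elephant", "big cluster"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 1, 'elephant': 1, 'cluster': 1}
```

In a real cluster, each document split would be mapped on a different machine, and the shuffle step would move all pairs with the same key to the same reducer.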
The data is distributed across multiple machines, so there is one NameNode that keeps track of which data block is stored on which machine.
The architecture of Hadoop is as follows.
The file system used in Hadoop is HDFS, the Hadoop Distributed File System.
It splits the data into blocks and stores them on different machines. The data is replicated, so even in the case of a machine failure the data is still available.
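A toy sketch of this idea: split a file into fixed-size blocks and record, NameNode-style, which machines hold each replica. The block size, the replication factor of 3, and the round-robin placement here are illustrative assumptions, not HDFS's real defaults or placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    # Split the file contents into fixed-size blocks
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, machines, replication=3):
    # NameNode-style metadata: which machines hold each block.
    # Round-robin placement is an illustrative assumption.
    placement = {}
    for block_id in range(len(blocks)):
        placement[block_id] = [machines[(block_id + r) % len(machines)]
                               for r in range(replication)]
    return placement

data = b"x" * 300                      # a 300-byte "file"
blocks = split_into_blocks(data, 128)  # 128-byte blocks -> 3 blocks
placement = place_blocks(blocks, ["node1", "node2", "node3", "node4"])
print(len(blocks))   # 3
print(placement[0])  # ['node1', 'node2', 'node3']
```

Because every block lives on several machines, losing any single machine still leaves at least two copies of each of its blocks, which is why the cluster keeps serving data through failures.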
Applications of Hadoop
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning and/or sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving, including of relational/tabular data, e.g. for compliance
Users of Hadoop around the Globe
TCS
CTS
Amazon
eBay
Akamai
Yahoo!
IBM
Microsoft
etc.