Have you ever handled, or even seen, a massive amount of data? There is one open source software framework that handles massive (big) data every day. Yes! That is Hadoop, the elephant.
Hadoop is an open source framework for handling massive amounts of data. It processes petabytes of data per day.
The Hadoop framework was introduced in 2006 by Doug Cutting, who worked at Yahoo! at that time.
He implemented Google's MapReduce paper, and he named the project Hadoop, after his son's toy elephant.
Hadoop uses the MapReduce technique.
The work is distributed across multiple machines in the cluster. The main difference between grid computing and Hadoop is that in grid computing the processes are always running and the data is moved to the process.
But in Hadoop the data is already distributed, and the processing is started on the machines that hold the data.
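To make the map-shuffle-reduce idea concrete, here is a plain-Python sketch of a word count, the classic MapReduce example. This only simulates what the framework does on a single machine; it is not Hadoop's actual Java API, and the function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

documents = ["big data big elephant", "big cluster"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 1, 'elephant': 1, 'cluster': 1}
```

In a real cluster, each document split would be mapped on a different machine, and the shuffle step would move all pairs with the same key to the same reducer.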
The data is distributed across multiple machines, so there is one NameNode that keeps track of which data block is stored on which machine.
The architecture of Hadoop is as follows.
The file system used in Hadoop is HDFS, the Hadoop Distributed File System.
It splits the data into blocks and stores them on different machines. The data is replicated, so even in the case of a machine failure the data is still available.
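A toy sketch of this idea: split a file into fixed-size blocks and record, NameNode-style, which machines hold each replica. The block size, the replication factor of 3, and the round-robin placement here are illustrative assumptions, not HDFS's real defaults or placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    # Split the file contents into fixed-size blocks
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, machines, replication=3):
    # NameNode-style metadata: which machines hold each block.
    # Round-robin placement is an illustrative assumption.
    placement = {}
    for block_id in range(len(blocks)):
        placement[block_id] = [machines[(block_id + r) % len(machines)]
                               for r in range(replication)]
    return placement

data = b"x" * 300                      # a 300-byte "file"
blocks = split_into_blocks(data, 128)  # 128-byte blocks -> 3 blocks
placement = place_blocks(blocks, ["node1", "node2", "node3", "node4"])
print(len(blocks))   # 3
print(placement[0])  # ['node1', 'node2', 'node3']
```

Because every block lives on several machines, losing any single machine still leaves at least two copies of each of its blocks, which is why the cluster keeps serving data through failures.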
Applications of Hadoop
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning and/or sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving, including of relational/tabular data, e.g. for compliance
Users of Hadoop around the Globe
TCS
CTS
Amazon
eBay
Akamai
Yahoo!
IBM
Microsoft
etc.