Saturday, 29 December 2012

Hadoop installation procedure

Hadoop Architecture:

Hadoop is a software framework for storing and processing very large amounts of data, up to petabytes, on clusters of computers.
Hadoop distributes the data across the machines in the cluster and schedules jobs to run on them. This scheduling is performed by the "job tracker" in the Hadoop architecture.

A "task tracker" runs on each machine in the cluster. It executes the tasks that the job tracker assigns to that machine and reports their progress back to the job tracker.
These two daemons make up the MapReduce layer.

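The storage layer is HDFS (the Hadoop Distributed File System). Its "name node" keeps the file system namespace and records which machines (data nodes) in the cluster hold each block of data; assigning jobs is the job tracker's role, not the name node's. There is also a "secondary name node", but despite its name it is not a standby that takes over when the primary name node is down. Instead, it periodically merges the name node's edit log into the file system image (a checkpoint), by default once an hour in Hadoop 1.x.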
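Once a single-node Hadoop 1.x installation is running, each of these daemons is a separate Java process, and the JDK's jps tool lists them. The helper below is a hypothetical sketch (check_hadoop_daemons is not a Hadoop command); it assumes all five Hadoop 1.x daemons, including the data node that stores HDFS blocks, run on the same machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: report which Hadoop 1.x daemons
# are missing from the output of the JDK's `jps` tool ("<pid> <ClassName>" per line).
# Assumes a single-node setup where all five daemons run on this machine.
check_hadoop_daemons() {
  local jps_output="$1"
  local missing=0
  local daemon
  for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # Compare the daemon name exactly against the second column (the class name).
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on a running single-node cluster (assumes jps is on the PATH):
#   check_hadoop_daemons "$(jps)" && echo "all daemons running"
```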

Requirements:
Oracle Java 6 (JDK 1.6) or later
Ubuntu 10.04
SSH
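Before starting, a small sketch like the following can confirm that the tools this post relies on are installed. check_commands is a hypothetical helper, and the command list in the usage line is an assumption based on the steps below.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print every command that is not on the PATH
# and return non-zero if any of them is missing.
check_commands() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (java, ssh and ssh-keygen are the tools used later in this post):
#   check_commands java ssh ssh-keygen || echo "install the missing packages first"
```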
Installing Java in Ubuntu:
# Add the Ferramosca Roberto's repository to your apt repositories
# See https://launchpad.net/~ferramroberto/
#
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:ferramroberto/java

# Update the source list
$ sudo apt-get update

# Install Sun Java 6 JDK
$ sudo apt-get install sun-java6-jdk

# Select Sun's Java as the default on your machine.
# See 'sudo update-alternatives --config java' for more information.
#
$ sudo update-java-alternatives -s java-6-sun
The full JDK will be placed in /usr/lib/jvm/java-6-sun.
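Hadoop also has to be told where the JDK lives (/usr/lib/jvm/java-6-sun for this package). Hadoop 1.x reads the location from the JAVA_HOME environment variable, which is usually set in its conf/hadoop-env.sh file. The function below is only a sketch (set_java_home is a hypothetical name) that checks a directory really contains a JDK before exporting it.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): point JAVA_HOME at a JDK directory, but only
# after checking that it contains both the java launcher and the javac compiler.
set_java_home() {
  local jdk_dir="$1"
  if [ -x "$jdk_dir/bin/java" ] && [ -x "$jdk_dir/bin/javac" ]; then
    export JAVA_HOME="$jdk_dir"
    export PATH="$JAVA_HOME/bin:$PATH"
    return 0
  fi
  echo "no JDK found in $jdk_dir" >&2
  return 1
}

# Usage with the location used by the sun-java6-jdk package:
#   set_java_home /usr/lib/jvm/java-6-sun
```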
To check the installation, type:
user@ubuntu:~$ java -version
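The java -version output can also be checked from a script. This sketch (java_major_version is a hypothetical helper) pulls the major version out of the first line that java -version prints, so a setup script can insist on Java 6 or later.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): extract the major Java version from the first
# line printed by `java -version`, e.g. 'java version "1.6.0_26"' -> 6.
# Old releases use the "1.x" scheme; Java 9 and later print "9", "11", ...
java_major_version() {
  local version
  # Take the text inside the double quotes.
  version=$(printf '%s\n' "$1" | sed -n 's/.*"\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    1.*) version=${version#1.} ;;   # "1.6.0_26" -> "6.0_26"
  esac
  printf '%s\n' "${version%%[._-]*}" # keep the leading number: "6.0_26" -> "6"
}

# Usage (java -version writes to stderr, hence 2>&1):
#   major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
#   [ "$major" -ge 6 ] || echo "Java 6 or later is required"
```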
The following commands create a dedicated user for Hadoop on the Linux system.
A separate user keeps the Hadoop processes apart from other applications on the machine (security, access rights, etc.).
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
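To confirm that the account was created with the right group, a check like the following can be used. user_in_group is a hypothetical helper built on the standard id command.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed only if USER is a member of GROUP.
user_in_group() {
  local user="$1" group="$2"
  # `id -nG USER` prints all group names of USER, separated by spaces;
  # split them one per line and look for an exact match.
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"
}

# Usage:
#   user_in_group hduser hadoop && echo "hduser is in the hadoop group"
```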

This adds the user hduser, in the group hadoop, to the Ubuntu machine. Then log in to that account (for example with su - hduser).

Next, configure SSH. Hadoop uses SSH to reach its nodes, so you need a private/public key pair that lets the nodes communicate without a password.

First, generate the private and public key pair for the hduser user with the following command:

hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]

The command generates the private and public key pair. The empty passphrase (-P "") lets Hadoop connect without being prompted for a password.
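If the key generation is part of a setup script that may run more than once, it helps to skip it when the key already exists. A minimal sketch, assuming the default key path used above (ensure_ssh_key is a hypothetical name):

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): create a passphrase-less RSA key only if it
# does not exist yet, so re-running a setup script never overwrites it.
ensure_ssh_key() {
  local key="${1:-$HOME/.ssh/id_rsa}"
  if [ -f "$key" ]; then
    echo "key already exists: $key"
    return 0
  fi
  mkdir -p "$(dirname "$key")"
  chmod 700 "$(dirname "$key")"
  # -q: quiet, -t rsa: key type, -N "": empty passphrase, -f: key file path.
  ssh-keygen -q -t rsa -N "" -f "$key"
}

# Usage:
#   ensure_ssh_key            # uses $HOME/.ssh/id_rsa
```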

 

Second, you have to enable SSH access to your local machine with this newly created key.

hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
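The cat command above appends the key every time it is run, and sshd (with its default StrictModes setting) ignores an authorized_keys file that other users can write to. A sketch of a more careful version, where authorize_key is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): add a public key to authorized_keys only once
# and restrict the permissions so that sshd's StrictModes check accepts them.
authorize_key() {
  local pubkey_file="$1"
  local ssh_dir="${2:-$HOME/.ssh}"
  local auth="$ssh_dir/authorized_keys"
  local key
  key=$(cat "$pubkey_file")
  mkdir -p "$ssh_dir"
  touch "$auth"
  # -x: whole-line match, -F: treat the key as a fixed string, not a regex.
  if ! grep -qxF "$key" "$auth"; then
    printf '%s\n' "$key" >> "$auth"
  fi
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}

# Usage:
#   authorize_key "$HOME/.ssh/id_rsa.pub"
```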

The final step is to test the SSH setup by connecting to your local machine with the hduser user. The step is also needed to save your local machine’s host key fingerprint to the hduser user’s known_hosts file. If you have any special SSH configuration for your local machine like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information).
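As an example of such a host-specific option, the sketch below appends a Host entry to $HOME/.ssh/config. The port 2222 and the helper name add_ssh_host_port are purely hypothetical; only add such an entry if your sshd really listens on a non-standard port.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): add a "Host" entry with a custom Port to an
# ssh_config file, unless an entry for that host already exists.
add_ssh_host_port() {
  local host="$1" port="$2" config="${3:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  if [ -f "$config" ] && grep -qxF "Host $host" "$config"; then
    return 0
  fi
  printf 'Host %s\n    Port %s\n' "$host" "$port" >> "$config"
  chmod 600 "$config"
}

# Usage (2222 is a hypothetical port):
#   add_ssh_host_port localhost 2222
```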

hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
hduser@ubuntu:~$
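Once the host key has been saved by this first login, passwordless access can be re-checked without any prompts. In this sketch (check_passwordless_ssh is a hypothetical helper), the BatchMode=yes option makes ssh fail instead of asking for a password, so a broken key setup shows up as an error.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): verify that ssh to HOST works without a
# password. BatchMode=yes disables all prompts; ConnectTimeout bounds the wait.
check_passwordless_ssh() {
  local host="${1:-localhost}"
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
    echo "passwordless SSH to $host works"
  else
    echo "passwordless SSH to $host failed" >&2
    return 1
  fi
}

# Usage:
#   check_passwordless_ssh localhost
```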

 

 
