hadoop-2.0.0-alpha (Standalone) and jdk7u4 on CentOS 6.2
Hardware:
Dell PowerEdge SC420, Pentium, 4 GB RAM, 80 GB HDD
OS:
CentOS 6.2
Download the LiveCD ISO from:
ftp://mirror.nandomedia.com/pub/CentOS/6.2/isos/i386/CentOS-6.2-i386-LiveCD.iso
Burn to a CD.
Boot from this CD.
This boots into the CentOS 6.2 live desktop.
Click 'Install onto Hard Disk Drive' on the desktop.
Reboot.
Login as 'root'.
Software:
Download Java from:
http://download.oracle.com/otn-pub/java/jdk/7u4-b20/jdk-7u4-linux-i586.rpm
# rpm -ivh {download folder}/jdk-7u4-linux-i586.rpm
This installs Java at /usr/java/jdk1.7.0_04.
# java -version
java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b20)
Java HotSpot(TM) Client VM (build 23.0-b21, mixed mode, sharing)
# cd
# pwd
/root
# vi .bashrc
Append the following two lines at the end:
export JAVA_HOME=/usr/java/jdk1.7.0_04
export PATH=${JAVA_HOME}/bin:${PATH}
Press Esc, then type :wq and press Enter to save and quit.
# . ~/.bashrc
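As a quick sanity check, the same two exports can be run directly in the current shell and verified. The JDK path below is the rpm's default install location noted above; adjust it if your JDK landed elsewhere.

```shell
# Re-export the variables and confirm they took effect.
# /usr/java/jdk1.7.0_04 is the rpm's default install path (an assumption
# if you installed the JDK somewhere else).
export JAVA_HOME=/usr/java/jdk1.7.0_04
export PATH=${JAVA_HOME}/bin:${PATH}
echo "JAVA_HOME=${JAVA_HOME}"
```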
Download Hadoop from:
http://apache.mirrorcatalogs.com/hadoop/common/hadoop-2.0.0-alpha/hadoop-2.0.0-alpha.tar.gz
# cd /usr
# tar -xvzf {download folder}/hadoop-2.0.0-alpha.tar.gz
# ls /usr/hadoop-2.0.0-alpha
bin include lib LICENSE.txt output sbin
etc input libexec NOTICE.txt README.txt share
# service sshd status
openssh-daemon (pid xxxx) is running...
If it is not running, start it with: # service sshd restart
# groupadd hadoop
# useradd -g hadoop -m hadoop
# cd /usr/hadoop-2.0.0-alpha
# chown -R hadoop:hadoop /usr/hadoop-2.0.0-alpha
# su - hadoop
# ssh-keygen -t rsa -P ''
Press Enter when prompted for the file in which to save the key, accepting the default.
# cat .ssh/id_rsa.pub >> .ssh/authorized_keys
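If the passwordless login below still prompts for a password, sshd commonly rejects keys when ~/.ssh or authorized_keys is group- or world-accessible; tightening the permissions is a standard fix (general sshd behavior, not specific to this Hadoop release):

```shell
# sshd typically refuses public-key auth if these are too permissive.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```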
# ssh localhost
This should log you in without prompting for a password.
# exit
# whoami
hadoop
# cd /usr/hadoop-2.0.0-alpha
# bin/hadoop version
Hadoop 2.0.0-alpha
Subversion http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.0.0-alpha/hadoop-common-project/hadoop-common -r 1338348
Compiled by hortonmu on Wed May 16 01:28:50 UTC 2012
From source with checksum 954e3f6c91d058b06b1e81a02813303f
# mkdir input
# cp share/hadoop/common/templates/conf/*.xml input
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-alpha.jar grep input output 'dfs[a-z.]+'
...
INFO input.FileInputFormat: Total input paths to process : 6
...
...
12/05/27 03:38:30 INFO mapreduce.Job: Counters: 27
File System Counters
FILE: Number of bytes read=91368
FILE: Number of bytes written=505456
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=32
Map output records=32
Map output bytes=1006
Map output materialized bytes=1076
Input split bytes=133
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=0
Reduce input records=32
Reduce output records=32
Spilled Records=64
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=50
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=242368512
File Input Format Counters
Bytes Read=1368
File Output Format Counters
Bytes Written=830
# ls input/
capacity-scheduler.xml core-site.xml hadoop-policy.xml hdfs-site.xml mapred-queue-acls.xml mapred-site.xml
(six .xml files, matching the job's "Total input paths to process : 6" above)
# cat output/part-r-00000
3 dfs.datanode.data.dir
2 dfs.namenode.http
2 dfs.namenode.https
1 dfsqa
1 dfsadmin
1 dfs.webhdfs.enabled
1 dfs.umaskmode
1 dfs.support.append
1 dfs.secondary.namenode.keytab.file
1 dfs.secondary.namenode.kerberos.principal
1 dfs.secondary.namenode.kerberos.https.principal
1 dfs.permissions.superusergroup
1 dfs.namenode.secondary.http
1 dfs.namenode.safemode.threshold
1 dfs.namenode.replication.min.
1 dfs.namenode.name.dir
1 dfs.namenode.keytab.file
1 dfs.namenode.kerberos.principal
1 dfs.namenode.kerberos.https.principal
1 dfs.include
1 dfs.https.port
1 dfs.hosts.exclude
1 dfs.hosts
1 dfs.exclude
1 dfs.datanode.keytab.file
1 dfs.datanode.kerberos.principal
1 dfs.datanode.http.address
1 dfs.datanode.data.dir.perm
1 dfs.datanode.address
1 dfs.cluster.administrators
1 dfs.block.access.token.enable
1 dfs
(32 entries, matching "Reduce output records=32" above)
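The job's tally can be loosely cross-checked with plain Unix tools using the same regular expression. The sketch below runs it over a throwaway sample file (a stand-in for input/*.xml); against the real config files the counts should be in the same ballpark, though the example job's matching semantics may differ slightly.

```shell
# Build a tiny stand-in for input/*.xml.
mkdir -p /tmp/grepcheck
cat > /tmp/grepcheck/sample.xml <<'EOF'
<property><name>dfs.datanode.data.dir</name></property>
<property><name>dfs.datanode.data.dir</name></property>
<property><name>dfs.namenode.name.dir</name></property>
EOF
# Print each 'dfs[a-z.]+' match on its own line (-o), without filenames
# (-h), then count occurrences, highest first.
grep -ohE 'dfs[a-z.]+' /tmp/grepcheck/sample.xml | sort | uniq -c | sort -rn
```

To check the real run, point the same pipeline at input/*.xml from /usr/hadoop-2.0.0-alpha.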
Done: a quick install and verification of Hadoop 2.0.0-alpha in standalone mode.
Refer to:
http://hadoop.apache.org/common/docs/stable/single_node_setup.html