How to Set Up a Single-Node (Pseudo-Distributed) Hadoop Cluster
Step 1:
Get the Hadoop RPM from the Apache site (search Google for "apache hadoop download"):
http://www.apache.org/dyn/closer.cgi/hadoop/common/
In the LinuxWorld lab, run:
# yum install hadoop
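A quick way to confirm the package actually installed is to query rpm:
# rpm -qa | grep -i hadoop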
Step 2:
Get the JDK RPM from the Oracle site (search Google for "jdk download"):
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
In the LinuxWorld lab, run:
# yum install jdk
Step 3:
[root@server Desktop]# rpm -ql jdk | grep java$
/etc/.java
/usr/java
/usr/java/jdk1.7.0_51/bin/java
/usr/java/jdk1.7.0_51/jre/bin/java
[root@server Desktop]# /usr/java/jdk1.7.0_51/bin/java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
[root@server Desktop]# java -version
java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (rhel-2.3.4.1.el6_3-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
[root@server Desktop]# echo $JAVA_HOME
/usr
[root@server Desktop]# JAVA_HOME=/usr/java/jdk1.7.0_51/
[root@server Desktop]# echo $JAVA_HOME
/usr/java/jdk1.7.0_51/
[root@server Desktop]# java -version
java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (rhel-2.3.4.1.el6_3-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
[root@server Desktop]# PATH=$JAVA_HOME/bin:$PATH
Note: $JAVA_HOME/bin must come before $PATH in the command above, so that the Oracle JDK is found ahead of the system OpenJDK.
[root@server Desktop]# java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Step 4:
[root@server Desktop]# vim /root/.bash_profile
export JAVA_HOME=/usr/java/jdk1.7.0_51/
PATH=$JAVA_HOME/bin:$PATH
[root@server Desktop]# . /root/.bash_profile
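With the profile in place, any new shell should pick up the Oracle JDK automatically; a quick check (same paths as configured above):
# echo $JAVA_HOME
# java -version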
Step 5:
Hadoop is mostly set up by default, but we need to point it at our Java installation in its environment file:
[root@localhost /]# vim /etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_51/
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=500
To test that it is working, run the command below:
[root@localhost /]# hadoop fs -ls /
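You can also confirm which Hadoop version is installed and that the hadoop command itself runs:
# hadoop version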
Step 6: Set up the HDFS NameNode and DataNode
[root@server hadoop]# vim /etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/data/nodename</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/dataname</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<final>true</final>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
<final>true</final>
</property>
</configuration>
Note: the directories above are created automatically; there is no need to create them beforehand.
[root@server hadoop]# hadoop namenode -format
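After a successful format, the NameNode metadata directory configured above is populated; a quick sanity check (directory name as set in hdfs-site.xml):
# ls /data/nodename/current
It should contain files such as VERSION, fsimage and edits.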
Step 7: Start the NameNode and DataNode
[root@server hadoop]# vim /etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:10001</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
[root@server hadoop]# hadoop-daemon.sh start namenode
The command above opens listening ports; check with:
# netstat -tnlp | grep java
tcp 0 0 127.0.0.1:10001 0.0.0.0:* LISTEN 14969/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 14969/java
[root@server hadoop]# hadoop-daemon.sh start datanode
This command also opens listening ports; check with:
# netstat -tnlp | grep java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 15093/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 15093/java
To verify:
[root@server hadoop]# jps
8177 Jps
8126 DataNode
7933 NameNode
Or open the web UI at http://127.0.0.1:50070, since 50070 is the NameNode management port.
From the CLI we can also see the cluster report:
[root@server hadoop]# hadoop dfsadmin -report
You can check the HDFS filesystem; initially there is nothing in it:
# hadoop fs -ls /
Create a directory in the HDFS filesystem:
# hadoop fs -mkdir /input
Upload (copy) a local file into the HDFS filesystem:
# hadoop fs -copyFromLocal test.txt /input
Note: the file is stored on the DataNode under its storage folder named "current", split into blocks of at most 64 MB, because the default block size is 64 MB.
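To see how the uploaded file was split into blocks and where the replicas were placed, you can run fsck against it (file name as uploaded above):
# hadoop fsck /input/test.txt -files -blocks -locations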
You can change the block size in hdfs-site.xml:
<property>
<name>dfs.block.size</name>
<value>134217728</value>
<final>true</final>
</property>
By default each block is copied to 3 DataNodes, because the default replication factor is 3; you can change it in hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
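Note that changing dfs.replication only affects files written afterwards; for a file already in HDFS the replication factor can be changed with setrep (path assumed from the earlier upload):
# hadoop fs -setrep -w 2 /input/test.txt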
List files in HDFS:
# hadoop fs -ls /input
# hadoop fs -lsr /
How to Set Up MapReduce
Step 1:
Set up the mapred-site.xml file:
# vim /etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.16:9001</value>
</property>
</configuration>
Step 2: Start the JobTracker
# hadoop-daemon.sh start jobtracker
starting jobtracker, logging to /var/log/hadoop/root/hadoop-root-jobtracker-desktop16.example.com.out
# jps
7247 JobTracker
6467 DataNode
6541 NameNode
7325 Jps
Note: it starts 2 new ports; check:
# netstat -tnlp | grep java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 7411/java
tcp 0 0 192.168.0.16:9001 0.0.0.0:* LISTEN 7411/java
Here 50030 is the management (web UI) port for MapReduce.
Check it: http://127.0.0.1:50030
Step 3: Start the TaskTracker
# hadoop-daemon.sh start tasktracker
starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-desktop16.example.com.out
# jps
7639 Jps
7569 TaskTracker
6467 DataNode
7411 JobTracker
6541 NameNode
Step 4: Test your setup by running the example jar shipped in the Hadoop RPM.
You can locate it with:
# rpm -ql hadoop | grep examples
/usr/share/hadoop/hadoop-examples-1.2.1.jar
# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount /input /output
14/05/07 14:38:01 INFO input.FileInputFormat: Total input paths to process : 1
14/05/07 14:38:01 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/07 14:38:01 WARN snappy.LoadSnappy: Snappy native library not loaded
14/05/07 14:38:02 INFO mapred.JobClient: Running job: job_201405071431_0001
14/05/07 14:38:03 INFO mapred.JobClient: map 0% reduce 0%
14/05/07 14:38:12 INFO mapred.JobClient: map 100% reduce 0%
14/05/07 14:38:20 INFO mapred.JobClient: map 100% reduce 33%
14/05/07 14:38:21 INFO mapred.JobClient: map 100% reduce 100%
14/05/07 14:38:22 INFO mapred.JobClient: Job complete: job_201405071431_0001
# hadoop job -list all
1 jobs submitted
States are:
Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId State StartTime UserName Priority SchedulingInfo
job_201405071431_0001 2 1399453681859 root NORMAL NA
# hadoop fs -ls /output
Found 3 items
-rw-r--r-- 3 root supergroup 0 2014-05-07 14:38 /output/_SUCCESS
drwxr-xr-x - root supergroup 0 2014-05-07 14:38 /output/_logs
-rw-r--r-- 3 root supergroup 34 2014-05-07 14:38 /output/part-r-00000
Note: the _SUCCESS file indicates the MapReduce job completed successfully.
Note: part-r-00000 contains the reducer output, i.e. the final result.
You can see the final output of the MapReduce job:
# hadoop fs -cat /output/part-r-00000
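For example, if test.txt contained the single line "hello hadoop hello hdfs", the reducer output would be word/count pairs, sorted by word:
hadoop 1
hdfs 1
hello 2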
We can also view it via the web UI:
http://127.0.0.1:50070 -> Browse the filesystem
If you want the complete details of a running or completed job, use the job ID with the -status option:
# hadoop job -status job_201405071431_0004
Job: job_201405071431_0004
file: hdfs://192.168.0.16:10001/tmp/hadoop-root/mapred/staging/root/.staging/job_201405071431_0004/job.xml
tracking URL: http://desktop16.example.com:50030/jobdetails.jsp?jobid=job_201405071431_0004
map() completion: 0.017579561
reduce() completion: 0.0
Counters: 3
Job Counters
SLOTS_MILLIS_MAPS=2481
Launched map tasks=2
Data-local map tasks=2
# hadoop job -list
1 jobs currently running
JobId State StartTime UserName Priority SchedulingInfo
job_201405071431_0004 1 1399486864042 root NORMAL NA
If you want to kill a running job:
# hadoop job -kill job_201405071431_0004
Killed job job_201405071431_0004
If you want to change the priority of one job relative to others:
# hadoop job -set-priority job_201405071431_0004 LOW
Changed job priority.
You can list jobs with the command below:
# hadoop job -list all
4 jobs submitted
States are:
Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId State StartTime UserName Priority SchedulingInfo
job_201405071431_0001 2 1399453681859 root NORMAL NA
job_201405071431_0002 3 1399462379102 root NORMAL NA
job_201405071431_0003 2 1399462502071 root NORMAL NA
job_201405071431_0004 2 1399486864042 root LOW NA
By default the FIFO scheduler is used in Apache Hadoop.
We can change to the Fair Scheduler in the mapred-site.xml file.
Step 1:
# vim /etc/hadoop/mapred-site.xml
Step 2:
In the JobTracker's mapred-site.xml, specify the scheduler to use:
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
Identify the pool configuration file:
<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>/etc/hadoop/fair-scheduler.xml</value>
</property>
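Note: by default the Fair Scheduler groups jobs into pools named after the submitting user; for the -Dpool.name=tech syntax used in Step 5 below to select a pool, the pool-name property is typically also set in mapred-site.xml:
<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>pool.name</value>
</property>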
Step 3:
# vim /etc/hadoop/fair-scheduler.xml
<allocations>
<pool name="tech">
<minMaps>10</minMaps>
<minReduces>5</minReduces>
<maxRunningJobs>2</maxRunningJobs>
</pool>
<pool name="hr">
<minMaps>10</minMaps>
<minReduces>5</minReduces>
</pool>
<user name="vimal">
<maxRunningJobs>2</maxRunningJobs>
</user>
</allocations>
Step 4:
# hadoop-daemon.sh stop jobtracker
stopping jobtracker
# hadoop-daemon.sh start jobtracker
starting jobtracker, logging to /var/log/hadoop/root/hadoop-root-jobtracker-desktop16.example.com.out
Step 5:
Run a job in the pool named "tech":
# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount -Dpool.name=tech /input /output3
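Once the job is running, the Fair Scheduler status page on the JobTracker web UI shows the pools, their minimum shares and running jobs (assuming the scheduler loaded correctly):
http://127.0.0.1:50030/scheduler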