Known Issues in Hadoop Cluster

The following issues commonly come up when setting up a multi-node Hadoop cluster, along with how to resolve them:

Starting and stopping a Hadoop cluster

Go to the node where the NameNode is installed and execute $HADOOP_HOME/bin/start-all.sh, where HADOOP_HOME is the directory in which Hadoop was installed.
To check whether the expected Hadoop processes are running on a node, use the jps command (part of Sun’s Java since v1.5.0).

The output will look like this:

hadoop@ubuntu:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1788 NameNode
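
The check above can be scripted. The following is a minimal sketch, not part of Hadoop: missing_daemons is a hypothetical helper that reads jps output on standard input and prints each expected daemon name that does not appear in it. Which daemons to expect depends on the role of the node.

```shell
#!/bin/sh
# Hypothetical helper: reads `jps` output on stdin and prints every daemon
# name passed as an argument that is NOT present in that output.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # NameNode does not accidentally match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Example (requires a JDK on the PATH):
#   jps | missing_daemons NameNode JobTracker TaskTracker
```

An empty result means every listed daemon was found; any name printed is a daemon whose log file is worth inspecting.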

Execute $HADOOP_HOME/bin/stop-all.sh to stop all daemons in the cluster. Issue this command from the node where the cluster was started.
If there are any errors, examine the log files in the $HADOOP_HOME/logs/ directory.

Namenode is in safe mode

The following error comes up when the NameNode is in safe mode:

org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hadoop-hadoop/mapred/system. Name node is in safe mode.
The reported blocks 319128 needs additional 7183 blocks to reach the threshold 0.9990 of total blocks 326638. 
Safe mode will be turned off automatically.

The NameNode stays in safe mode until the configured percentage of blocks has been reported online by the DataNodes. This threshold is set by the dfs.namenode.safemode.threshold-pct parameter in hdfs-site.xml (named dfs.safemode.threshold.pct in Hadoop 1.x releases). For small or development clusters with very few blocks, it makes sense to set this parameter lower than its default value of 0.999f. Otherwise a single missing block can leave the system stuck in safe mode.
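
The numbers in the error message above can be reproduced with a little arithmetic. This is a sketch only: blocks_needed is a hypothetical helper, and it assumes the threshold block count is the total multiplied by the threshold fraction and rounded down, which agrees with the figures in the message.

```shell
#!/bin/sh
# Hypothetical helper: how many more blocks must be reported before the
# safe mode threshold is reached.
# Arguments: reported blocks, total blocks, threshold fraction.
blocks_needed() {
    awk -v reported="$1" -v total="$2" -v pct="$3" 'BEGIN {
        # Assumption: threshold count = total * pct, rounded down.
        target = int(total * pct)
        need = target - reported
        if (need < 0) need = 0
        print need
    }'
}

# Values taken from the error message above; prints 7183.
blocks_needed 319128 326638 0.999
```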

If this error persists, you can force the NameNode to leave safe mode by issuing the following command:

$HADOOP_HOME/bin/hadoop dfsadmin -safemode leave
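
It can be worth checking the current state before forcing the NameNode out of safe mode. The sketch below assumes that "hadoop dfsadmin -safemode get" prints either "Safe mode is ON" or "Safe mode is OFF"; safemode_is_on is a hypothetical helper that only inspects that text.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given status text reports safe mode ON.
# Assumption: the status line contains "Safe mode is ON" or "Safe mode is OFF";
# adjust the pattern if your release words it differently.
safemode_is_on() {
    case "$1" in
        *"Safe mode is ON"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (requires a running cluster):
#   status=$("$HADOOP_HOME"/bin/hadoop dfsadmin -safemode get)
#   if safemode_is_on "$status"; then
#       "$HADOOP_HOME"/bin/hadoop dfsadmin -safemode leave
#   fi
```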

Namenode is not getting started

Sometimes the NameNode fails to start if its data directories have been deleted or corrupted. These directories are usually configured by the dfs.name.dir and dfs.data.dir properties in $HADOOP_HOME/conf/hdfs-site.xml. Make sure those directories are readable and writable by the Hadoop user.

If the issue persists:

Delete all contents of the directories configured by dfs.name.dir and dfs.data.dir.
Format the NameNode: bin/hadoop namenode -format (WARNING: all HDFS data is lost during this process!)
Start all processes again: bin/start-all.sh
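
Because these steps destroy data, it may help to review the exact commands before running anything. The sketch below only prints a plan and executes nothing. print_reset_plan is a hypothetical helper, the directory arguments stand in for your dfs.name.dir and dfs.data.dir values, and stopping the cluster first is an added assumption rather than one of the steps listed above.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands for the reset above.
# Arguments: Hadoop install directory, name directory, data directory.
print_reset_plan() {
    hadoop_home=$1; name_dir=$2; data_dir=$3
    # Refuse empty or root paths so the printed rm command stays scoped.
    for dir in "$name_dir" "$data_dir"; do
        case "$dir" in
            ""|"/") echo "refusing unsafe directory: '$dir'" >&2; return 1 ;;
        esac
    done
    echo "$hadoop_home/bin/stop-all.sh"        # assumption: stop daemons first
    echo "rm -rf $name_dir/* $data_dir/*"      # WARNING: destroys all HDFS data
    echo "$hadoop_home/bin/hadoop namenode -format"
    echo "$hadoop_home/bin/start-all.sh"
}

# Example with placeholder paths:
#   print_reset_plan /usr/local/hadoop /app/hadoop/name /app/hadoop/data
```

On a multi-node cluster the data directory exists on every DataNode, so the delete step would have to be repeated on each of them.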

If dfs.name.dir and dfs.data.dir have not been configured, Hadoop creates them inside its temporary directory. That directory defaults to /tmp/hadoop-${user.name}, and /tmp is typically cleared on every reboot. So if the NameNode data lives inside /tmp, the NameNode will fail to start after the node restarts.
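
A quick way to catch this is to test whether a storage directory sits under /tmp. This is a minimal sketch: is_under_tmp is a hypothetical helper, /app/hadoop/name is a placeholder path, and the dfs/name suffix is an assumption about the default layout beneath the temporary directory.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given path is /tmp or lies beneath it.
is_under_tmp() {
    case "$1" in
        /tmp|/tmp/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example for a user called "hadoop"; only the first path triggers the warning.
for dir in /tmp/hadoop-hadoop/dfs/name /app/hadoop/name; do
    if is_under_tmp "$dir"; then
        echo "WARNING: $dir is under /tmp and may not survive a reboot"
    fi
done
```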

Datanode is not getting started - java.io.IOException: Incompatible namespaceIDs

If you see the error “java.io.IOException: Incompatible namespaceIDs” in the logs of a DataNode (logs/hadoop-hadoop-datanode-.log), you may be affected by issue HDFS-107 (formerly known as HADOOP-1212). This error prevents the DataNode from starting.
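
To confirm the mismatch, the IDs stored by the NameNode and the DataNode can be compared. The sketch below rests on an assumption not stated in this article: that each storage directory contains a current/VERSION file with a line of the form namespaceID=<number>. namespace_id is a hypothetical helper and the paths in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints the namespaceID recorded in a VERSION file.
# Assumption: the file contains a line of the form "namespaceID=<number>".
namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Example (replace the paths with your dfs.name.dir and dfs.data.dir):
#   nn_id=$(namespace_id /path/to/name/current/VERSION)
#   dn_id=$(namespace_id /path/to/data/current/VERSION)
#   [ "$nn_id" = "$dn_id" ] || echo "namespaceID mismatch: $nn_id vs $dn_id"
```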

The following steps fix the problem at the cost of erasing all existing data in the cluster’s HDFS file system:

Stop the full cluster, i.e. both MapReduce and HDFS layers.
Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml.
Reformat the NameNode. WARNING: all HDFS data is lost during this process!
Restart the cluster.