Setting up Storage Server with HDFS
This topic describes a basic procedure for setting up Storage Server with HDFS.
Prerequisites
- Kerberos must be installed on the client and host machines. If it is not installed, install the following packages on UNIX (see https://help.ubuntu.com/10.04/serverguide/kerberos.html for more information):
krb5-kdc
krb5-admin-server
- Open a terminal and type the following:
sudo apt-get install krb5-kdc krb5-admin-server
- Set the realm as WSO2.ORG.
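For reference, a minimal /etc/krb5.conf for this realm might look like the following sketch; the KDC and admin-server hostnames are placeholders for the machine on which you installed the Kerberos packages.
[libdefaults]
    default_realm = WSO2.ORG

[realms]
    WSO2.ORG = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }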
Starting the Storage Server node
- Follow the steps below to create a keytab with the following service principals.
- admin/carbon.super - password: admin
- datanode/carbon.super - password: node0
- If you are starting a data node, add the data node principal as well.
- Create the keytab as follows.
The default carbon.keytab file that comes with the Storage Server pack already contains these service principals. However, if a new datanode needs to be added, its service principal, which must be unique, is added to this keytab and a new carbon.keytab file needs to be created. By default, SS is configured to start one namenode and one datanode. Cache the principal's key using the following command:
ktutil: addent -password -p <your principal> -k 1 -e <encryption algo>
The following is a sample for this:
deep@den:~$ ktutil
ktutil: addent -password -p admin/carbon.super@WSO2.ORG -k 1 -e des-cbc-md5
Password for admin/carbon.super@WSO2.ORG: admin
ktutil: addent -password -p datanode/carbon.super@WSO2.ORG -k 1 -e des-cbc-md5
Password for datanode/carbon.super@WSO2.ORG: datanode
Write a keytab for the service principal using the following command:
ktutil: write_kt <keytab file name>
The following is a sample for this:
ktutil: wkt carbon.keytab
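Optionally, you can verify the entries written to the keytab with klist (assuming the MIT Kerberos client utilities are available on the machine):
klist -kt carbon.keytab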
Copy the created keytab file to [SS_HOME]/repository/conf/etc/hadoop/keytabs/ and rename it to carbon.keytab.
- Start the server with HDFS enabled.
./wso2server.sh -enable.hdfs.startup
- Access the Carbon configuration menu and create a new service principal for the data nodes, with the relevant passwords.
When the namenode starts, go to the Carbon console on the namenode and create a service principal for the datanodes. At minimum, datanode/carbon.super should be added, plus a principal for any other datanode you intend to start.
If your namenode is up and HDFS is set up properly, you will see lines similar to the following in your console:
[2013-11-12 16:57:36,561] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - fsOwner=admin/node0@WSO2.ORG
[2013-11-12 16:57:36,561] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - supergroup=admin
[2013-11-12 16:57:36,561] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - isPermissionEnabled=true
[2013-11-12 16:57:36,565] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - dfs.block.invalidate.limit=100
[2013-11-12 16:57:36,565] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - isAccessTokenEnabled=true accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
[2013-11-12 16:57:36,571] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - Registered FSNamesystemStateMBean and NameNodeMXBean
[2013-11-12 16:57:36,586] INFO {org.apache.hadoop.hdfs.server.namenode.NameNode} - Caching file names occuring more than 10 times
[2013-11-12 16:57:36,593] INFO {org.apache.hadoop.hdfs.server.common.Storage} - Number of files = 1
[2013-11-12 16:57:36,596] INFO {org.apache.hadoop.hdfs.server.common.Storage} - Number of files under construction = 0
[2013-11-12 16:57:36,597] INFO {org.apache.hadoop.hdfs.server.common.Storage} - Image file of size 134 loaded in 0 seconds.
[2013-11-12 16:57:36,597] INFO {org.apache.hadoop.hdfs.server.common.Storage} - Edits file repository/data/hadoop/dfs/name/current/edits of size 30 edits # 2 loaded in 0 seconds.
[2013-11-12 16:57:36,598] INFO {org.apache.hadoop.hdfs.server.common.Storage} - Image file of size 158 saved in 0 seconds.
[2013-11-12 16:57:36,853] INFO {org.apache.hadoop.hdfs.server.common.Storage} - Image file of size 158 saved in 0 seconds.
[2013-11-12 16:57:37,016] INFO {org.apache.hadoop.hdfs.server.namenode.NameCache} - initialized with 0 entries 0 lookups
[2013-11-12 16:57:37,016] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - Finished loading FSImage in 456 msecs
[2013-11-12 16:57:37,023] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - Total number of blocks = 0
[2013-11-12 16:57:37,024] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - Number of invalid blocks = 0
[2013-11-12 16:57:37,024] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - Number of under-replicated blocks = 0
[2013-11-12 16:57:37,024] INFO {org.apache.hadoop.hdfs.server.namenode.FSNamesystem} - Number of over-replicated blocks = 0
If a datanode needs to be started, open another terminal and run the following command:
$ HADOOP_SECURE_DN_USER=<username> sudo -E bin/hadoop datanode
Starting multiple datanodes pointing to one namenode
Change the following property values in the hdfs-site.xml file to point to the namenode:
- dfs.http.address
- dfs.https.port
- dfs.https.address
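As an illustration, the corresponding hdfs-site.xml entries could look like the following sketch; the host name and ports are placeholder values, so substitute the actual namenode host and the ports used in your setup.
<property>
    <name>dfs.http.address</name>
    <value>namenode.example.com:50070</value>
</property>
<property>
    <name>dfs.https.port</name>
    <value>50470</value>
</property>
<property>
    <name>dfs.https.address</name>
    <value>namenode.example.com:50470</value>
</property>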
Change the following properties in the core-site.xml file to point to the namenode:
- fs.default.name
- hadoop.security.group.mapping.service.url
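A sketch of the matching core-site.xml entries is shown below; the filesystem URI and the group-mapping service URL are placeholder values and depend on how the namenode is configured in your deployment.
<property>
    <name>fs.default.name</name>
    <!-- Placeholder: HDFS URI of the namenode -->
    <value>hdfs://namenode.example.com:9000</value>
</property>
<property>
    <name>hadoop.security.group.mapping.service.url</name>
    <!-- Placeholder: URL of the group-mapping service running on the namenode -->
    <value>https://namenode.example.com:9443/services/</value>
</property>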
Change the following in the hdfs-site.xml file to start the datanode:
- dfs.datanode.address
- dfs.datanode.https.address
- dfs.datanode.http.address
- dfs.datanode.ipc.address
- dfs.replication
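For example, a second datanode on the same machine might use hdfs-site.xml entries like the following; all ports below are placeholder values chosen only to avoid clashing with the first datanode's ports.
<property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50011</value>
</property>
<property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50076</value>
</property>
<property>
    <name>dfs.datanode.https.address</name>
    <value>0.0.0.0:50476</value>
</property>
<property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50021</value>
</property>
<property>
    <name>dfs.replication</name>
    <!-- Placeholder: set to the number of replicas you want across your datanodes -->
    <value>2</value>
</property>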
- Add the datanode IP address and port to the slaves file, one entry per line, as in the sample below.
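A hypothetical slaves file with two datanodes could look like this; the IP addresses and ports are sample values only.
# one datanode entry per line
192.168.1.10:50010
192.168.1.11:50011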
- When starting multiple datanodes on the same machine, make sure you change the PID_DIR and the IDENT_STRING for the datanode in the hadoop-env.sh file.
If you are starting a secure datanode, add the following lines:
# The directory where pid files are stored for secured datanode
export HADOOP_SECURE_DN_PID_DIR=/tmp/2
Alternatively, add the following:
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER_02