Running Service Statistics with DataStax

This guide describes how to integrate BAM Service Statistics Toolbox with Cassandra and Hive running in DataStax Enterprise Edition 3.0.0.

Setting up DataStax

We use the H2 database in this example. If you use any other database to connect, the first two steps below should be executed for that database's connector driver jar (e.g., ojdbc driver) instead of H2.

Copy h2-database-engine_1.2.140.wso2v3.jar and hive-jdbc-handler-0.8.1-wso2v6.jar to the <DSE_HOME>/resources/hive/lib folder, where <DSE_HOME> is the DataStax installation directory.
Add the following property to the <DSE_HOME>/resources/hive/conf/hive-site.xml file.
```
<property>
  <name>hive.aux.jars.path</name>  
  <value>$DSE_HOME/resources/dse/lib/hive-jdbc-handler-0.8.1-wso2v6.jar,$DSE_HOME/resources/dse/lib/h2-database-engine_1.2.140.wso2v3.jar</value>
</property>
```
Be sure to replace $DSE_HOME with the actual file path of the installation. The file path must start with file:///. For example, if $DSE_HOME is /home/dse/dse-3.0, then the path is file:///home/dse/dse-3.0/resources/dse/lib/hive-jdbc-handler-0.8.1-wso2v6.jar.
Start DSE Cassandra and the JobTracker using the following command: ./dse cassandra -j -t. You can check whether the JobTracker started successfully using the command: ./dsetool jobtracker. It returns the hostname and port of the JobTracker.
View the Cassandra ring status using the command: ./nodetool ring -h [dse-node-ip]
Now start the Hive Thrift server using the command: ./dse hive --service hiveserver -p 30000. This starts the Hive Thrift server on port 30000.
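
The file:/// form required for hive.aux.jars.path is easy to get wrong by hand. The following is a minimal sketch, not part of the toolbox, that builds the property value from the installation directory. The function name aux_jars_path and the lib_dir parameter are illustrative assumptions; lib_dir defaults to the directory used in the property above and should match the folder where the jars were actually copied.

```python
from pathlib import PurePosixPath

# Jar names taken from the steps above.
JARS = [
    "hive-jdbc-handler-0.8.1-wso2v6.jar",
    "h2-database-engine_1.2.140.wso2v3.jar",
]


def aux_jars_path(dse_home, lib_dir="resources/dse/lib", jars=JARS):
    """Return a comma-separated list of file:/// URIs for hive.aux.jars.path."""
    base = PurePosixPath(dse_home)
    if not base.is_absolute():
        raise ValueError("dse_home must be an absolute path")
    # An absolute POSIX path already starts with "/", so prefixing it with
    # "file://" yields the required "file:///..." form.
    return ",".join("file://" + str(base / lib_dir / jar) for jar in jars)


# Hypothetical installation directory, as in the example above.
print(aux_jars_path("/home/dse/dse-3.0"))
```

The printed string can be pasted into the value element of the property.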

Hive server logs are available in the <DSE_HOME>/logs/hive/hive directory.
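
Before configuring BAM, it can help to confirm that the Hive Thrift server is accepting connections. The sketch below only checks that a TCP connection to the port succeeds; it does not speak the Thrift protocol. The helper name port_is_open and the host localhost are assumptions for illustration; port 30000 is the one used in the command above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved host names.
        return False


if __name__ == "__main__":
    # Replace localhost with the address of the DSE node running Hive.
    print("Hive Thrift server reachable:", port_is_open("localhost", 30000))
```

If the check fails, the Hive server logs mentioned above are the first place to look.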

Integrating BAM with DataStax

Change the hive.metastore.local property in <BAM_HOME>/repository/conf/advanced/hive-site.xml to false.
```
<property>
	<name>hive.metastore.local</name>
	<value>false</value>
	<description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
```

Add the following property to hive-site.xml. Change the host and port values to match the address on which the DSE Hive Thrift server is running. You can provide multiple URIs as a comma-separated list. They are load balanced with failover using a round-robin algorithm.
```
<property>
	<name>hive.metastore.uris</name>
	<value>jdbc:hive://host:port/default</value>
	<description>remote meta store URIs. use comma separated list for multiple servers</description>
</property>
```
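
When several Hive servers are listed, the value is a comma-separated list of URIs in the format shown above. As a small sketch, the value could be assembled like this; the function name metastore_uris and the host names dse-node1 and dse-node2 are hypothetical.

```python
def metastore_uris(servers, database="default"):
    """Join (host, port) pairs into a comma-separated hive.metastore.uris value."""
    return ",".join(
        f"jdbc:hive://{host}:{port}/{database}" for host, port in servers
    )


# Two hypothetical DSE nodes, each running a Hive Thrift server on port 30000.
print(metastore_uris([("dse-node1", 30000), ("dse-node2", 30000)]))
# prints jdbc:hive://dse-node1:30000/default,jdbc:hive://dse-node2:30000/default
```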

Change <BAM_HOME>/repository/conf/etc/cassandra-component.xml file to point to the DataStax Cassandra cluster nodes as follows:

```
<Cassandra>
    <Cluster>
        <Name>Test Cluster</Name>
        <DefaultPort>9160</DefaultPort>
        <Nodes>cassandra-ip1:9160, cassandra-ip2:9160</Nodes>
        <AutoDiscovery disable="true" delay="1000"/>
    </Cluster>
</Cassandra>
```
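
A typo in the Nodes list is easy to miss. The following sketch parses a configuration with the structure shown above and returns the host and port pairs it contains, so they can be reviewed or checked for reachability. It assumes the file has exactly this layout with no XML namespaces; the function name cluster_nodes is illustrative.

```python
import xml.etree.ElementTree as ET


def cluster_nodes(xml_text):
    """Return the (host, port) pairs listed under <Cassandra><Cluster><Nodes>."""
    root = ET.fromstring(xml_text)
    cluster = root.find("Cluster")
    if cluster is None:
        raise ValueError("no <Cluster> element found")
    default_port = int(cluster.findtext("DefaultPort", default="9160"))
    nodes = []
    for entry in cluster.findtext("Nodes", default="").split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Entries without an explicit port fall back to <DefaultPort>.
        host, _, port = entry.partition(":")
        nodes.append((host, int(port) if port else default_port))
    return nodes


SAMPLE = """<Cassandra>
    <Cluster>
        <Name>Test Cluster</Name>
        <DefaultPort>9160</DefaultPort>
        <Nodes>cassandra-ip1:9160, cassandra-ip2:9160</Nodes>
        <AutoDiscovery disable="true" delay="1000"/>
    </Cluster>
</Cassandra>"""

print(cluster_nodes(SAMPLE))
```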

In the default Service Statistics toolbox, change the relative path of the H2 database to an absolute path. This is necessary because the script runs outside the BAM server and may fail to resolve the relative path. This step is not required for a production database.

For example, if you use the default H2 database, change the database file path specified by mapred.jdbc.url in the Hive scripts to an absolute value as shown below:

```
CREATE EXTERNAL TABLE IF NOT EXISTS AppServerStatsPerMinute(host STRING, service_name STRING, operation_name STRING, total_request_count INT,total_response_count INT, total_fault_count INT,avg_response_time DOUBLE,min_response_time BIGINT,max_response_time BIGINT, year SMALLINT,month SMALLINT,day SMALLINT,hour SMALLINT,minute SMALLINT, time STRING) STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
'mapred.jdbc.driver.class' = 'org.h2.Driver',
'mapred.jdbc.url' = 'jdbc:h2:${DSE_HOME}/repository/database/samples/BAM_STATS_DB;AUTO_SERVER=TRUE',
'mapred.jdbc.username' = 'wso2carbon','mapred.jdbc.password' = 'wso2carbon',
'hive.jdbc.update.on.duplicate' = 'true',
'hive.jdbc.primary.key.fields' = 'host,service_name,operation_name,year,month,day,hour,minute',
'hive.jdbc.table.create.query' = 'CREATE TABLE AS_STATS_SUMMARY_PER_MINUTE ( host VARCHAR(100) NOT NULL, service_name VARCHAR(150),operation_name VARCHAR(150), total_request_count INT,total_response_count INT, total_fault_count INT,avg_response_time DOUBLE,min_response_time BIGINT,max_response_time BIGINT, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, minute SMALLINT, time VARCHAR(30))' );
```
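
Because a script usually defines several tables, the same edit has to be made in more than one place. The sketch below shows one way to rewrite every relative jdbc:h2: file path in a script in a single pass. It is not part of the toolbox: the function name absolutize_h2_urls, the base directory /opt/bam, and the relative path in the usage line are assumptions for illustration.

```python
import re
from pathlib import PurePosixPath

# Captures the path that follows "jdbc:h2:" up to the first ";", quote or space.
H2_URL = re.compile(r"(jdbc:h2:)([^;'\s]+)")


def absolutize_h2_urls(script, base_dir):
    """Prefix every relative jdbc:h2: file path in a Hive script with base_dir."""

    def fix(match):
        prefix, path = match.groups()
        # Leave absolute paths, home-relative paths, variable references and
        # non-file modes such as tcp: or mem: untouched.
        if path.startswith(("/", "~", "$")) or ":" in path:
            return match.group(0)
        return prefix + str(PurePosixPath(base_dir) / path)

    return H2_URL.sub(fix, script)


line = "'mapred.jdbc.url' = 'jdbc:h2:repository/database/samples/BAM_STATS_DB;AUTO_SERVER=TRUE',"
print(absolutize_h2_urls(line, "/opt/bam"))
```

Review the rewritten script before deploying it, since URLs that the pattern does not recognize are left unchanged.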

Repeat the path change described above for all H2 tables in the script. If the corresponding Hive tables already exist in the system, first drop them and remove the existing metadata by executing the DROP TABLE <EXISTING_TABLE_NAME> command.

Note that this command does not drop the actual table in the database, only the virtual mapping table in Hive.
Change the WSO2BAM_DATASOURCE in the <BAM_HOME>/repository/conf/datasource/master-datasources.xml file to match the database location you set in the Hive scripts above. The BAM dashboard uses this datasource to fetch summary information for presentation.
Start the BAM server. If you want to start BAM without starting the internal Cassandra, use the command: wso2server.sh -Ddisable.cassandra.server.startup=true.
Run the service statistics sample to accumulate data in the DataStax Cassandra cluster.
Deploy the modified service statistics toolbox and observe how it submits queries to the remote Hive server. Statistics are displayed in the BAM dashboard after the script has run.