This section introduces the Apache Hive query language (HQL) and explains how to set up databases and write Hive scripts to process and analyze data stored in RDBMS or NoSQL databases. All Hive-related configurations in BAM are included in the following files.

...

  • The new handler org.apache.hadoop.hive.cassandra.CassandraStorageHandler is used instead of the JDBC handler class.
  • The WITH SERDEPROPERTIES clause is used instead of the TBLPROPERTIES clause.
  • The Cassandra storage handler class takes the following parameters for its SerDe properties.
    • cassandra.host : Host names/IP Addresses of Cassandra nodes. You can use a comma-separated list for multiple nodes in the Cassandra ring for fail-over.
    • cassandra.port : The port through which Cassandra listens to client requests.
    • cassandra.ks.name : Cassandra keyspace name. Keyspaces are logically similar to databases in an RDBMS. The connection parameters (host, port, username and password) are declared explicitly in cassandra.host, cassandra.port, cassandra.ks.username and cassandra.ks.password respectively. The name of the keyspace is EVENT_KS by default. To change this, edit the <keySpaceName> element in the <BAM_HOME>/repository/conf/data-bridges/data-bridge-config.xml file.
    • cassandra.ks.username : Username (username@tenant_domain if in Stratos) for authenticating to the Cassandra keyspace. If the keyspace does not require authentication, you can omit this.
    • cassandra.ks.password : Password for authenticating to the Cassandra keyspace. If the keyspace does not require authentication, you can omit this.
    • cassandra.cf.name : Cassandra ColumnFamily name. In this example, org_wso2_bam_activity_monitoring is set as the column family name.
    • cassandra.columns.mapping : Maps the Cassandra column family keys to the Hive table fields. The entries must be in the same order as the Hive field definitions in CREATE TABLE. In this example, the Hive table fields messageID, sentTimestamp, activityID, version, soapHeader, soapBody and host are mapped to the column family keys (keys of key-value pairs) named :key, payload_timestamp, correlation_bam_activity_id, Version, payload_SOAPHeader, payload_SOAPBody and meta_host. This is because the column family already exists, and the Hive script only creates a Hive table mapped onto it. :key is the unique row key available for each row in the Cassandra column family. You should map this field to a Hive table field in every Hive script.
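
The parameters above could come together in a table definition along the following lines. This is only a minimal sketch: the Hive table name ActivityDataTable, the column types, the host and port values (127.0.0.1, 9160) and the admin/admin credentials are assumptions for illustration and should be replaced with the values of your own deployment.

```sql
-- Hypothetical Hive table mapped onto an existing Cassandra column family.
-- Table name, column types, host, port and credentials are assumed values.
CREATE EXTERNAL TABLE IF NOT EXISTS ActivityDataTable
    (messageID STRING, sentTimestamp BIGINT, activityID STRING,
     version STRING, soapHeader STRING, soapBody STRING, host STRING)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
    "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "org_wso2_bam_activity_monitoring",
    -- Same order as the Hive field list above; :key maps to messageID.
    "cassandra.columns.mapping" =
        ":key, payload_timestamp, correlation_bam_activity_id, Version, payload_SOAPHeader, payload_SOAPBody, meta_host"
);
```

Because the mapping is positional, reordering the Hive fields without reordering cassandra.columns.mapping would silently attach the wrong column family key to each field.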

...

Datasource names can also be used for the Cassandra storage handler in Hive scripts. A predefined datasource named "WSO2BAM_CASSANDRA_DATASOURCE" is already available for this purpose. To modify "WSO2BAM_CASSANDRA_DATASOURCE", edit <CARBON_SERVER>/repository/conf/datasources/masterbam-datasources.xml. Given below is the datasource configuration for Cassandra.

...

Method 1 : Define the data source in the masterbam-datasources.xml file

Here is an example query for creating a virtual Hive table named ActivitySummaryTable that corresponds to a physical H2 table named ActivitySummary. Note that the column names in the Hive table do not have to match those in H2. The query maps each Hive column to a column in the H2 table based on the order in which it is defined. You can also find the query below in the BAM Activity Monitoring Toolbox.
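
As a rough illustration of the order-based mapping described above, and not the toolbox query itself, a JDBC-backed table definition might look like the sketch below. Everything specific in it is an assumption: the handler class name org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler, the property names, the datasource name WSO2BAM_DATASOURCE, and the column lists and types are recalled or invented for illustration and should be checked against your BAM version.

```sql
-- Sketch only: handler class, property names, datasource name and columns are assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS ActivitySummaryTable
    (messageID STRING, sentTimestamp BIGINT, activityID STRING,
     soapHeader STRING, soapBody STRING, host STRING)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    -- Datasource assumed to be defined in the datasources configuration file.
    'wso2.carbon.datasource.name' = 'WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'messageID',
    -- The H2 column names differ from the Hive names on purpose; columns are matched by position.
    'hive.jdbc.table.create.query' =
        'CREATE TABLE ActivitySummary (msgId VARCHAR(100) NOT NULL PRIMARY KEY, sentTime BIGINT, activityId VARCHAR(40), header TEXT, body TEXT, hostName VARCHAR(25))'
);
```

In this sketch the first Hive field (messageID) lands in the first H2 column (msgId), the second in the second, and so on, which is why the two column lists need the same length and order even though the names differ.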

...