This section introduces you to Apache Hive query language (HQL) and how to set up databases and write Hive scripts to process and analyze the data stored in RDBMS or noSQL data bases. All Hive-related configurations in BAM are included in the following files.
...
Example 1:This is a basic example query to create a virtual Hive table by the name ActivityDataTable corresponding to a physical Cassandra column family.
Code Block | ||
---|---|---|
| ||
CREATE EXTERNAL TABLE IF NOT EXISTS ActivityDataTable (messageID STRING, sentTimestamp BIGINT, activityID STRING, version STRING, soapHeader STRING, soapBody STRING, host STRING) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.host"wso2.carbon.datasource.name" = "127.0.0.1" , WSO2BAM_CASSANDRA_DATASOURCE", "cassandra.portpartitioner" = "9160" , "cassandra.ks.name" = "EVENT_KS" , "cassandra.ks.username" = "admin" , "cassandra.ks.password" = "admin" , org.apache.cassandra.dht.RandomPartitioner", "cassandra.cf.name" = "org_wso2_bam_activity_monitoring" , "cassandra.columns.mapping" = ":key, payload_timestamp, correlation_bam_activity_id, Version, payload_SOAPHeader, payload_SOAPBody, meta_host" ); |
Note the following regarding the above query:
- The new handler org.apache.hadoop.hive.cassandra.CassandraStorageHandler is used instead of the JDBC handler class.
- WITH SERDEPROPERTIES is used instead of TBLPROPERTIES command.
- The Cassandra storage handler class takes the following parameters for its SerDe properties.
- cassandra.host : Host names/IP Addresses of Cassandra nodes. You can use a comma-separated list for multiple nodes in the Cassandra ring for fail-over.
- cassandra.port : The port through which Cassandra listens to client requests.
- cassandra.ks.name : Cassandra Keyspace name. Keyspaces are logically similar to databases in RDBMS. The connection parameters, host, port, username and password are declared explicitly in cassandra.host, cassandra.port, cassandra.ks.username and cassandra.ks.password respectively. The name of the keyspace is EVENT_KS by default. To change this, edit the <keySpaceaName> element in <BAM_HOME>/repository/conf/ data-bridges/data-bridge- config.xml file.
- cassandra.ks.username : Username (username@tenant_domain if in Stratos) for authenticating Cassandra Keyspace. If no authentication is required to the Keyspace to be connected, you can skip this.
- cassandra.ks.password : Password for authenticating the Cassandra Keyspace. If no authentication is required to the Keyspace to be connected, you can skip this.
- cassandra.cf.name : Cassandra ColumnFamily name. In this example, org_wso2_bam_activity_monitoring is set as the column family name.
- cassandra.columns.mapping : Used to map the Cassandra column family keys to the Hive table fields. Should be in the same order as the Hive field definitions in CREATE TABLE. So the Hive table fields messageID, sentTimestamp, activityID, version, soapHeader, soapBody and host are mapped to the column family keys (keys of key-value pairs) by the names :key, payload_timestamp, correlation_bam_activity_id, Version, payload_SOAPHeader, payload_SOAPBody and meta_host. The reason is because the column family is already created, and the Hive script only creates the mapped Hive table onto the existing column family. :key is the unique row key available for each row in the Cassandra column family. You should map this field with a Hive table field in every Hive script.
...
Info | |||||
---|---|---|---|---|---|
If you are using a Cassandra version below 1.1.3, you need to add the following element in Hive queries you write:
|
...
- 'mapred.jdbc.driver.class' = 'org.h2.Driver' : This is the JDBC driver class name required for the database connection. You should add the JDBC driver JAR file to <BAM_HOME>/repository/components/lib directory. The H2 JDBC driver comes by default with WSO2 BAM. DBC driver class name.
- 'mapred.jdbc.url' = 'jdbc:h2:repository/database/samples/WSO2CARBON_DB;AUTO_SERVER=TRUE' : The JDBC connection URL to the database.
- 'mapred.jdbc.username' = 'wso2carbon' : JDBC Username.
- 'mapred.jdbc.password' = 'wso2carbon' : JDBC Password.
- hive.jdbc.update.on.duplicate : Updates the record in the database on existing records with current records primary key if true.
- hive.jdbc.primary.key.fields : Specifies the primary keys of database table.
- hive.jdbc.table.create.query : You can provide here a create table query with database-specific syntax. Useful when the database has specific syntax for creating tables.
...