JDBC Storage Handler for Hive

JDBC Storage Handler for Hive is a storage handling feature of WSO2 BAM. This page describes the usage of Hive JDBC Storage Handler with its basic functionality. Currently It supports writing into any database and reading from major databases (MySQL, Microsoft SQL, Oracle, H2, PostgreSQL).

Setting up BAM to use Hive JDBC handler

Obtain the jdbc-driver JAR file that is shipped with the respective database type you use.
Add the jdbc-driver to <BAM_HOME>/repository/component/lib/ directory, before starting the server.

Sample

Below is a sample Hive query, which uses the JDBC Storage Handler to call and connect to the specified database, and store the query results in it.

CREATE EXTERNAL TABLE IF NOT EXISTS PhonebrandTable(brand STRING, 
totalOrders INT, totalQuantity INT) STORED BY 
'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' 
TBLPROPERTIES ( 
'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
'hive.jdbc.update.on.duplicate' = 'true', 
'hive.jdbc.primary.key.fields' = 'brand', 
'hive.jdbc.table.create.query' = 
'CREATE TABLE brandSummary (brand VARCHAR(100) NOT NULL PRIMARY KEY, 
totalOrders INT, totalQuantity INT)');
insert overwrite table PhonebrandTable

Storage handler table properties

Following are the properties that can be defined under TBLPROPERTIES in a Hive query, which uses the Hive JDBC Storage Handler:

Property name	Required	Details
`mapred.jdbc.driver.class`	Yes	The classname for the JDBC Driver to use. This should be available on Hive's classpath.
`mapred.jdbc.url`	Yes	The connection URL for the database.
`hive.jdbc.primary.key.fields`	Yes	Whetehr you have any primary keys in the database table.
`mapred.jdbc.username`	No	The username to access the database, if it's required.
mapred.jdbc.password	No	The password to access the database, if it's required.
`hive.jdbc.table.create.query`	No	If the table already exists in the database, then you don't need this. Else, provide the SQL query for creating the table in the database.
`mapred.jdbc.output.table.name`	No	The name of the table in the database. It need not be the same as the name of the table in Hive. If you specified the SQL query for creating the table, handler will pick the table name from the query. Otherwise, if your meta table name is different from the table in database, specify this.
`hive.jdbc.update.on.duplicate`	No	Enter either true or false as the value. If true is entered, then the storage handler will update the records with duplicate keys. Otherwise it will insert all the data.
`hive.jdbc.output.upsert.query`	No	This is used to optimize the update operation. The default implementation is to use either insert or update statements after the select statement. Therefore, there will be two database round trips. But we can reduce it to one, by using database specific upsert statements. For Example a query for a MySQL database is: `'INSERT INTO productSummary (product, itemsSold) values (?,?) ON DUPLICATE KEY UPDATE itemsSold=?'`
`hive.jdbc.upsert.query.values.order`	No	If you are using an upsert query, then this is mandatory. Sample values for the above query will be: `'product,itemsSold,itemsSold' //values order for each question mark`
`hive.jdbc.input.columns.mapping`	No	This is mandatory if your field names in the meta table and database tables are different. P rovide the field names in the database table in the same order as the field names in meta table with ',' separated values. For example: `productNames,noOfItemsSold` These will map to your meta table with `product,itemsSold` field names.
`mapred.jdbc.input.table.name`	No	Used when reading from a database table. This is needed if the meta table name and database table name are different.