Creating the Table Using Carbon JDBC as the Provider
The CarbonJDBC provider can be used to interact directly with a relational database, bypassing the DAS Data Access Layer. As a result, the data is stored in the format native to the relational storage mechanism in use without compression or encoding. This is useful in scenarios where the results of a Spark query need to be stored in a relational database to be accessed by third parties such as reporting tools or dashboards, or if the results need to be readable outside WSO2 DAS.
Multi-tenancy with the CarbonJDBC provider
To use the CarbonJDBC analytics provider for tenant scenarios, you will need to specifically create the relevant data source that you intend to use for that tenant. Please see the page on data sources for instructions on creating a data source.
The following syntax is used to create a temporary table using the CarbonJDBC analytics provider in Apache Spark with a relational table backing it.
CREATE TEMPORARY TABLE <temp_table> using CarbonJDBC options (dataSource "<datasource name>", tableName "<table name>", schema "<schema>" [, primaryKeys "<primaryKeys>"]);
Options in the above syntax are described below.
Option | Condition | Description | Example |
---|---|---|---|
dataSource "<datasource name>" | Mandatory | The name of the data source from which data should be obtained for the temporary table. Only RDBMS data sources are supported. | dataSource "test" |
tableName "<table name>" | Mandatory | The name of the table in the selected data source that should be used when creating the temporary table. This would be the table from which data will be obtained on SELECT operations, and to which data will be added on INSERT operations. If the table does not already exist in the selected data source, it will be created automatically based on the given schema together with the relevant type mappings for the RDBMS instance in use. | tableName "TEST.PEOPLE" |
schema "<schema>" | Mandatory | The schema used to represent the relational table within Apache Spark. The data types allowed for this parameter are as follows.
The The specification of the -i key following a particular column definition defines a relational index on the target datasource. If multiple parameters are specified, all of them are grouped to create a single index. | schema "student_id STRING -i, name STRING(100), marks INTEGER" |
| Optional | This parameter can be used to specify unique keys that are either in use or required for the table. | primaryKeys "student_id, name" |
The field mappings for each supported database type are maintained in the <DAS_HOME>/repository/conf/analytics/spark/ spark-jdbc-config.xml
file.
Sample Queries
The following are some sample queries using CarbonJDBC
as the provider.
CREATE TEMPORARY TABLE StateUsage using CarbonJDBC OPTIONS (dataSource "MY_DATASOURCE", tableName "state_usage",schema "us_state STRING -i, polarity INTEGER, usage_avg FLOAT", primaryKeys "us_state"); INSERT INTO TABLE StateUsage SELECT state, polarity, state_avg_usage FROM USCensusData; INSERT OVERWRITE TABLE StateUsage SELECT state, polarity, state_avg_usage FROM USCensusData; SELECT * FROM StateUsage LIMIT 5;