Configuring Data Persistence
WSO2 DAS introduces the ability to have a pluggable Data Access Layer (DAL).
Analytics Record Store
The Analytics Record Store is the component that stores the records received by WSO2 DAS in the form of events. This store holds raw event data in tabular form so that it can be retrieved later.
The following record stores are configured by default in the <DAS_HOME>/repository/conf/analytics/analytics-config.xml file.
Record Store Type | Default Name | Description |
---|---|---|
Primary Store | EVENT_STORE | This record store is used to store the persisted incoming events of WSO2 DAS. It contains raw data in a tabular structure which can be used later. |
Processed Data Store | PROCESSED_DATA_STORE | This record store is used to store summarized event data. |
Configuring a record store
The following is a sample configuration of a record store.
<analytics-record-store name="EVENT_STORE">
   <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
   <properties>
      <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property>
      <property name="category">large_dataset_optimized</property>
   </properties>
</analytics-record-store>
The following needs to be specified for each record store.
- Name: A unique name for the record store.
- Implementation: The fully qualified class name of the record store implementation. This class must implement the org.wso2.carbon.analytics.datasource.core.rs.AnalyticsRecordStore interface. For the record store to function, the provider for the datasource type used by this implementation must be enabled in the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file.
- Record store specific properties: The properties that are defined per record store are described in the table below.
Property | Description |
---|---|
datasource | The name of the datasource used to connect to the database used by the record store. |
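The value of the datasource property must match a datasource defined in the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file. The following is a sketch of such a definition; the JDBC URL, credentials, and driver class are illustrative assumptions and depend on your environment.

```xml
<datasource>
   <name>WSO2_ANALYTICS_EVENT_STORE_DB</name>
   <description>The datasource used for the analytics event store</description>
   <definition type="RDBMS">
      <configuration>
         <!-- Illustrative values: point these at your own database -->
         <url>jdbc:mysql://localhost:3306/ANALYTICS_EVENT_STORE</url>
         <username>wso2carbon</username>
         <password>wso2carbon</password>
         <driverClassName>com.mysql.jdbc.Driver</driverClassName>
      </configuration>
   </definition>
</datasource>
```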
Once a record store is configured in the analytics-config.xml file, you can select it as the record store for the required event streams. For more information, see Persisting Data for Interactive Analytics.
Analytics Indexing
By default, WSO2 DAS executes indexing operations when the server is started. The following system property can be used to disable indexing if required.
- For Windows:
wso2server.bat -DdisableIndexing
- For Linux:
wso2server.sh -DdisableIndexing
This option allows you to create servers that are dedicated to specific operations such as event receiving, analytics, or indexing.
Configuring common parameters
Data purging parameters
Parameter | Description | Default Value |
---|---|---|
<purging-enable> | This parameter specifies whether the functionality to purge data from event tables is enabled or not. | false |
<cron-expression> | The cron expression that specifies the schedule on which the data purging task runs. | 0 0 0 * * ? |
<purge-include-tables> | A list of event tables from which data should be purged, defined as <table> subelements of this element. A regex expression can be used to match table names (e.g., <table>.*</table>). | |
<data-retention-days> | The number of days for which data should be retained in the event tables selected for purging. All data older than the specified number of days is cleared from these tables. | 365 |
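Combined, the purging parameters above take the following form inside the analytics-config.xml file. The <table> pattern shown is illustrative; this one matches all tables.

```xml
<analytics-data-purging>
   <purging-enable>true</purging-enable>
   <!-- Run the purging task every day at midnight -->
   <cron-expression>0 0 0 * * ?</cron-expression>
   <purge-include-tables>
      <!-- Illustrative regex: purge all tables -->
      <table>.*</table>
   </purge-include-tables>
   <data-retention-days>365</data-retention-days>
</analytics-data-purging>
```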
Other Parameters
Parameter | Description | Default Value |
---|---|---|
<analytics-lucene-analyzer> | The implementation of the Analytics Lucene Analyzer is defined as a subelement of this parameter. e.g., <implementation>org.apache.lucene.analysis.standard.StandardAnalyzer</implementation> | |
<indexReplicationFactor> | The index data replication factor to be used in clustered mode, i.e., the number of replicas kept for index data. A value of 0 means no replication. Set this to 1 or higher for high availability. | 1 |
<shardCount> | The number of index shards the server should maintain per cluster. This fine-tunes the scaling behavior of the indexing cluster. This parameter can only be set once for the lifetime of the cluster and cannot be changed later. | 6 |
<shardIndexRecordBatchSize> | The amount of index data (in bytes) to be processed at a time by a shard index worker. The minimum value is 1000. | 20971520 |
<shardIndexWorkerInterval> | The interval in milliseconds for which a shard index processing worker thread sleeps during index processing operations. Together with <shardIndexRecordBatchSize>, this controls the final batched index data amount the indexer processes at a given time. The minimum value is 10 and the maximum is 60000 (1 minute). | 1500 |
<indexWorkerCount> | The number of index workers to operate in the current node. This determines the number of execution threads created to perform the indexing operations of the local shards. Increasing this value increases the parallel I/O operations performed on the system, so it should only be increased on systems that can handle parallel I/O (e.g., SSDs). | 1 |
The DAL configuration can be found in the <DAS_HOME>/repository/conf/analytics/analytics-config.xml file. An example is shown below.
<analytics-dataservice-configuration>
   <!-- The name of the primary record store -->
   <primaryRecordStore>EVENT_STORE</primaryRecordStore>
   <!-- Analytics Record Store - properties related to record storage implementation -->
   <analytics-record-store name="EVENT_STORE">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
         <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property>
         <property name="category">large_dataset_optimized</property>
      </properties>
   </analytics-record-store>
   <analytics-record-store name="PROCESSED_DATA_STORE">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
         <property name="datasource">WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</property>
         <property name="category">large_dataset_optimized</property>
      </properties>
   </analytics-record-store>
   <!-- The data indexing analyzer implementation -->
   <analytics-lucene-analyzer>
      <implementation>org.apache.lucene.analysis.standard.StandardAnalyzer</implementation>
   </analytics-lucene-analyzer>
   <!-- The number of index data replicas the system should keep; for H/A, this should be at least 1. The value 0 means no copies of the data are kept. -->
   <indexReplicationFactor>1</indexReplicationFactor>
   <!-- The number of index shards; should be equal to or higher than the number of indexing nodes, the ideal count being 'number of indexing nodes * [CPU cores used for indexing per node]' -->
   <shardCount>6</shardCount>
   <!-- The amount of index data (in bytes) to be processed at a time by a shard index worker. Minimum value is 1000. -->
   <shardIndexRecordBatchSize>20971520</shardIndexRecordBatchSize>
   <!-- The interval in milliseconds for which a shard index processing worker thread sleeps during index processing operations. This setting, along with 'shardIndexRecordBatchSize', can be used to increase the final batched index data amount the indexer processes at a given time. Usually, the higher the batch data amount, the higher the throughput of the indexing operations, but also the higher the latency between record insertion and indexing. Minimum value is 10, and the maximum is 60000 (1 minute). -->
   <shardIndexWorkerInterval>1500</shardIndexWorkerInterval>
   <!-- The number of index workers to operate in the current node. This determines the number of execution threads created to do the indexing operations of the local shards. When this value is increased, the parallel I/O operations done on the system grow larger, so only systems that can handle parallel I/O (e.g., SSDs) should increase it. -->
   <indexWorkerCount>1</indexWorkerCount>
   <!-- Data purging related configuration -->
   <analytics-data-purging>
      <!-- Indicates whether purging is enabled. To enable data purging for a cluster, this property must be enabled in all nodes. -->
      <purging-enable>false</purging-enable>
      <cron-expression>0 0 0 * * ?</cron-expression>
      <!-- Tables to include in purging. Use a regex expression to specify the table names to include. -->
      <purge-include-tables>
         <table>.*</table>
         <!--<table>.*jmx.*</table>-->
      </purge-include-tables>
      <!-- All records inserted before the specified retention time are eligible for purging -->
      <data-retention-days>365</data-retention-days>
   </analytics-data-purging>
</analytics-dataservice-configuration>
Removing persisted data
The following two methods can be used to clear all the event data (both processed and unprocessed events) from the database.
- Run the Analytics Data Backup / Restore Tool with the -deleteTables argument, specifying the list of event tables for which you want to clear data. For more information, see Analytics Data Backup / Restore Tool.
- Clean your database, and remove the contents of the <DAS_HOME>/repository/data directory at the same time. Cleaning the database while retaining the data in the <DAS_HOME>/repository/data directory does not permanently remove the persisted data.
If you want to clear only index data, see Configuring Indexes - Removing index data.