...

Sample analytics-config.xml
<analytics-dataservice-configuration>
   <!-- The name of the primary record store -->
   <primaryRecordStore>EVENT_STORE</primaryRecordStore>
   <!-- Analytics Record Store - properties related to record storage implementation -->
   <analytics-record-store name="EVENT_STORE">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
            <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property>
            <property name="category">large_dataset_optimized</property>
      </properties>
   </analytics-record-store>
   <analytics-record-store name="PROCESSED_DATA_STORE">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
            <property name="datasource">WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</property>
            <property name="category">large_dataset_optimized</property>
      </properties>
   </analytics-record-store>
   <!-- The data indexing analyzer implementation -->
   <analytics-lucene-analyzer>
       <implementation>org.apache.lucene.analysis.standard.StandardAnalyzer</implementation>
   </analytics-lucene-analyzer>
   <!-- The number of index data replicas the system should keep. For H/A, this should be at least 1;
        a value of 0 means that no replicas of the index data are kept -->
   <indexReplicationFactor>1</indexReplicationFactor>
   <!-- The number of index shards. This should be equal to or higher than the number of indexing nodes in the cluster,
        the ideal count being 'number of indexing nodes * [CPU cores used for indexing per node]' -->
   <shardCount>6</shardCount>
   <!-- The number of batch index records the indexing node processes per indexing thread. A batch index record
        encapsulates a batch of records retrieved from the receiver to be indexed -->
   <shardIndexRecordBatchSize>100</shardIndexRecordBatchSize>
   <!-- The interval, in milliseconds, for which a shard index processing worker thread sleeps during index processing
        operations. Together with 'shardIndexRecordBatchSize', this setting can be used to increase the amount of batched
        data the indexer processes at a given time. Generally, larger batches give higher indexing throughput, but also a
        higher latency between record insertion and indexing. The minimum value is 10 and the maximum is 60000 (1 minute). -->
   <shardIndexWorkerInterval>1500</shardIndexWorkerInterval>
   <!-- Data purging related configuration -->
   <analytics-data-purging>
      <!-- Indicates whether data purging is enabled. To enable data purging for a cluster, this property
           must be enabled on all nodes -->
      <purging-enable>false</purging-enable>
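      <!-- Cron expression that schedules the purging task; the default '0 0 0 * * ?' runs it daily at midnight -->
      <cron-expression>0 0 0 * * ?</cron-expression>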
      <!-- Tables to include in purging. Use regular expressions to specify the names of the tables to purge. -->
      <purge-include-tables>
         <table>.*</table>
         <!--<table>.*jmx.*</table>-->
      </purge-include-tables>
      <!-- All records inserted before the specified retention period (in days) are eligible for purging -->
      <data-retention-days>365</data-retention-days>
   </analytics-data-purging>
   <!-- Receiver/Indexing flow-control configuration -->
   <analytics-receiver-indexing-flow-control enabled="true">
       <!-- Maximum number of records that can be in the index staging area before receiving is throttled -->
       <recordReceivingHighThreshold>10000</recordReceivingHighThreshold>
       <!-- Throttling is eased once the number of records in the index staging area drops below this limit -->
       <recordReceivingLowThreshold>5000</recordReceivingLowThreshold>
   </analytics-receiver-indexing-flow-control>
</analytics-dataservice-configuration>  
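
Purging is disabled in the sample above. The following is a minimal sketch of enabling it, assuming you want to purge only tables whose names contain "jmx" (the pattern that is commented out in the sample) and keep 30 days of data. It also assumes the cron expression uses the six-field Quartz-style format (seconds first), as the '?' in the sample's default suggests. The schedule, pattern, and retention values are illustrative assumptions, not recommendations.

```xml
<analytics-data-purging>
   <!-- Must be set to true on every node in the cluster -->
   <purging-enable>true</purging-enable>
   <!-- Assumed Quartz-style fields: second minute hour day-of-month month day-of-week; runs daily at 02:00 -->
   <cron-expression>0 0 2 * * ?</cron-expression>
   <purge-include-tables>
      <!-- Illustrative: purge only tables whose names contain "jmx" -->
      <table>.*jmx.*</table>
   </purge-include-tables>
   <!-- Illustrative: records inserted more than 30 days ago become eligible for purging -->
   <data-retention-days>30</data-retention-days>
</analytics-data-purging>
```

Because purging must be enabled on all nodes, the same block would go into analytics-config.xml on every node in the cluster.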

Analytics Record Store

...

Each parameter below is listed with its description and default value.
<analytics-lucene-analyzer>

The Lucene analyzer used for data indexing. Its implementation class is defined in the <implementation> subelement of this parameter, e.g.:

<implementation>org.apache.lucene.analysis.standard.StandardAnalyzer</implementation>

Default value: N/A
<indexReplicationFactor>

The index data replication factor used in clustered mode, i.e., the number of replicas kept when indexing operations are performed. A value of 0 means no replication. Set this to 1 or higher for high availability.

Default value: 1
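<shardCount>

The number of index shards the server maintains per cluster. This fine-tunes how the indexing cluster scales. The value should be equal to or higher than the number of indexing nodes; ideally, it is the number of indexing nodes multiplied by the number of CPU cores used for indexing per node.

Note

This parameter can only be set once for the lifetime of the cluster and cannot be changed later.

Default value: 6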
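As a worked example of the sizing rule given in the sample configuration's comment (number of indexing nodes * CPU cores used for indexing per node), consider a hypothetical cluster of 3 indexing nodes that each use 4 CPU cores for indexing. The node and core counts are assumptions for illustration only. Since the shard count cannot be changed once set, it is worth doing this calculation before the cluster first starts.

```xml
<!-- Hypothetical cluster: 3 indexing nodes, 4 CPU cores used for indexing per node -->
<!-- Ideal shard count = 3 * 4 = 12 (at least the number of indexing nodes, 3) -->
<shardCount>12</shardCount>
```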
<shardIndexRecordBatchSize>

The number of batch index records the indexing node processes per indexing thread at a given time.

An index record contains the data of a record batch inserted in a single put operation. Such a batch can be as large as the event receiver queue data size, which is 10 MB by default. Therefore, the maximum amount of in-memory record data an indexing thread can hold is 10 MB * 100 (the default value of this parameter), i.e., about 1000 MB. Configure this parameter to control the maximum amount of memory the indexing node uses, based on your requirements.

Default value: 100
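Applying the memory bound described above, and assuming the default 10 MB event receiver queue data size, halving the batch size halves the worst-case in-memory record data per indexing thread. The value 50 is an illustrative assumption, not a recommendation.

```xml
<!-- Worst case per indexing thread, assuming the default 10 MB receiver queue data size:
     default: 100 batch index records * 10 MB = 1000 MB
     here:     50 batch index records * 10 MB =  500 MB -->
<shardIndexRecordBatchSize>50</shardIndexRecordBatchSize>
```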
<shardIndexWorkerInterval>

The interval, in milliseconds, for which a shard index processing worker thread sleeps during index processing operations. Together with 'shardIndexRecordBatchSize', this setting can be used to increase the amount of batched data the indexer processes at a given time. Generally, larger batches give higher indexing throughput, but also a higher latency between record insertion and indexing. The minimum value is 10 and the maximum is 60000 (1 minute).

Default value: 1500
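A minimal sketch of a latency-oriented profile, assuming that getting records indexed quickly matters more than indexing throughput: a shorter worker interval and a smaller batch size reduce the amount of data batched per indexing run, trading throughput for lower latency. The values 500 and 50 are illustrative assumptions; the interval stays within the documented range of 10 to 60000.

```xml
<analytics-dataservice-configuration>
   <!-- ... other settings unchanged ... -->
   <!-- Illustrative: smaller batches, processed more often, for lower insertion-to-indexing latency -->
   <shardIndexRecordBatchSize>50</shardIndexRecordBatchSize>
   <shardIndexWorkerInterval>500</shardIndexWorkerInterval>
</analytics-dataservice-configuration>
```

For the opposite trade-off (higher throughput at the cost of latency), the same two settings would be raised instead, keeping the interval at or below 60000.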