Storing Index Data

Index data in WSO2 DAS is stored in the local file system. All index data is partitioned into units known as shards. These shards can be viewed in the <DAS_HOME>/repository/data/index_data directory, which contains a subdirectory for each available shard.

Configuring shards

Shards that exist in the local file system can be managed by configuring the following parameters in the <DAS_HOME>/repository/conf/analytics/analytics-config.xml file.

Parameter	Description	Default
`indexReplicationFactor`	The number of index data replicas that should be saved in the system. In a high availability deployment, at least one replica should be saved.	1
`shardCount`	The number of index shards that are allowed to exist in the local file system at a given time. The number specified should be higher than the number of indexing nodes in the DAS cluster. The ideal number can be calculated as follows. `number of indexing nodes * [CPU cores used for indexing per node]`	6
`shardIndexRecordBatchSize`	The amount of index data to be processed by a shard index worker at a given time. This is expressed in bytes. The minimum amount should be `1000`.	20971520
`shardIndexWorkerInterval`	The time interval, expressed in milliseconds, during which a shard index processing worker can be inactive while processing operations are taking place. This parameter, together with the `shardIndexRecordBatchSize` parameter, can be used to increase the final amount of batched index data that an index worker processes at a given time. A larger batch usually results in higher throughput. However, it can also increase the latency between record insertion and indexing. The minimum value is `10`, and the maximum value is `60000` (1 minute).	1500
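
The arithmetic and bounds in the table above can be checked with a small script before editing the configuration file. The following is only a sketch: the function names `recommended_shard_count` and `check_index_settings` are hypothetical helpers written for this illustration and are not part of WSO2 DAS. They simply apply the formula and the minimum/maximum values stated in the table.

```python
def recommended_shard_count(indexing_nodes, cores_per_node):
    """Apply the formula from the table:
    number of indexing nodes * CPU cores used for indexing per node."""
    if indexing_nodes < 1 or cores_per_node < 1:
        raise ValueError("both arguments must be positive integers")
    return indexing_nodes * cores_per_node


def check_index_settings(batch_size, worker_interval_ms):
    """Return a list of problems found against the documented bounds.
    An empty list means both values are within range."""
    problems = []
    if batch_size < 1000:
        problems.append("shardIndexRecordBatchSize must be at least 1000")
    if not 10 <= worker_interval_ms <= 60000:
        problems.append("shardIndexWorkerInterval must be between 10 and 60000")
    return problems


# 3 indexing nodes, each using 2 CPU cores for indexing (illustrative numbers)
print(recommended_shard_count(3, 2))
# The default values listed in the table
print(check_index_settings(20971520, 1500))
```

With three indexing nodes that each dedicate two cores to indexing, the formula gives 6, which matches the default `shardCount`.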

Allocating shards in a clustered deployment

In a WSO2 DAS cluster, the available shards are distributed equally among all the indexing nodes (i.e., the nodes for which indexing is enabled). For example, if the cluster has 3 indexing nodes and 6 shards, each indexing node is assigned 2 shards, unless replication is enabled. When a new indexing node joins the cluster, the existing shard allocations change so that some of the shards are assigned to the new node.
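
To make the example concrete, the sketch below spreads shards across indexing nodes and shows how the allocation shifts when a fourth node joins. It is an illustration only: the round-robin strategy, the zero-based shard numbers, and the node names are assumptions made for this example, and the actual allocation logic inside WSO2 DAS may differ. Replication is not modelled.

```python
def allocate_shards(shard_count, nodes):
    """Spread shard numbers 0..shard_count-1 across the given nodes
    in round-robin order (assumed strategy, no replication)."""
    allocation = {node: [] for node in nodes}
    for shard in range(shard_count):
        allocation[nodes[shard % len(nodes)]].append(shard)
    return allocation


# 6 shards over 3 indexing nodes: each node holds 2 shards
before = allocate_shards(6, ["node1", "node2", "node3"])
print(before)

# A fourth indexing node joins: some shards move to the new node
after = allocate_shards(6, ["node1", "node2", "node3", "node4"])
print(after)
```

Note that with 6 shards and 4 nodes the shards can no longer be divided evenly, so two nodes hold 2 shards and two nodes hold 1.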

If you do not want a new node to operate as an indexing node, you should disable indexing at the time the node is started, using the following setting.

disableIndexing=true

If you want to stop an existing node from operating as an indexing node, restart it with the same setting. The existing shard allocation in the indexing cluster then changes so that the shards of the departing node are reallocated to the other indexing nodes.

Mistakenly started indexing nodes

If you start a node as an indexing node by mistake, it changes the global configurations, and these changes must be reverted manually. If the replication factor is equal to or greater than 1, you can still query and retrieve the required data even when this node is inactive, by following the procedure below.

Restart the node as a non-indexing node (i.e. by setting the disableIndexing=true property at the time the node is restarted).
If you want to clear the index data stored in the node, delete it from the <DAS_HOME>/repository/data/indexing_data directory.
If you want to use the node in another server profile, restart the node in the required profile.

When you restart an indexing node as a non-indexing node, you should also restart the other indexing servers for them to get the indexing updates of the node that stopped operating as an indexing node.
If you start an indexing server by mistake, it changes the global configurations. You need to make sure that the shard allocations are correct before proceeding.

When an indexing node is restarted as a non-indexing node, the indexing data stored in it is not removed automatically. If required, you can remove it from the <DAS_HOME>/repository/data/indexing_data directory.

Allocating shards manually

Shards can be configured manually in the <DAS_HOME>/repository/conf/analytics/local-shard-allocation-config.conf file.

The local shard allocations of a node can be configured in one of three modes.

Mode Description

NORMAL The indexing data for a shard is stored in the node to which the shard is assigned.

INIT

If you restart the server after adding a shard in the INIT mode, that shard is re-indexed in that node.

For example, if the existing shard allocations are as follows, you can add the line `4, INIT` and restart the server to re-index the data for shard 4. After the data is re-indexed, the mode is changed to NORMAL.

1, NORMAL
2, NORMAL

RESTORE

This mode allows you to copy index data to a local node so that the node can use it.

For example, if you copy the index data for shard 5, add the line `5, RESTORE` to the following shard allocation, and then restart the server, shard 5 is allocated to that node and is then used for searching and indexing.

1, NORMAL
2, NORMAL
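
The allocation entries shown in the examples above follow a simple `shard number, MODE` layout, so they can be read and edited with a short script. The sketch below assumes that the file contains only lines of this form (as in the examples) and nothing else; `parse_allocations` and `format_allocations` are hypothetical helpers written for this illustration, not part of WSO2 DAS.

```python
VALID_MODES = {"NORMAL", "INIT", "RESTORE"}


def parse_allocations(text):
    """Parse lines of the form '<shard number>, <MODE>' into a dict
    mapping shard number to mode. Blank lines are skipped."""
    allocations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        shard, mode = (part.strip() for part in line.split(",", 1))
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode}")
        allocations[int(shard)] = mode
    return allocations


def format_allocations(allocations):
    """Render the allocations back into the 'shard, MODE' line format."""
    return "\n".join(f"{shard}, {mode}" for shard, mode in sorted(allocations.items()))


# Start from the allocation used in the examples and add shard 5 in RESTORE mode
current = parse_allocations("1, NORMAL\n2, NORMAL\n")
current[5] = "RESTORE"
print(format_allocations(current))
```

Validating the mode name before writing the file helps catch a typing mistake before the server is restarted.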
