...
DAS indexer is implemented using Apache Lucene which is a full text search library. Users can index records and search for records later via Lucene queries. Events received by DAS are converted to a list of records and inserted into FileSystem
based queues. These queues are created in <DAS_HOME>/repository/data/index_staging_queues
directory. With a background thread, these queues are consumed and records are indexed. The indexed data is stored in the <DAS_HOME>/repository/data/index_data
directory. The DAS index consists of smaller indexes known as shards. A shard can be accessed nly by one index writer at a given time. Therefore having multiple shards can increase the write throughput (however, the write throughput can be limited by Disk IO operations). By default, DAS is configured to have 6 six shards and 1 one replica.
Info |
---|
In a high availability deployment, at least one replica must be saved. |
...
- Shut down the WSO2 DAS server.
- Remove all the index data stored in the
<DAS_HOME>/repository/data
directory. - In the
<DAS_HOME>/repository/conf/analytics/local-shard-allocation-config.conf
file, change the mode for all the shards fromNORMAL
toINIT
. - Restart the WSO2 DAS server.
By default, a single DAS server is configured to have 6 six shards, and 1 one replica for each shard (shards and replicas, altogether 12 i.e., a total of twelve shards). So in Min-Therefore, in a minimum HA setup (two node cluster), each DAS server will contain 3 contains three shards and replicas of the other 3 three shards. This is communicated between the nodes through Hazelcast via Hazlecast messages. Even if one server goes is shut down, the second server will have the has replicas of the 3 three shards which that were there in the node that went down, so HA is preserved. For example, If there are 3 was shut down. This allows the high availability to be maintained.
The following example further explains how shards are managed in a clustered deployment.
If there are three DAS nodes in the cluster, all 12 twelve shards (shards + i.e., the shards and the replicas) will be are split among the 3 three nodes where each . Each node will have 4 shards (four shrds which consists of 2 shards + any and 2 replicas from other 4 shards).
Debugging Indexing in a cluster
...