
The DAS indexer is implemented using Apache Lucene, a full-text search library. Users can index records and later search for them via Lucene queries. Events received by DAS are converted to a list of records, which are then inserted into file-system-based queues. These queues are created in the <DAS_HOME>/repository/data/index_staging_queues directory. A background thread consumes these queues and indexes the records. The indexed data is stored in the <DAS_HOME>/repository/data/index_data directory. The DAS index consists of smaller indexes known as shards. A shard can be accessed by only one index writer at a given time. Therefore, having multiple shards can increase the write throughput; however, the write throughput can still be limited by disk I/O operations. By default, DAS is configured to have 6 shards and 1 replica.
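The indexing flow described above (records placed on a staging queue, a background thread draining it, and one writer per shard) can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions, not the DAS implementation: the `Shard` class, the CRC32-based `shard_for` routing, and `run_demo` are hypothetical, and an in-memory `queue.Queue` stands in for the file-system-based staging queues.

```python
import queue
import threading
import zlib
from dataclasses import dataclass, field
from typing import List

NUM_SHARDS = 6  # DAS default shard count, as stated above


@dataclass
class Shard:
    """Hypothetical in-memory stand-in for one Lucene index shard."""
    shard_id: int
    docs: list = field(default_factory=list)
    # A shard accepts writes from only one writer at a time.
    writer_lock: threading.Lock = field(default_factory=threading.Lock)

    def index(self, record: dict) -> None:
        with self.writer_lock:
            self.docs.append(record)


def shard_for(record_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Assumed routing: a stable hash of the record ID, modulo the shard count."""
    return zlib.crc32(record_id.encode("utf-8")) % num_shards


def indexer_loop(staging: queue.Queue, shards: List[Shard]) -> None:
    """Background consumer: drain the staging queue and index each record."""
    while True:
        record = staging.get()
        try:
            if record is None:  # sentinel value tells the worker to stop
                return
            shards[shard_for(record["id"])].index(record)
        finally:
            staging.task_done()


def run_demo(num_events: int) -> List[Shard]:
    shards = [Shard(i) for i in range(NUM_SHARDS)]
    staging: queue.Queue = queue.Queue()  # stands in for the file-system queue
    worker = threading.Thread(target=indexer_loop, args=(staging, shards), daemon=True)
    worker.start()
    for n in range(num_events):
        staging.put({"id": f"event-{n}", "payload": n})
    staging.put(None)  # stop the worker once all events are queued
    worker.join()
    return shards
```

For example, `run_demo(100)` indexes all 100 events, each into exactly one of the six shards. Because each shard has its own writer lock, writes to different shards could proceed in parallel if more workers were added, which is the throughput argument made above.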

Info

In a high availability deployment, at least one replica must be maintained.


Indexing-related configurations are done in the <DAS_HOME>/repository/conf/analytics/analytics-config.xml file. Additionally, information relating to shards is maintained in the local-shard-allocation-config.conf file in the <DAS_HOME>/repository/conf/analytics folder. This file stores each shard number along with its state, which can be INIT or NORMAL. INIT is the initial state. Usually this state cannot be seen from outside, because the INIT state changes to the NORMAL state once the server starts. If the indexing node is running, the state of the shards should be NORMAL and not INIT. The NORMAL state denotes that the indexing node has started indexing. Therefore, whenever data is ready to be indexed, the indexer node indexes the incoming data.
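The two shard states described above can be modelled as a small state machine. The Python sketch below is hypothetical: `ShardState`, `start_indexing_node`, and `shards_still_in_init` are illustrative names rather than DAS APIs, and the sketch does not read or write the real configuration file.

```python
from enum import Enum
from typing import Dict, List


class ShardState(Enum):
    INIT = "INIT"      # initial state; normally not visible from outside
    NORMAL = "NORMAL"  # the indexing node has started indexing this shard


def start_indexing_node(states: Dict[int, ShardState]) -> Dict[int, ShardState]:
    """Return the shard states after the server starts: INIT becomes NORMAL."""
    return {
        shard: ShardState.NORMAL if state is ShardState.INIT else state
        for shard, state in states.items()
    }


def shards_still_in_init(states: Dict[int, ShardState]) -> List[int]:
    """On a running indexing node this list should be empty."""
    return sorted(shard for shard, state in states.items() if state is ShardState.INIT)
```

A non-empty result from `shards_still_in_init` on a running node would indicate the unexpected situation the paragraph above rules out.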

To re-index the whole dataset, follow the steps below:

  1. Shut down the WSO2 DAS server.
  2. Remove all the index data stored in the <DAS_HOME>/repository/data directory.
  3. In the <DAS_HOME>/repository/conf/analytics/local-shard-allocation-config.conf file, change the mode of all the shards from NORMAL to INIT.
  4. Restart the WSO2 DAS server.
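Steps 2 and 3 can be scripted. The Python helper below is a hypothetical sketch, not a WSO2-provided tool: `prepare_reindex` is an illustrative name, it removes only the `index_data` directory that this page names as the location of the indexed data (adjust it if your deployment keeps other index files under `<DAS_HOME>/repository/data`), and it assumes each shard's state appears as the literal word `NORMAL` in `local-shard-allocation-config.conf`. Check the real file layout before using it, run it only while the server is shut down (step 1), and restart the server afterwards (step 4).

```python
import re
import shutil
from pathlib import Path

# Paths relative to <DAS_HOME>, taken from this page.
INDEX_DATA = Path("repository/data/index_data")
SHARD_CONF = Path("repository/conf/analytics/local-shard-allocation-config.conf")


def prepare_reindex(das_home: Path) -> int:
    """Delete the index data and switch every shard from NORMAL to INIT.

    Returns the number of NORMAL entries that were changed to INIT.
    """
    index_dir = das_home / INDEX_DATA
    if index_dir.exists():
        shutil.rmtree(index_dir)  # step 2: remove the indexed data

    conf = das_home / SHARD_CONF
    text = conf.read_text(encoding="utf-8")
    # Assumption: each shard state is written as the whole word NORMAL or INIT.
    new_text, changed = re.subn(r"\bNORMAL\b", "INIT", text)  # step 3
    conf.write_text(new_text, encoding="utf-8")
    return changed
```

For example, `prepare_reindex(Path("/opt/wso2das"))` would apply both steps to a DAS installation at the hypothetical path `/opt/wso2das` and return the number of shard entries it changed.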

By default, DAS is configured to have 6 shards and 1 replica for each shard (12 shard copies altogether, counting both shards and replicas). Therefore, in a Min-HA setup (two-node cluster), each DAS server contains 3 shards and the replicas of the other 3 shards. This allocation is communicated between the nodes through Hazelcast messages. If one server goes down, the second server still holds the replicas of the 3 shards that were on the failed node, so high availability is preserved. Similarly, if there are 3 DAS nodes in the cluster, all 12 shard copies (shards + replicas) are split among the 3 nodes, so that each node holds 4 copies: 2 shards plus the replicas of any 2 of the other 4 shards.
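The placement arithmetic above can be checked with a short Python sketch. The round-robin rule used here (the primary copy of shard *i* on node *i* mod *n*, its replica on the next node) is a hypothetical scheme chosen for illustration, not the allocation algorithm DAS or Hazelcast actually uses; it only reproduces the per-node counts and the single-node-failure guarantee described in the paragraph.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Copy = Tuple[int, str]  # (shard number, "primary" or "replica")


def allocate(num_shards: int, num_nodes: int) -> Dict[int, List[Copy]]:
    """Place one primary and one replica of each shard on different nodes.

    Hypothetical round-robin scheme for illustration only.
    """
    if num_nodes < 2:
        raise ValueError("a replica needs a second node")
    nodes: Dict[int, List[Copy]] = defaultdict(list)
    for shard in range(num_shards):
        primary_node = shard % num_nodes
        replica_node = (primary_node + 1) % num_nodes  # never the primary's node
        nodes[primary_node].append((shard, "primary"))
        nodes[replica_node].append((shard, "replica"))
    return dict(nodes)


def surviving_shards(layout: Dict[int, List[Copy]], failed_node: int) -> Set[int]:
    """Shards still reachable (as primary or replica) after one node fails."""
    return {
        shard
        for node, copies in layout.items()
        if node != failed_node
        for shard, _ in copies
    }
```

With `allocate(6, 2)` each node holds 6 copies (3 primaries and 3 replicas); with `allocate(6, 3)` each node holds 4 copies (2 primaries and 2 replicas), and `surviving_shards` returns all six shards for any single failed node. Keeping a shard's primary and replica on different nodes is what makes the single-failure guarantee hold.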

...