Configuring a Fully Distributed Cluster
The following table describes the distributed components in the fully distributed deployment pattern used for high availability.
Distributed component | Minimum number of nodes | Description |
---|---|---|
Receiver nodes | 2 | For data analytics to happen, the relevant data must first be collected. DAS provides data agents that capture information on the messages flowing through WSO2 ESB, WSO2 Application Server, and other products that use the DAS data publisher. This information is obtained by the data receivers and is then stored in a datastore, where it is optimized for analysis. The receiver nodes are used to obtain this data from the data agents. |
Indexer nodes | 2 | A background indexing process fetches the data from the datastore and performs the indexing operations. These operations are handled by the indexer nodes in a fully distributed, highly available system. |
Analyzer (Spark) nodes | 2 | The analyzer engine, which is powered by Apache Spark, analyzes this data according to defined analytic queries. This usually follows a pattern of retrieving data from the datastore, performing a data operation such as an addition, and storing the data back in the datastore. The analyzer operations are performed by the analyzer nodes. |
Dashboard nodes | 2 | The dashboard queries the datastore for the analyzed data and displays it graphically. This function can be distributed to the dashboard nodes. |
Storm nodes | 0 | Apache Storm can be used to handle any additional processing load. Any number of Storm nodes can be added, and they are not required in a fully distributed system unless the workload demands them. |
Prerequisites
- Download and install WSO2 DAS from here.
Make sure that you have allocated the required memory for DAS nodes, and installed the required supporting applications as mentioned in WSO2 DAS Documentation - Installation Prerequisites.
In WSO2 DAS clustered deployments, Spark runs in a separate JVM. It is recommended to allocate 4GB of memory for the Carbon JVM and 2GB for Spark.
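As a rough guide (not part of the official prerequisites), the Carbon heap is usually adjusted in the server startup script, while Spark memory is controlled through standard Spark properties because it runs in its own JVM. The following is a minimal sketch assuming a Linux pack; verify the existing values in your own distribution before changing anything.
# Sketch only: locate the current heap settings of the Carbon JVM and raise -Xmx to 4g there.
grep -n "Xmx" <DAS_HOME>/bin/wso2server.sh
# Spark runs in a separate JVM; a standard Spark property such as the following
# (illustrative value) can be set in <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf:
#   spark.executor.memory  2g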
- Follow the steps below to set up MySQL.
Download and install MySQL Server.
Download the MySQL JDBC driver.
Unzip the downloaded MySQL driver archive, and copy the MySQL JDBC driver JAR (mysql-connector-java-x.x.xx-bin.jar) into the <DAS_HOME>/repository/components/lib directory of all the nodes in the cluster.
- Enter the following command in a terminal/command window, where username is the username you want to use to access the databases.
mysql -u username -p
- When prompted, specify the password that will be used to access the databases with the username you specified.
Create two databases named userdb and regdb.
About using MySQL in different operating systems
For users of Microsoft Windows, when creating the databases in MySQL, it is important to specify the character set as latin1. Failure to do this may result in an error (error code: 1709) when starting your cluster. This error occurs in certain versions of MySQL (5.6.x) and is related to the UTF-8 encoding. MySQL originally used the latin1 character set by default, which stores each character in a single byte. However, in recent versions, MySQL defaults to UTF-8 to be friendlier to international users. Hence, you must use latin1 as the character set, as indicated below in the database creation commands, to avoid this problem. Note that this may result in issues with non-latin characters (such as Hebrew or Japanese). The following is how your database creation command should look.
mysql> create database <DATABASE_NAME> character set latin1;
For users of other operating systems, the standard database creation commands will suffice. For these operating systems, the following is how your database creation command should look.
mysql> create database <DATABASE_NAME>;
Execute the following script against each of the two databases you created in the previous step (i.e., run it once for userdb and once for regdb; a consolidated example session covering all of the databases in this section is sketched after the configuration extract below).
mysql> source <DAS_HOME>/dbscripts/mysql.sql;
Create the following databases in MySQL.
- WSO2_ANALYTICS_EVENT_STORE_DB
- WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB
It is recommended to create the databases with the same names given above, because they are the default JNDI names that are included in the <DAS_HOME>/repository/conf/analytics/analytics-conf.xml file, as shown in the extract below. If you change a name, the analytics-conf.xml file should be updated with the changed name.
<analytics-record-store name="EVENT_STORE"> <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation> <properties> <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property> <property name="category">read_write_optimized</property> </properties> </analytics-record-store> <analytics-record-store name="EVENT_STORE_WO"> <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation> <properties> <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property> <property name="category">write_optimized</property> </properties> </analytics-record-store> <analytics-record-store name="PROCESSED_DATA_STORE"> <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation> <properties> <property name="datasource">WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</property> <property name="category">read_write_optimized</property> </properties> </analytics-record-store>
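For reference, the database setup described above can be carried out in a single MySQL session similar to the sketch below. This is not part of the official guide: the username is hypothetical, the latin1 character set applies only to the Windows case described earlier, and the script is sourced only into the user store and registry databases.
mysql -u dasadmin -p
# Inside the MySQL prompt (add "character set latin1" to the create statements on Windows):
#   create database userdb;
#   create database regdb;
#   create database WSO2_ANALYTICS_EVENT_STORE_DB;
#   create database WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB;
#   use userdb; source <DAS_HOME>/dbscripts/mysql.sql;
#   use regdb; source <DAS_HOME>/dbscripts/mysql.sql;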
Required configurations
When configuring the fully distributed cluster, the following configurations should be done on each DAS node.
- Follow the steps below to point the user stores of all the nodes to a single user store database, to mount all governance registries to a single governance registry, and to mount all configuration registries to a single configuration registry.
- Follow the steps below to configure the <DAS_HOME>/repository/conf/datasources/master-datasources.xml file as required.
Enable all the nodes to access the user database by configuring a datasource to be used by the user manager, as shown below.
<datasource> <name>WSO2UM_DB</name> <description>The datasource used by user manager</description> <jndiConfig> <name>jdbc/WSO2UM_DB</name> </jndiConfig> <definition type="RDBMS"> <configuration> <url>jdbc:mysql://[MySQL DB url]:[port]/userdb</url> <username>[user]</username> <password>[password]</password> <driverClassName>com.mysql.jdbc.Driver</driverClassName> <maxActive>50</maxActive> <maxWait>60000</maxWait> <testOnBorrow>true</testOnBorrow> <validationQuery>SELECT 1</validationQuery> <validationInterval>30000</validationInterval> </configuration> </definition> </datasource>
Enable the nodes to access the registry database by configuring the WSO2REG_DB datasource as follows.
<datasource> <name>WSO2REG_DB</name> <description>The datasource used by the registry</description> <jndiConfig> <name>jdbc/WSO2REG_DB</name> </jndiConfig> <definition type="RDBMS"> <configuration> <url>jdbc:mysql://[MySQL DB url]:[port]/regdb</url> <username>[user]</username> <password>[password]</password> <driverClassName>com.mysql.jdbc.Driver</driverClassName> <maxActive>50</maxActive> <maxWait>60000</maxWait> <testOnBorrow>true</testOnBorrow> <validationQuery>SELECT 1</validationQuery> <validationInterval>30000</validationInterval> </configuration> </definition> </datasource>
For detailed information about registry sharing strategies, see the library article Sharing Registry Space across Multiple Product Instances.
Point to WSO2_ANALYTICS_EVENT_STORE_DB and WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB in the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file as shown below.
<datasources-configuration> <providers> <provider>org.wso2.carbon.ndatasource.rdbms.RDBMSDataSourceReader</provider> </providers> <datasources> <datasource> <name>WSO2_ANALYTICS_EVENT_STORE_DB</name> <description>The datasource used for analytics record store</description> <definition type="RDBMS"> <configuration> <url>jdbc:mysql://[MySQL DB url]:[port]/WSO2_ANALYTICS_EVENT_STORE_DB</url> <username>[username]</username> <password>[password]</password> <driverClassName>com.mysql.jdbc.Driver</driverClassName> <maxActive>50</maxActive> <maxWait>60000</maxWait> <testOnBorrow>true</testOnBorrow> <validationQuery>SELECT 1</validationQuery> <validationInterval>30000</validationInterval> <defaultAutoCommit>false</defaultAutoCommit> </configuration> </definition> </datasource> <datasource> <name>WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</name> <description>The datasource used for analytics record store</description> <definition type="RDBMS"> <configuration> <url>jdbc:mysql://[MySQL DB url]:[port]/WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</url> <username>[username]</username> <password>[password]</password> <driverClassName>com.mysql.jdbc.Driver</driverClassName> <maxActive>50</maxActive> <maxWait>60000</maxWait> <testOnBorrow>true</testOnBorrow> <validationQuery>SELECT 1</validationQuery> <validationInterval>30000</validationInterval> <defaultAutoCommit>false</defaultAutoCommit> </configuration> </definition> </datasource> </datasources> </datasources-configuration>
For more information, see Datasources in the DAS documentation.
To share the user store among the nodes, open the <DAS_HOME>/repository/conf/user-mgt.xml file and modify the dataSource property of the <configuration> element as follows.
<configuration> ... <Property name="dataSource">jdbc/WSO2UM_DB</Property> </configuration>
The datasource name specified in this configuration should be the same as the user manager datasource (WSO2UM_DB) that you configured earlier in the master-datasources.xml file.
In the <DAS_HOME>/repository/conf/registry.xml file, add or modify the dataSource attribute of the <dbConfig name="govregistry"> element as follows.
<dbConfig name="govregistry"> <dataSource>jdbc/WSO2REG_DB</dataSource> </dbConfig> <remoteInstance url="https://localhost:9443/registry"> <id>gov</id> <cacheId>user@jdbc:mysql://localhost:3306/regdb</cacheId> <dbConfig>govregistry</dbConfig> <readOnly>false</readOnly> <enableCache>true</enableCache> <registryRoot>/</registryRoot> </remoteInstance> <mount path="/_system/governance" overwrite="true"> <instanceId>gov</instanceId> <targetPath>/_system/governance</targetPath> </mount> <mount path="/_system/config" overwrite="true"> <instanceId>gov</instanceId> <targetPath>/_system/config</targetPath> </mount>
Do not replace the following configuration when adding the mounting configurations. The registry mounting configurations mentioned in the above steps should be added in addition to the following.
<dbConfig name="wso2registry"> <dataSource>jdbc/WSO2CarbonDB</dataSource> </dbConfig>
- Follow the steps below to update the <DAS_HOME>/repository/conf/axis2/axis2.xml file for the nodes to enable Hazelcast clustering.
To enable Hazelcast clustering, set the enable attribute of the clustering element (which uses the org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent class) to true as shown below.
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="true">
To ensure that all the nodes in the cluster identify each other, enable the wka mode for all of them as shown below.
<parameter name="membershipScheme">wka</parameter>
Add all the nodes in the cluster as well-known members under the <members> element as shown below.
In a fully distributed DAS setup, Apache Spark identifies the worker nodes that analyze data via Hazelcast clustering. Only the nodes that are assigned as analyzer nodes should be identified as Spark workers; therefore, they should be in a separate cluster.
For each of the two analyzer nodes, list the members as shown below. This groups them in a separate cluster.
<members> <member> <hostName>[Analyzer1 IP]</hostName> <port>[Analyzer1 port]</port> </member> <member> <hostName>[Analyzer2 IP]</hostName> <port>[Analyzer2 port]</port> </member> </members>
For the other nodes that are not analyzer nodes, list the members as shown below.
<members> <member> <hostName>[indexer1 IP]</hostName> <port>[indexer1 port]</port> </member> <member> <hostName>[indexer2 IP]</hostName> <port>[indexer2 port]</port> </member> <member> <hostName>[Receiver1 IP]</hostName> <port>[Receiver1 port]</port> </member> <member> <hostName>[Receiver2 IP]</hostName> <port>[Receiver2 port]</port> </member> </members>
In order to allow the analyzer nodes to access index data, specify the indexer nodes they must connect to by following the steps below.
In the <DAS_HOME>/repository/conf/analytics/analytics-data-config.xml file, set the mode to REMOTE as shown below.
<Mode>REMOTE</Mode>
This parameter specifies whether the data services that need to be invoked for the DAS node are locally available or hosted in a remote server. Possible values are as follows:
- LOCAL: This means that the required data services are hosted within the DAS node. When the mode is LOCAL, the node does not need to connect to another server to invoke these services. Therefore, you are not required to specify values for the other parameters in the analytics-data-config.xml file.
- REMOTE: This means that the required data services are hosted on a remote server. In such situations, you are required to configure the other parameters in the analytics-data-config.xml file in order to provide the details required for the DAS node to establish a connection with that remote server.
- AUTO: If AUTO is specified, the mode is automatically switched between LOCAL and REMOTE depending on the availability of the data services required by the DAS node.
In this scenario, you are grouping analyzer nodes and indexer nodes in separate clusters. Therefore, each analyzer node needs to connect to a remote server for indexing services.
- Specify the server URL of an indexer node as the URL, as shown below.
<URL>http://<INDEXER_NODE_HOST>:<INDEXER_NODE_PORT></URL>
Specify the user name and the password to access the indexer node as shown in the example below.
<Username>admin</Username> <Password>admin</Password>
It is also recommended to configure a load balancer in the cluster of indexers and point to the load balancer from each analyzer node via the URL parameter. This minimizes the impact of a single indexer node failure on the indexing operations of your DAS setup. For more information, see Load Balancing.
For each node, enter the respective server IP address as the value for the localMemberHost property as shown below.
<parameter name="localMemberHost">[Server_IP_Address]</parameter>
- Update the <DAS_HOME>/repository/conf/event-processor.xml file of the nodes as follows.
Make sure that the HA mode is enabled as follows.
<mode name="HA" enable="true">
Enable the Distributed mode as shown below.
<mode name="Distributed" enable="true">
Set the following property for the two nodes that should function as presenter nodes.
This property should be set only for the presenter nodes.
<presenter enable="true"/>
For each receiver node, enter the respective server IP address under the HA mode Config section as shown in the example below.
When you enable the HA mode for WSO2 DAS, the following are enabled by default:
- State persistence: If there is no real-time use case that requires any state information after starting the cluster, you should disable event persistence by setting the persistence attribute to false in the <DAS_HOME>/repository/conf/event-processor.xml file as shown below.
<persistence enable="false"> <persistenceIntervalInMinutes>15</persistenceIntervalInMinutes> <persisterSchedulerPoolSize>10</persisterSchedulerPoolSize> <persister class="org.wso2.carbon.event.processor.core.internal.persistence.FileSystemPersistenceStore"> <property key="persistenceLocation">cep_persistence</property> </persister> </persistence>
When state persistence is enabled for WSO2 DAS, the internal state of DAS is persisted in files. These files are not automatically deleted. Therefore, if you want to save space in your DAS pack, you need to delete them manually.
These files are created in the <DAS_HOME>/cep_persistence/<tenant-id> directory. This directory has a separate sub-directory for each execution plan. Each execution plan can have multiple files. The format of each file name is <TIMESTAMP>_<EXECUTION_PLAN_NAME> (e.g., 1493101044948_MyExecutionPlan). If you want to clear files for a specific execution plan, you need to keep the two files with the latest timestamps and delete the rest.
- Event synchronization: However, if you set the event.duplicated.in.cluster=true property for an event receiver configured in a node, DAS does not perform event synchronization for that receiver.
<!-- HA Mode Config --> <mode name="HA" enable="true"> ... <eventSync> <hostName>[Server_IP_Address]</hostName>
The following node types are configured for the HA deployment mode in the <DAS_HOME>/repository/conf/event-processor.xml file.
- eventSync: Both the receiver nodes in this setup are event synchronizing nodes. Therefore, each node should have the host and the port on which it is operating specified under the <eventSync> element. Note that the eventSync port is not automatically updated to the port on which each node operates via port offset.
- management: Both the receiver nodes in this setup carry out the same tasks, and therefore all nodes are considered manager nodes. Each node should have the host and the port on which it is operating specified under the <management> element. Note that the management port is not automatically updated to the port on which each node operates via port offset.
- presentation: You can optionally specify one or more nodes in this setup as presenter nodes. The dashboards in which processed information is displayed are configured only in the presenter nodes. Each node should have the host and the port on which the assigned presenter node is operating specified under the <presentation> element. The host and the port, as well as the other configurations under the <presentation> element, are effective only when the presenter enable="true" property is set under the <!-- HA Mode Config --> section.
To define the two analyzer nodes as Spark masters, configure the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file as follows.
Two Spark masters are created because this cluster is a high-availability cluster. When the active Spark master fails, the other node configured as a Spark master becomes active and continues to carry out the tasks of the Spark master.
Specify the number of Spark masters as 2 by setting the following property.
carbon.spark.master.count 2
Specify a DAS symbolic link for both nodes (a sketch of creating such a link is given after the notes below).
- The directory path for the Spark class path is different for each node depending on the location of the <DAS_HOME>. The symbolic link redirects the Spark Driver Application to the relevant directory for each node when it creates the Spark class path.
- In a multi-node DAS cluster that runs in a RedHat Linux environment, you also need to update the <DAS_HOME>/bin/wso2server.sh file with the following entry so that the <DAS_HOME> is exported, because the symbolic link may not be resolved correctly in this operating system.
export CARBON_HOME=<symbolic link>
For more information about Spark related configurations, see Spark Configurations.
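The following is a minimal sketch of creating the operating-system level symbolic link referred to above, assuming an illustrative link path of /opt/das/das_symlink and illustrative installation directories; the exact spark-defaults.conf property that registers the link depends on your DAS version, so check the Spark Configurations page mentioned above.
# Sketch: create the same link path on both analyzer nodes, each pointing to
# that node's actual <DAS_HOME> (the paths below are arbitrary examples).
sudo mkdir -p /opt/das
sudo ln -s /opt/wso2das /opt/das/das_symlink             # on analyzer node 1
# sudo ln -s /home/ubuntu/wso2das /opt/das/das_symlink   # on analyzer node 2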
In order to share the C-Apps deployed among the nodes, configure the SVN-based deployment synchronizer. For detailed instructions, see Configuring SVN-Based Deployment Synchronizer.
If you do not configure the deployment synchronizer, you are required to deploy any C-App you use in the fully distributed HA setup to all the nodes manually.
If the physical DAS server has multiple network interfaces with different IPs, and if you want Spark to use a specific interface IP, open either the <DAS_HOME>/bin/load-spark-env-vars.sh file (for Linux) or the <DAS_HOME>/bin/load-spark-env-vars.bat file (for Windows), and add the following parameter to configure the Spark IP address.
export SPARK_LOCAL_IP=<IP_Address>
Create a file named my-node-id.dat in the <DAS_HOME>/repository/conf/analytics directory and add an ID of your preference. This ID should be unique across all DAS nodes in the cluster; the same ID should not be used by more than one node. If this ID is not provided, a node ID is generated by the server. The node ID is also stored in the database (the primary record store, which is WSO2_ANALYTICS_EVENT_STORE_DB) for future reference.
<Any id which is unique across all other my-node-id.dat files in the cluster>
If, for any reason, you want to replace the DAS servers with new packs and point the new packs to the same databases, you need to back up the my-node-id.dat files from the previous installations and restore them in the new DAS packs. If this is not done, two new node IDs are created while the older two node IDs remain in the database, so altogether there will be four node IDs in the database. This can lead to various indexing inconsistencies. If you are going to clean the whole database, you can start without restoring the older node IDs.
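For example, on one of the nodes the file could be created as follows; the ID value das-node-1 is only an illustrative choice.
# Sketch: assign a cluster-unique node ID on this node.
echo "das-node-1" > <DAS_HOME>/repository/conf/analytics/my-node-id.dat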
Configuring Hazelcast properties
You can configure the Hazelcast properties for the DAS nodes by following the steps given below.
Create the hazelcast.properties file with the following property configurations, and copy the file to the <DAS_HOME>/repository/conf/ directory.
#Disabling the hazelcast shutdown hook
hazelcast.shutdownhook.enabled=false
#Setting the hazelcast logging type to log4j
hazelcast.logging.type=log4j
The above configurations are explained below.
- Hazelcast shutdown hook: This configuration disables the shutdown hook in hazelcast, which ensures that the hazelcast instance shuts down gracefully whenever the product node shuts down. If the hazelcast shutdown hook is enabled (which is the default behavior of a product), you will see errors such as "Hazelcast instance is not active!" at the time of shutting down the product node: This is because the hazelcast instance shuts down too early when the shutdown hook is enabled.
- Hazelcast logging type: This configuration sets the hazelcast logging type to log4j, which allows hazelcast logs to be written to the wso2carbon.log file.
If you have enabled log4j for hazelcast logging as shown above, be sure to enter the configuration shown below in the log4j.properties file (stored in the <DAS_HOME>/repository/conf/ directory). This configures the log level for hazelcast logging. For a clustered production environment, it is recommended to use INFO as the log level, as shown below.
log4j.logger.com.hazelcast=INFO
Starting the cluster
When starting the instances, you can provide predefined profiles to start them as receiver nodes, analyzer nodes, or indexer nodes.
Node Type | Disabled Components | Option |
---|---|---|
Receiver Node | AnalyticsEngine, AnalyticsExecution, Indexing, DataPurging, AnalyticsSparkCtx, AnalyticsStats | -receiverNode |
Indexer Node | AnalyticsExecution, AnalyticsEngine, EventSink, AnalyticsSparkCtx, AnalyticsStats, DataPurging | -indexerNode |
Analyzer Node | Indexing, EventSink, DataPurging, IndexThrottling, AnalyticsStats | -analyzerNode |
These can be provided at the server startup. For example:
sh wso2server.sh -indexerNode would start the instance as an indexer node.
Note: You cannot use more than one of the aforementioned predefined profiles when starting the server.
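For example, in the deployment described above, each pack is started from its own <DAS_HOME>/bin directory with the profile that matches its role, as sketched below for a Linux environment.
# On the two receiver hosts:
sh wso2server.sh -receiverNode
# On the two indexer hosts:
sh wso2server.sh -indexerNode
# On the two analyzer (Spark) hosts:
sh wso2server.sh -analyzerNode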
If your user ID is different from root (the default user ID) and you do not have write access, a warning message similar to the following can appear:
WARN {java.util.prefs.FileSystemPreferences} - Could not lock System prefs. Unix error code 7. {java.util.prefs.FileSystemPreferences}
To address this issue, follow the steps below:
- Create a directory in a location accessible to the user running the JVM, with the following substructure: <CREATED_DIR>/.java/.systemPrefs.
- Start the DAS server with the following Java option. Alternatively, you can add this Java option to the <DAS_HOME>/bin/wso2server.sh startup script.
-Djava.util.prefs.systemRoot=<CREATED_DIR>/.java
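Putting the two steps together, a minimal sketch might look like the following; the directory /opt/das/javaprefs is an arbitrary example.
# Create a preferences root that is writable by the user running the JVM.
mkdir -p /opt/das/javaprefs/.java/.systemPrefs
# Point the JVM system preferences root at it when starting DAS.
sh <DAS_HOME>/bin/wso2server.sh -Djava.util.prefs.systemRoot=/opt/das/javaprefs/.java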
See the following for more information.
- For more information on disabling certain components, see Disabling Selected DAS Components.
- For more information on the architecture of DAS and the usage of its components, see DAS Architecture.