Configuring Data Analyzer Cluster
The data analyzer component of each BAM node uses Hive query language scripts to retrieve data from the Cassandra cluster, process it into meaningful information, and save that information in an RDBMS. In this example, we use MySQL as the RDBMS. WSO2 BAM ships with an H2 database by default, but H2 is not recommended for a high-volume, production setting. In this setup, the analyzer components in node1 and node2 are clustered, and data processing is extended to an external Apache Hadoop cluster.
The data analyzer cluster uses the registry to store metadata related to Hive scripts and scheduled tasks. It uses Hazelcast to handle the coordination required by the nodes in the analyzer cluster when running Hive scripts. These settings ensure high availability through a failover mechanism: if one node fails, the remaining nodes take over its load and complete the task. The diagram below depicts this setup:
The BAM nodes in the analyzer cluster are used for three main purposes:
- Submitting analytics queries to the Hadoop cluster periodically, as scheduled
- Receiving data from data agents and persisting it to the Cassandra cluster
- Hosting end-user dashboards
The following steps describe how to configure the analyzer cluster. All of these configurations are done on the analyzer nodes; the instructions in this section assume that node 1 and node 2 are the data analyzer nodes.
Do the following steps on both node 1 and node 2.
- Download and extract WSO2 BAM to both analyzer nodes.
- Place the MySQL connector .jar file (which you must download separately) inside the <BAM_HOME>/repository/components/lib folder.
- Add the following datasource configuration to the <BAM_HOME>/repository/conf/datasources/master-datasources.xml file. Be sure to change the database URL and credentials according to your environment. The WSO2_REG_DB database is used in this example by the shared registry.

```xml
<datasource>
    <name>WSO2_REG_DB</name>
    <description>The datasource used for config</description>
    <jndiConfig>
        <name>jdbc/WSO2RegDB</name>
    </jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:mysql://[host]:[port]/[reg-db]</url>
            <username>reg_user</username>
            <password>password</password>
            <driverClassName>com.mysql.jdbc.Driver</driverClassName>
            <maxActive>50</maxActive>
            <maxWait>60000</maxWait>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
            <validationInterval>30000</validationInterval>
        </configuration>
    </definition>
</datasource>
```
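A quick way to catch typos in the datasource entry before restarting the server is to parse it and check the fields that most often go wrong. A minimal sketch in Python (the embedded fragment and the db-host value are illustrative; in practice you would read the real master-datasources.xml):

```python
import xml.etree.ElementTree as ET

# Illustrative <datasource> fragment; in practice, parse the real
# <BAM_HOME>/repository/conf/datasources/master-datasources.xml file.
FRAGMENT = """
<datasource>
    <name>WSO2_REG_DB</name>
    <jndiConfig><name>jdbc/WSO2RegDB</name></jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:mysql://db-host:3306/reg-db</url>
            <username>reg_user</username>
            <driverClassName>com.mysql.jdbc.Driver</driverClassName>
        </configuration>
    </definition>
</datasource>
"""

def check_datasource(xml_text):
    ds = ET.fromstring(xml_text)
    name = ds.findtext("name")
    url = ds.findtext("definition/configuration/url")
    driver = ds.findtext("definition/configuration/driverClassName")
    # The registry mount in registry.xml refers to this datasource by its
    # JNDI name, so jdbc/WSO2RegDB must match what registry.xml expects.
    jndi = ds.findtext("jndiConfig/name")
    assert name == "WSO2_REG_DB", "unexpected datasource name"
    assert jndi == "jdbc/WSO2RegDB", "JNDI name must match registry.xml"
    assert url.startswith("jdbc:mysql://"), "expected a MySQL JDBC URL"
    assert driver == "com.mysql.jdbc.Driver", "MySQL driver class expected"
    return name, url

print(check_datasource(FRAGMENT))
```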
- Add the following to the <BAM_HOME>/repository/conf/registry.xml file. These are mounting configurations that share the registry between both nodes.

```xml
<dbConfig name="wso2GReg">
    <dataSource>jdbc/WSO2RegDB</dataSource>
</dbConfig>
<remoteInstance url="https://localhost:9443/registry">
    <id>registryInstance</id>
    <dbConfig>wso2GReg</dbConfig>
    <readOnly>false</readOnly>
    <registryRoot>/</registryRoot>
    <cacheId>root@jdbc:mysql://localhost:3306/governancedb</cacheId>
</remoteInstance>
<mount path="/_system/config" overwrite="true">
    <instanceId>registryInstance</instanceId>
    <targetPath>/_system/config</targetPath>
</mount>
<mount path="/_system/governance" overwrite="true">
    <instanceId>registryInstance</instanceId>
    <targetPath>/_system/governance</targetPath>
</mount>
```

Now the registry has been mounted and shared between both nodes.
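The mounting configuration fails in unhelpful ways when the dbConfig name, remoteInstance id, and mount references drift apart. A small consistency check, sketched in Python against an illustrative fragment (the real registry.xml has the same elements under its root):

```python
import xml.etree.ElementTree as ET

# Illustrative registry.xml mount fragment, wrapped in a root element
# so it parses standalone.
REGISTRY_XML = """
<wso2registry>
  <dbConfig name="wso2GReg">
    <dataSource>jdbc/WSO2RegDB</dataSource>
  </dbConfig>
  <remoteInstance url="https://localhost:9443/registry">
    <id>registryInstance</id>
    <dbConfig>wso2GReg</dbConfig>
    <readOnly>false</readOnly>
  </remoteInstance>
  <mount path="/_system/config" overwrite="true">
    <instanceId>registryInstance</instanceId>
    <targetPath>/_system/config</targetPath>
  </mount>
  <mount path="/_system/governance" overwrite="true">
    <instanceId>registryInstance</instanceId>
    <targetPath>/_system/governance</targetPath>
  </mount>
</wso2registry>
"""

def check_mounts(xml_text):
    root = ET.fromstring(xml_text)
    db_names = {db.get("name") for db in root.findall("dbConfig")}
    instance_ids = {ri.findtext("id") for ri in root.findall("remoteInstance")}
    # Every remoteInstance must point at a declared dbConfig.
    for ri in root.findall("remoteInstance"):
        assert ri.findtext("dbConfig") in db_names, "unknown dbConfig reference"
    # Every mount must point at a declared remoteInstance.
    mounted = []
    for m in root.findall("mount"):
        assert m.findtext("instanceId") in instance_ids, "unknown instanceId"
        mounted.append(m.get("path"))
    return mounted

print(check_mounts(REGISTRY_XML))
```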
- To create the registry schema, execute the <BAM_HOME>/dbscripts/mysql.sql script as reg_user against the reg-db database in MySQL. This needs to be done in only one node, as the registry is now shared. Alternatively, you can start the server with the following option to create the required tables (if they are not already created); this also needs to be done in only one node.

```
sh wso2server.sh -Dsetup
```

Use wso2server.bat -Dsetup on Windows. When starting up the server, you can also check whether the registry has been mounted properly.
- Edit the <BAM_HOME>/repository/conf/axis2/axis2.xml file and enable clustering as follows. This is to be done in both nodes.

```xml
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="true">
```
- In the above clustering configuration, make sure to also configure the following properties correctly:
  - membershipScheme - The cluster membership scheme being used. Set it to "multicast".
  - localMemberHost - The host name or IP address of the member. Set it to the relevant host name of the machine (e.g., node1).
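Putting the two properties together, the clustering element typically looks like the sketch below. The membershipScheme and localMemberHost values follow the text above; the domain parameter and its value are illustrative assumptions, so keep the values already in your axis2.xml:

```xml
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="true">
    <parameter name="membershipScheme">multicast</parameter>
    <!-- Illustrative value; use the multicast domain shared by your cluster. -->
    <parameter name="domain">wso2.carbon.domain</parameter>
    <!-- Host name or IP of this member, e.g., node1 or node2. -->
    <parameter name="localMemberHost">node1</parameter>
</clustering>
```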
- Add the following to the tasks-config.xml file, which is in the <BAM_HOME>/repository/conf/etc/ directory in the analyzer nodes.

```xml
<taskServerMode>CLUSTERED</taskServerMode>
<taskServerCount>2</taskServerCount>
```
About the task server count
This value indicates the number of task servers running in the cluster along with the analyzer nodes.
The taskServerCount property also affects analyzer node startup: each analyzer node holds its startup until the number of servers specified in the taskServerCount property have started, and only when that count is reached does the startup of those servers continue to the end. The startup is held so that the other analyzers can also join in when the tasks (Hive script jobs) are scheduled. That way, the scripts are shared among all available analyzers, instead of all being scheduled initially in the first server that starts up.
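The behaviour described above amounts to a startup barrier followed by task distribution. The sketch below models that pattern in Python; it is purely illustrative and is not BAM's implementation:

```python
# Illustrative model of the taskServerCount startup barrier; not BAM code,
# just the coordination pattern the docs describe.
class TaskCluster:
    def __init__(self, task_server_count):
        self.task_server_count = task_server_count
        self.members = []

    def join(self, node):
        """A node announces itself; returns True once enough members are
        present for startup (and task scheduling) to proceed."""
        self.members.append(node)
        return len(self.members) >= self.task_server_count

    def schedule(self, scripts):
        """Once the barrier is met, scripts are shared across members
        instead of all landing on the first node that started."""
        assert len(self.members) >= self.task_server_count, "still waiting for members"
        return {node: scripts[i::len(self.members)]
                for i, node in enumerate(self.members)}

cluster = TaskCluster(task_server_count=2)
assert not cluster.join("node1")   # node1 holds startup: only 1 of 2 members
assert cluster.join("node2")       # barrier met, startup continues on both
print(cluster.schedule(["scriptA", "scriptB", "scriptC"]))
```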
- The following configuration must be done if you wish to change the database used to store Hive script metadata. By default, this metadata is stored in an H2 database, and these steps store it in MySQL instead, as appropriate for this scenario. While this step is not a must, it is recommended for production environments to use a separate database instance such as MySQL or Oracle as the Hive metastore. See Configuring a Metadata Store for Hive for more information.
Modify the <BAM_HOME>/repository/conf/advanced/hive-site.xml file as follows. A line is added to the hive.aux.jars.path property to include the MySQL connector JAR in the Hadoop job execution runtime. Windows users must use the <BAM_HOME>/repository/conf/advanced/hive-site-win.xml file instead.

```xml
<property>
    <name>hadoop.embedded.local.mode</name>
    <value>false</value>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://node1:9000</value>
</property>
<property>
    <name>mapred.job.tracker</name>
    <value>node1:9001</value>
</property>
<property>
    <name>hive.aux.jars.path</name>
    <value>file://${CARBON_HOME}/repository/components/plugins/apache-cassandra_1.2.13.wso2v2.jar,file://${CARBON_HOME}/repository/components/plugins/guava_12.0.0.wso2v1.jar,file://${CARBON_HOME}/repository/components/plugins/json_2.0.0.wso2v1.jar,file://${CARBON_HOME}/repository/components/plugins/commons-dbcp_1.4.0.wso2v1.jar,file://${CARBON_HOME}/repository/components/plugins/commons-pool_1.5.6.wso2v1.jar,file://${CARBON_HOME}/repository/components/lib/mysql-connector-java-5.1.5-bin.jar</value>
</property>
```
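The hive.aux.jars.path value is a single comma-separated list, and a stray space or a missing file:// prefix is easy to introduce when editing it. A small check, sketched in Python (the shortened example value and /opt/bam path are illustrative):

```python
def check_aux_jars(value):
    """Validate a hive.aux.jars.path value: every comma-separated entry
    must be a file:// URI ending in .jar, with no embedded whitespace."""
    entries = [e.strip() for e in value.split(",")]
    for entry in entries:
        assert entry.startswith("file://"), "entry must use the file:// scheme: " + entry
        assert entry.endswith(".jar"), "entry must point at a jar: " + entry
        assert " " not in entry, "entry contains whitespace: " + entry
    return len(entries)

# Shortened, illustrative value; the real list also includes the Guava,
# JSON, DBCP, and commons-pool plugin jars shown above.
value = ("file:///opt/bam/repository/components/plugins/apache-cassandra_1.2.13.wso2v2.jar,"
         "file:///opt/bam/repository/components/lib/mysql-connector-java-5.1.5-bin.jar")
print(check_aux_jars(value))
```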
- Add the following configuration for WSO2BAM_DATASOURCE in the <BAM_HOME>/repository/conf/datasources/bam-datasources.xml file of both analyzer nodes. Be sure to change the database URL and credentials according to your environment. WSO2BAM_DATASOURCE is the default datasource available in BAM, and it should be connected to the database you are using. This example uses the bam-db database to store BAM summary data.
Note that this configuration must be changed in the <BAM_HOME>/repository/conf/datasources/master-datasources.xml file instead if you are using BAM 2.4.0 rather than BAM 2.4.1.

```xml
<datasource>
    <name>WSO2BAM_DATASOURCE</name>
    <description>The datasource used for analyzer data</description>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:mysql://localhost:3306/bam-db</url>
            <username>root</username>
            <password>admin</password>
            <driverClassName>com.mysql.jdbc.Driver</driverClassName>
            <maxActive>50</maxActive>
            <maxWait>60000</maxWait>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
            <validationInterval>30000</validationInterval>
        </configuration>
    </definition>
</datasource>
```
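Both analyzer nodes must carry the same WSO2BAM_DATASOURCE definition, so after editing it is worth confirming the two files agree. A sketch in Python (the inline fragments stand in for the file contents read from each node; the URL and username are illustrative):

```python
import xml.etree.ElementTree as ET

def datasource_fields(xml_text, name):
    """Extract the comparable configuration fields of a named datasource."""
    root = ET.fromstring(xml_text)
    for ds in root.iter("datasource"):
        if ds.findtext("name") == name:
            cfg = ds.find("definition/configuration")
            return {child.tag: (child.text or "").strip() for child in cfg}
    raise KeyError(name)

# Illustrative fragments; in practice, read bam-datasources.xml from each node.
node1 = ("<root><datasource><name>WSO2BAM_DATASOURCE</name>"
         "<definition type='RDBMS'><configuration>"
         "<url>jdbc:mysql://db:3306/bam-db</url><username>bam</username>"
         "</configuration></definition></datasource></root>")
node2 = node1  # in practice: the file contents fetched from the second node

assert datasource_fields(node1, "WSO2BAM_DATASOURCE") == \
       datasource_fields(node2, "WSO2BAM_DATASOURCE")
print("datasource definitions match")
```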
If you are using BAM 2.4.1, start the BAM server in both analyzer nodes, and use the Deployment Synchronizer to specify one node as a read/write node and one as a read-only node.
Tip: There is no concept of worker/manager separation for the BAM cluster, although the topic on the SVN-based deployment synchronizer mentions worker and manager configurations. Consider the manager and worker nodes mentioned there as node 1 and node 2.
Additional instructions and points to note
- When starting the BAM instances, use the disable.cassandra.server.startup property to stop the Cassandra server that is bundled with BAM from starting, since this setup points to the external Cassandra cluster.

```
sh wso2server.sh -Ddisable.cassandra.server.startup=true
```
- For BAM 2.4.0, or for setups without SVN, remove the BAM Toolbox Deployer feature using the Feature Manager on all analyzer nodes except node1, because having deployers in both analyzer BAM nodes interferes with proper Hive task failover. The feature is left in node1 so that it can copy the relevant files to the target location and schedule the Hive scripts.
BAM 2.4.1 gives you the option of disabling certain BAM components in addition to this. See here for more information on this.
- You may also use the -Ddisable.notification.task property to disable notifications.
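The properties above can be combined in a single startup command. A sketch, using the flags exactly as given in this section:

```
sh wso2server.sh -Ddisable.cassandra.server.startup=true -Ddisable.notification.task
```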