This site contains the documentation that is relevant to older WSO2 product versions and offerings.
For the latest WSO2 documentation, visit https://wso2.com/documentation/.

Configuring Cassandra Cluster

Data that comes to BAM through data receivers is usually stored in the default Cassandra database. The image above shows how the Cassandra databases of both BAM nodes are deployed in a cluster. This ensures that if one node fails, data can still be received and stored by the other databases in the cluster, and that the data remains highly available for running Hive scripts.

Information to know before you start

  • Increase the heap memory of each BAM node to at least 2 GB, and synchronize the system clocks of all nodes (for example, using NTP).
  • BAM 2.4.0 uses Cassandra version 1.1.3 while BAM 2.4.1 uses Cassandra version 1.2.13.
  • The fully distributed BAM setup uses nodes 3, 4, and 5, so this topic includes configurations for those nodes. If you are using a different setup, change the configurations accordingly.
  • You can start a BAM server with the Cassandra profile so that it acts as a Cassandra node in your cluster. See Running the Product on a Preferred Profile for more information on how to do this.
  • For instructions on using external Cassandra with WSO2 BAM, see Connecting to External Cassandra.
  1. Add the following configurations to the <BAM_HOME>/repository/conf/etc/cassandra.yaml file on each of the nodes below.

    To node3:

    cluster_name: Test Cluster
    initial_token: 0
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "node3,node4,node5"
    listen_address: node3
    rpc_address: node3
    rpc_port: 9160

    For Cassandra 1.2.13 (used in BAM 2.4.1), initial_token cannot be 0. Enter the value generated by the script instead.

    To node4:

    cluster_name: Test Cluster
    initial_token: 56713727820156410577229101238628035242
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "node3,node4,node5"
    listen_address: node4
    rpc_address: node4
    rpc_port: 9160

    To node5:

    cluster_name: Test Cluster
    initial_token: 113427455640312821154458202477256070485
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "node3,node4,node5"
    listen_address: node5
    rpc_address: node5
    rpc_port: 9160
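The initial_token values in this step divide the token ring evenly among the three nodes. As a sketch of that arithmetic (not the script mentioned above), the following Python computes evenly spaced tokens for any node count. It assumes the ring uses either RandomPartitioner, whose tokens run from 0 to 2**127, or Murmur3Partitioner, whose tokens run from -2**63 to 2**63 - 1 and which is the default partitioner from Cassandra 1.2 onward; check the partitioner setting in cassandra.yaml to see which range applies to your nodes. The function names are hypothetical.

```python
# Sketch: evenly spaced initial_token values for an N-node Cassandra ring.
# Assumption: the ring uses RandomPartitioner (tokens 0 .. 2**127) or
# Murmur3Partitioner (tokens -2**63 .. 2**63 - 1); check the `partitioner`
# setting in cassandra.yaml to see which range your nodes use.


def random_partitioner_tokens(node_count: int) -> list[int]:
    """Split RandomPartitioner's 0 .. 2**127 range into node_count equal parts."""
    return [i * 2**127 // node_count for i in range(node_count)]


def murmur3_partitioner_tokens(node_count: int) -> list[int]:
    """Split Murmur3Partitioner's 2**64-wide range, starting at -2**63."""
    return [i * 2**64 // node_count - 2**63 for i in range(node_count)]


if __name__ == "__main__":
    nodes = ["node3", "node4", "node5"]  # host names used in this topic
    for name, token in zip(nodes, random_partitioner_tokens(len(nodes))):
        print(f"{name}: initial_token: {token}")
```

With three nodes, random_partitioner_tokens(3) reproduces the node3, node4, and node5 values shown in this step, while murmur3_partitioner_tokens(3) starts at -2**63 instead of 0.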
  2. Connect the nodes to Cassandra endpoints.

  3. Edit the <BAM_HOME>/repository/conf/advanced/streamdefn.xml file on all nodes as follows. These settings define the replication factor and the read/write consistency levels that data receivers use when writing data to Cassandra. For example, if you have four Cassandra nodes in the cluster, enter 3 as the value of the <ReplicationFactor> property.

    <StreamDefinition>
    	<ReplicationFactor>3</ReplicationFactor>
    	<ReadConsistencyLevel>QUORUM</ReadConsistencyLevel>
    	<WriteConsistencyLevel>ONE</WriteConsistencyLevel>
    	<StrategyClass>org.apache.cassandra.locator.SimpleStrategy</StrategyClass>
    </StreamDefinition>
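Cassandra satisfies QUORUM when floor(replication_factor / 2) + 1 replicas respond, and a read is guaranteed to see the most recent write only when the number of replicas read plus the number written exceeds the replication factor. The following Python sketch applies those two rules to the values in the streamdefn.xml example; the helper names are hypothetical, and it covers only the ONE, QUORUM, and ALL levels.

```python
# Hypothetical helpers: how many replicas a consistency level needs, and
# whether a read/write combination guarantees that reads see the latest write.


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a consistency level (subset of levels)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # floor(RF / 2) + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


def reads_see_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    reads = replicas_required(read_level, replication_factor)
    writes = replicas_required(write_level, replication_factor)
    return reads + writes > replication_factor


if __name__ == "__main__":
    # Values from the streamdefn.xml example: RF 3, QUORUM reads, ONE writes.
    print(replicas_required("QUORUM", 3))
    print(reads_see_latest_write("QUORUM", "ONE", 3))
    print(reads_see_latest_write("QUORUM", "QUORUM", 3))
```

For the example values (replication factor 3, QUORUM reads, ONE writes), 2 + 1 does not exceed 3, so a read can miss a write that has so far reached only one replica. Setting both levels to QUORUM would guarantee overlap, but every write would then need two live replicas.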
  4. Configure the datasources. When load balancing is required, add the JDBC URLs as a comma-separated list.
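A minimal sketch of building such a list in Python. It assumes the Cassandra JDBC URL form jdbc:cassandra://host:port/keyspace, the rpc_port 9160 and node names from step 1, and the EVENT_KS keyspace. These are assumptions, so compare the output with the URL already present in your datasource configuration before using it. The helper name is hypothetical.

```python
# Hypothetical helper: build a comma-separated list of Cassandra JDBC URLs
# for a load-balanced datasource. The URL form and the EVENT_KS keyspace
# are assumptions; check them against your existing datasource URL.


def load_balanced_jdbc_url(hosts: list[str], port: int = 9160, keyspace: str = "EVENT_KS") -> str:
    """Join one jdbc:cassandra:// URL per host with commas."""
    if not hosts:
        raise ValueError("at least one Cassandra host is required")
    return ",".join(f"jdbc:cassandra://{host}:{port}/{keyspace}" for host in hosts)


if __name__ == "__main__":
    print(load_balanced_jdbc_url(["node3", "node4", "node5"]))
```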

  5. Optionally, to view the cluster information in the Cassandra Keyspaces List UI, add a file named cassandra-endpoint.xml to <BAM_HOME>/repository/conf/etc with the following configuration. This file is required when you deploy the backend Cassandra cluster on an IaaS such as AWS: an IaaS may not expose the real IP addresses of the nodes, so you must list the mapped real IPs in this file.

    <Cassandra>
     <EndPoints>
        <EndPoint><HostName>name_of_machine1(BAM N1)</HostName></EndPoint>
        <EndPoint><HostName>name_of_machine2(BAM N2)</HostName></EndPoint>
     </EndPoints>
    </Cassandra>

    When configuring an external Cassandra cluster, you must also enable clustering by setting enable="true" on the <clustering> element in the <BAM_HOME>/repository/conf/axis2/axis2.xml file:

    <clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="true">
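When there are many nodes, or when the mapped IPs change between IaaS deployments, generating cassandra-endpoint.xml can be less error-prone than editing it by hand. The following Python sketch uses the standard xml.etree.ElementTree module and the element names from the cassandra-endpoint.xml example in this step; the helper name and the IP addresses are placeholders, and ET.indent requires Python 3.9 or later.

```python
# Sketch: generate cassandra-endpoint.xml from a list of host names or mapped IPs.
# Element names follow the example in this step; ET.indent needs Python 3.9+.
import xml.etree.ElementTree as ET


def build_endpoint_xml(hosts: list[str]) -> str:
    """Return <Cassandra><EndPoints>... XML with one <EndPoint> per host."""
    root = ET.Element("Cassandra")
    endpoints = ET.SubElement(root, "EndPoints")
    for host in hosts:
        endpoint = ET.SubElement(endpoints, "EndPoint")
        ET.SubElement(endpoint, "HostName").text = host
    ET.indent(root)  # pretty-print with two-space indentation
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder addresses: replace with the real (mapped) IPs of your nodes,
    # then save the output as <BAM_HOME>/repository/conf/etc/cassandra-endpoint.xml.
    print(build_endpoint_xml(["10.0.0.11", "10.0.0.12"]))
```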
  6. After starting the Cassandra cluster, you can verify its status using NodeTool. For example, the following command uses NodeTool to display statistics for the Cassandra keyspaces and their column families (port 9999 is the JMX port):

    ./nodetool -u admin -pw admin -h localhost -p 9999 cfstats
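If NodeTool cannot connect, a quick first check is whether each node accepts TCP connections on the RPC port (9160) and the JMX port (9999). The following Python sketch tests only TCP reachability, not authentication or cluster health; the script name and its command-line usage are hypothetical.

```python
# Sketch: check TCP reachability of the Cassandra RPC port (rpc_port 9160)
# and the JMX port used by NodeTool (9999) on each host given on the
# command line. A successful connection says nothing about credentials.
import socket
import sys


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS errors
        return False


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_ports.py node3 node4 node5
    for host in sys.argv[1:]:
        for port in (9160, 9999):
            status = "open" if port_open(host, port) else "unreachable"
            print(f"{host}:{port} {status}")
```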

  • You can connect to the Cassandra cluster using the Cassandra CLI tool. For example, the following commands are used to access the EVENT_KS Cassandra keyspace using Cassandra CLI.

    ./cassandra-cli -h localhost -pw admin -u admin
    show keyspaces;
    use EVENT_KS;
    show schema EVENT_KS;

    When configuring the Cassandra cluster in this setup, do the following so that the Cassandra keyspaces feature works and lists the Cassandra keyspaces in the Main menu of the WSO2 BAM management console:

    • If you are using internal Cassandra, which is shipped with WSO2 BAM, both BAM nodes and Cassandra nodes should be in the same clustering domain.

    • If you are using external Cassandra, change the following setting in the <BAM_HOME>/repository/conf/etc/cassandra.yaml file to use the AllowAllAuthenticator:

      authenticator: AllowAllAuthenticator
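In YAML, a space must follow the colon for authenticator to be read as a key with a value. To apply this change on several nodes, a small script can rewrite the setting. The following Python sketch is one hypothetical way to do it: it replaces an existing uncommented top-level authenticator line or appends one, and it assumes a plain cassandra.yaml in which the setting starts at the beginning of a line. The helper and script names are hypothetical.

```python
# Hypothetical helper: set the authenticator in cassandra.yaml text.
import re
import sys

# Matches an uncommented, top-level "authenticator:" line.
AUTHENTICATOR_LINE = re.compile(r"^authenticator:.*$", re.MULTILINE)


def set_authenticator(yaml_text: str, value: str = "AllowAllAuthenticator") -> str:
    """Replace an existing authenticator line, or append one if none exists."""
    line = f"authenticator: {value}"  # note the space after the colon
    if AUTHENTICATOR_LINE.search(yaml_text):
        return AUTHENTICATOR_LINE.sub(lambda _match: line, yaml_text)
    if yaml_text and not yaml_text.endswith("\n"):
        yaml_text += "\n"
    return yaml_text + line + "\n"


if __name__ == "__main__":
    # Usage (hypothetical script name): python set_authenticator.py path/to/cassandra.yaml
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(set_authenticator(text))
```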