
Cassandra Migration Guide

This guide explains the steps for migrating from Cassandra version 1.1.3 to version 1.2.13.

Setting up Parallel SSH tool

You need to install the Parallel SSH tool at a central point so that you can execute NodeTool commands on all product nodes. Running the commands in parallel takes the backup on all product nodes at the same time, which avoids data inconsistencies. Follow the steps below to set up the Parallel SSH tool.
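
The examples in this guide run the tool directly from the extracted distribution. A minimal sketch of preparing the pssh-2.3.1.tar.gz file from step 1 below, assuming Python 2 is available on the client machine (the extraction location is illustrative):

    # Extract the Parallel SSH distribution on the client machine
    tar -xzf pssh-2.3.1.tar.gz
    cd pssh-2.3.1

    # Optional: install system-wide; the examples below invoke ./pssh-2.3.1/bin/pssh in place instead
    python setup.py install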

  1. Download the pssh-2.3.1.tar.gz file and install Parallel SSH on the client machine.
  2. Create a text file (for example: host.txt) in the <PSSH_HOME>/bin/ directory, and add the host names of all the product nodes to it (a sample host.txt is shown after this list).

    <PSSH_HOME> refers to the installation directory of Parallel SSH.

  3. Run the following command to check if Parallel SSH is functioning as expected:

    ./pssh-2.3.1/bin/pssh -h ./pssh-2.3.1/bin/host.txt -A -v -l <ssh_username> -o <output_dir> uptime

    <ssh_username> refers to the username of the SSH user account, and <output_dir> refers to the folder in which all the output text files will be saved.

  4. Enter the password of your user account on the machine.

    For example, in a scenario with a setup of three product nodes, the following output will be displayed if Parallel SSH is functioning as expected:

    [1] 06:08:37 [SUCCESS] 192.168.5.20
    [2] 06:08:37 [SUCCESS] 192.168.5.19
    [3] 06:08:37 [SUCCESS] 192.168.5.18
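
For reference, host.txt is simply a newline-separated list of the product node host names or IP addresses. A sample file matching the three-node output above (the addresses are illustrative):

    192.168.5.18
    192.168.5.19
    192.168.5.20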

Backing up data from the source Cassandra cluster

  1. Download Apache Cassandra 1.1.3, and copy the installation directory to the same location on all the product nodes.
  2. Navigate to one of the <CASSANDRA_HOME>/bin/ directories, and enter the following command to log in to Cassandra via NodeTool:

    <CASSANDRA_HOME> refers to the extracted installation directory of Apache Cassandra 1.1.3.

    ./nodetool -h <host> -p 9999 -u <username> -pw <password> ring
    The following output will be displayed if NodeTool is operating successfully:

    10.57.244.224 datacenter1 rack1 Up Normal 498.19MB 100.00% 0 
    10.57.244.226 datacenter1 rack1 Up Normal 502.76MB 100.00% 56713727820156410577229101238628035242 
    10.57.244.229 datacenter1 rack1 Up Normal 500.91MB 100.00% 113427455640312821154458202477256070485
  3. Navigate to the <PSSH_HOME> folder and enter the following command to check whether the above command works with the Parallel SSH tool:

    ./pssh-2.3.1/bin/pssh -h ./pssh-2.3.1/bin/host.txt -A -v -l <ssh_username> -o <output_directory> bash <CASSANDRA_HOME>/bin/nodetool -h <host> -p 9999 -u <username> -pw <password> ring

    If the Parallel SSH tool is operating successfully, output files containing the following will be created in the specified output directory:

    Address DC Rack Status State Load Effective-Ownership Token 
    113427455640312821154458202477256070485 
    10.57.244.224 datacenter1 rack1 Up Normal 498.19MB 100.00% 0 
    10.57.244.226 datacenter1 rack1 Up Normal 502.76MB 100.00% 56713727820156410577229101238628035242 
    10.57.244.229 datacenter1 rack1 Up Normal 500.91MB 100.00% 113427455640312821154458202477256070485 
  4. Run the following commands in the given order on each product node, performing a rolling restart:

    ./nodetool -h localhost -p 9999 -u user_name -pw password disablethrift 
    ./nodetool -h localhost -p 9999 -u user_name -pw password disablegossip 
    ./nodetool -h localhost -p 9999 -u user_name -pw password drain 
    ./nodetool -h localhost -p 9999 -u user_name -pw password flush 
  5. Restart the product node and run the following commands:

    ./nodetool -h localhost -p 9999 -u user_name -pw password enablethrift
    ./nodetool -h localhost -p 9999 -u user_name -pw password enablegossip
  6. Clean up any existing snapshots in the source Cassandra cluster by running the NodeTool clearsnapshot command via Parallel SSH (an example command is shown after this list). This is done to avoid inconsistencies that could occur if existing snapshots are mixed with the new snapshots.


  7. Run the following command from the <PSSH_HOME> folder to create new snapshots:

    <PSSH_HOME>/bin/pssh -h <PSSH_HOME>/bin/host.txt -A -v -l <username> -o /tmp/foo "sh <CASSANDRA_HOME>/bin/nodetool -h localhost -p 9999 -u <username> -pw <password> snapshot"
  8. Navigate to the cassandra.yaml file of one node, set incremental_backups to true, and then restart the node. Perform this on all nodes in a rolling manner.

     

    This is done to back up data that has changed since the last snapshot. Each time an SSTable is flushed, a hard link is copied into a /backups subdirectory of the data directory (provided JNA is enabled).
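
A sketch of the clearsnapshot command in step 6, following the same Parallel SSH pattern as the snapshot command in step 7 (the output directory is illustrative):

    <PSSH_HOME>/bin/pssh -h <PSSH_HOME>/bin/host.txt -A -v -l <ssh_username> -o /tmp/clearsnapshot "sh <CASSANDRA_HOME>/bin/nodetool -h localhost -p 9999 -u <username> -pw <password> clearsnapshot"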

Migrating schema

Before migrating Cassandra data to a new cluster, you need to migrate the schema using the Cassandra CLI tool.

  1. Navigate to the <CASSANDRA_HOME>/bin/ directory of one product node and run the following command:
    ./cassandra-cli -u <username> -pw <password> -h <host>
  2. Run the following command: show schema;

  3. Copy all the schema creation commands of all non-system keyspaces and save them for later use (a sample of such commands is shown after this list).

  4. Navigate to each new product cluster node, and change the partitioner property in the cassandra.yaml file as follows:

    partitioner: org.apache.cassandra.dht.RandomPartitioner

    In older Cassandra versions, the default partitioner is RandomPartitioner, whereas Cassandra 1.2 defaults to Murmur3Partitioner. Because the data being migrated was created with RandomPartitioner, the new cluster must be configured to use the same partitioner.

  5. Start the product nodes.

  6. Download Apache Cassandra 1.2.13 and extract it on one of the new cluster nodes.

  7. Navigate to the <CASSANDRA_HOME>/bin/ directory and run the following command to log in:

      ./cassandra-cli -u <username> -pw <password> -h <host>
  8. Run the commands you saved in step 3.

  9. Repeat the above two steps for all new Cassandra nodes by changing the <host> value of the command in step 7 accordingly.
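
For reference, the statements saved in step 3 are the create keyspace and create column family definitions printed by show schema. The following excerpt is illustrative; the exact keyspaces, column families, and attribute values depend on your setup:

    create keyspace EVENT_KS
      with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
      and strategy_options = {replication_factor : 3}
      and durable_writes = true;

    use EVENT_KS;

    create column family org_wso2_bam_phone_retail_store_kpi
      with column_type = 'Standard'
      and comparator = 'UTF8Type'
      and key_validation_class = 'BytesType'
      and default_validation_class = 'BytesType';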

Migrating backed-up data to new product nodes

  1. Move all snapshot data and incremental backup data to the new product nodes (for copying the files between machines, see the sketch after this list).


    You have to copy all the content in the <latest_snapshot_directory>/snapshots/ directory and the <latest_snapshot_directory>/backups/ directory of each old Cassandra node to the corresponding column family directories of the new cluster nodes. For example:

    cp -r wso2bam-2.2.0/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/snapshots/1409052028070/* wso2bam-2.4.1/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/ 


  2. Upgrade the existing Cassandra SSTables. Run the following command to rebuild SSTables that are not on the current version:

    ./pssh-2.3.1/bin/pssh -t -1 -h ./pssh-2.3.1/bin/host.txt -v -l <username> -o <output_directory> bash apache-cassandra-1.2.13/bin/nodetool -h localhost -p 9999 -u admin -pw admin upgradesstables -a
  3. Run the following command to reload the SSTables for all column families.

    This loads newly placed SSTables onto the system without restarting the nodes. Run the command once for each keyspace and column family. For example:
    ./pssh-2.3.1/bin/pssh -t -1 -h ./pssh-2.3.1/bin/host.txt -v -l <username> -o <output_directory> bash apache-cassandra-1.2.13/bin/nodetool -h localhost -p 9999 -u admin -pw admin refresh EVENT_KS org_wso2_bam_activity_monitoring
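
The cp command in step 1 assumes the old and the new installations are accessible from the same machine. When the new product nodes are on different hosts, the same copy can be done over SSH. A minimal sketch using scp, reusing the paths and snapshot tag from the example in step 1 (<new_node_host> is a placeholder):

    # Copy the latest snapshot of a column family from an old node to the
    # matching column family directory on the corresponding new node.
    scp -r wso2bam-2.2.0/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/snapshots/1409052028070/* \
        <new_node_host>:wso2bam-2.4.1/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/

    # Copy any incremental backups taken since that snapshot in the same way.
    scp -r wso2bam-2.2.0/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/backups/* \
        <new_node_host>:wso2bam-2.4.1/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/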