Cassandra Migration Guide
This guide explains the steps for migrating from Cassandra version 1.1.3 to Cassandra version 1.2.13.
Setting up Parallel SSH tool
You need to install the Parallel SSH tool at a central point to execute NodeTool commands on all product nodes. This is done so that the backup is taken on all product nodes at the same time, to avoid any data inconsistencies. Follow the steps below to set up the Parallel SSH tool.
1. Download the pssh-2.3.1.tar.gz file and install Parallel SSH on the client machine.
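As a minimal sketch, assuming the tarball was downloaded to your current directory (the commands in this guide run pssh directly from the extracted directory, so extraction is sufficient):
tar -xzf pssh-2.3.1.tar.gz   # extracts to ./pssh-2.3.1/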
2. Create a text file (for example: host.txt) in the <PSSH_HOME>/bin/ directory, and add the host names of all the product nodes to it. <PSSH_HOME> refers to the installation directory of Parallel SSH.
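For example, for the three-node setup used in the verification output below, host.txt would contain one host per line:
192.168.5.20
192.168.5.19
192.168.5.18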
3. Run the following command to check if Parallel SSH is functioning as expected:
./pssh-2.3.1/bin/pssh -h ./pssh-2.3.1/bin/host.txt -A -v -l <ssh_username> -o <output_dir> uptime
<ssh_username> refers to the username of the user account, and <output_dir> refers to the folder in which all the output text files will be saved. Enter the password of your user account on the machine when prompted.
For example, in a scenario with a setup of three product nodes, the following output will be displayed if Parallel SSH is functioning as expected:
[1] 06:08:37 [SUCCESS] 192.168.5.20
[2] 06:08:37 [SUCCESS] 192.168.5.19
[3] 06:08:37 [SUCCESS] 192.168.5.18
Backing up data from Cassandra source
1. Download Apache Cassandra 1.1.3, and copy the installation directory to the same location on all the product nodes.
2. Navigate to the <CASSANDRA_HOME>/bin/ directory of one of the product nodes, and enter the following command to log in to Cassandra via NodeTool. <CASSANDRA_HOME> refers to the extracted installation directory of Apache Cassandra 1.1.3.
./nodetool -h <host> -p 9999 -u <username> -pw <password> ring
The following output will be displayed if NodeTool is operating successfully:
10.57.244.224 datacenter1 rack1 Up Normal 498.19MB 100.00% 0
10.57.244.226 datacenter1 rack1 Up Normal 502.76MB 100.00% 56713727820156410577229101238628035242
10.57.244.229 datacenter1 rack1 Up Normal 500.91MB 100.00% 113427455640312821154458202477256070485
3. Navigate to the <PSSH_HOME> folder and enter the following command to check if the above command works with the Parallel SSH tool:
./pssh-2.3.1/bin/pssh -h ./pssh-2.3.1/bin/host.txt -A -v -l <ssh_username> -o <output_directory> bash <CASSANDRA_HOME>/bin/nodetool -h <host> -p 9999 -u <username> -pw <password> ring
The following output files will be created in the specified output directory if the Parallel SSH tool is operating successfully:
Address        DC           Rack   Status  State   Load      Effective-Ownership  Token
                                                                                   113427455640312821154458202477256070485
10.57.244.224  datacenter1  rack1  Up      Normal  498.19MB  100.00%              0
10.57.244.226  datacenter1  rack1  Up      Normal  502.76MB  100.00%              56713727820156410577229101238628035242
10.57.244.229  datacenter1  rack1  Up      Normal  500.91MB  100.00%              113427455640312821154458202477256070485
4. Run the following commands in the given order on every product node, performing a rolling restart:
./nodetool -h localhost -p 9999 -u user_name -pw password disablethrift
./nodetool -h localhost -p 9999 -u user_name -pw password disablegossip
./nodetool -h localhost -p 9999 -u user_name -pw password drain
./nodetool -h localhost -p 9999 -u user_name -pw password flush
5. Restart the product node and run the following commands:
./nodetool -h localhost -p 9999 -u user_name -pw password enablethrift
./nodetool -h localhost -p 9999 -u user_name -pw password enablegossip
6. Clean up any existing snapshots in the source Cassandra cluster by running the NodeTool clearsnapshot command via Parallel SSH. This avoids inconsistencies that could occur if already existing snapshots are mixed with the new snapshots.
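For example, a command of the following form (matching the pattern of the snapshot command in the next step) clears old snapshots on all nodes:
<PSSH_HOME>/bin/pssh -h <PSSH_HOME>/bin/host.txt -A -v -l <username> -o /tmp/foo "sh <CASSANDRA_HOME>/bin/nodetool -h localhost -p 9999 -u <username> -pw <password> clearsnapshot"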
7. Run the following command from the <PSSH_HOME> folder to create new snapshots:
<PSSH_HOME>/bin/pssh -h <PSSH_HOME>/bin/host.txt -A -v -l <username> -o /tmp/foo "sh <CASSANDRA_HOME>/bin/nodetool -h localhost -p 9999 -u <username> -pw <password> snapshot"
8. Open the cassandra.yaml file of one node, set incremental_backups to true, and restart the node. Perform this on all nodes in a rolling manner. This is done to back up data that has changed since the last snapshot. Each time an SSTable is flushed, a hard link is copied into a /backups sub-directory of the data directory (provided JNA is enabled).
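For example, the relevant entry in cassandra.yaml looks like this:
incremental_backups: true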
Migrating schema
Before migrating Cassandra data to a new cluster, you need to migrate the schema using the Cassandra CLI tool.
1. Navigate to the <CASSANDRA_HOME>/bin/ directory of one product node and run the following command:
./cassandra-cli -u <username> -pw <password> -h <host>
2. Run the following command:
show schema;
3. Copy all the schema creation commands of all non-system keyspaces and save them for later use.
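For illustration, a saved keyspace definition might look like the following; the keyspace name and options depend on your setup (EVENT_KS is the keyspace used in the examples later in this guide):
create keyspace EVENT_KS
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = {replication_factor : 1}
  and durable_writes = true;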
4. Navigate to each new product cluster node, and change the partitioner property in the cassandra.yaml file as follows:
partitioner: org.apache.cassandra.dht.RandomPartitioner
In older Cassandra versions, the default partitioner is RandomPartitioner. This property is changed because data created with a given partitioner must be used with the same partitioner in the new cluster as well.
5. Start the product nodes.
6. Download Apache Cassandra 1.2.13 and extract the installation directory to one new cluster node.
7. Navigate to the <CASSANDRA_HOME>/bin/ directory and run the following command to log in:
./cassandra-cli -u <username> -pw <password> -h <host>
8. Run the commands you saved in step 3.
9. Repeat the above two steps for all new Cassandra nodes, changing the <host> value of the command in step 7 appropriately.
Migrating backed-up data to new product nodes
Move all snapshot data and incremental backup data into the new product nodes.
1. Copy all the content in the snapshots/<latest_snapshot> directory and the backups/ directory of each column family of each old Cassandra node to the corresponding column family directory of the new cluster nodes. For example:
cp -r wso2bam-2.2.0/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/snapshots/1409052028070/* wso2bam-2.4.1/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/
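The incremental backup files are copied the same way, from the backups/ sub-directory of the column family. A sketch using the same paths as the example above:
cp -r wso2bam-2.2.0/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/backups/* wso2bam-2.4.1/repository/database/cassandra/data/EVENT_KS/org_wso2_bam_phone_retail_store_kpi/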
2. Upgrade the existing Cassandra SSTables. Run the following command to rebuild SSTables that are not on the current version:
./pssh-2.3.1/bin/pssh -t -1 -h ./pssh-2.3.1/bin/host.txt -v -l <username> -o <output_directory> bash apache-cassandra-1.2.13/bin/nodetool -h localhost -p 9999 -u admin -pw admin upgradesstables -a
3. Run the following command to reload the SSTables for all column families. This loads newly placed SSTables onto the system without restarting the nodes. For example:
./pssh-2.3.1/bin/pssh -t -1 -h ./pssh-2.3.1/bin/host.txt -v -l <username> -o <output_directory> bash apache-cassandra-1.2.13/bin/nodetool -h localhost -p 9999 -u admin -pw admin refresh EVENT_KS org_wso2_bam_activity_monitoring