Archiving Cassandra Data

WSO2 BAM uses MapReduce jobs to archive Cassandra data, so you can archive large amounts of data using a cluster of Hadoop nodes. You can run the archive process manually or schedule it using a cron expression, as explained below.

Log in to the BAM management console and select Archive Data under the Configure menu.

Archiving data manually

The Cassandra data archive configuration opens. Select the Date Range option to manually archive data within a specific date range.
[Screenshot: manual data archival]
The configuration parameters are explained below.

Parameter	Description
Stream Name	In the BAM data model, a stream name maps to a Cassandra column family. Provide the name of the stream whose data you want to archive.
Version	Version of the stream. Specifies which version to archive when there are multiple versions under the same stream name (as recommended).
Date range	Specifies the start and end dates. For example, from 25/01/2013 00:00:00 AM to 03/02/2013 00:00:00 AM.
Username/Password	Cassandra username and password (same as the BAM credentials).
External Cassandra cluster	Connection URL: the connection details of the Cassandra cluster. For example, 10.100.60.150:9160,10.100.60.151:9160
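
The connection URL in the table above is a comma-separated list of host:port pairs. As a minimal sketch of how such a value could be checked before it is typed into the form, the hypothetical helper below (parse_connection_url is not part of BAM) splits the string into (host, port) tuples and rejects malformed entries.

```python
def parse_connection_url(url):
    """Split 'host:port,host:port' into a list of (host, port) tuples.

    Hypothetical helper for illustration only; it is not a BAM API.
    """
    nodes = []
    for entry in url.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma or stray whitespace
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


nodes = parse_connection_url("10.100.60.150:9160,10.100.60.151:9160")
```

Here nodes holds one tuple per Cassandra node, which makes a missing port or a stray separator easy to spot.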

Scheduling the archive

Select the Below this number of days option to schedule an archival process. For example:
[Screenshot: scheduling the archive]
The configuration parameters are explained below.

Parameter	Description
Stream Name	In the BAM data model, a stream name maps to a Cassandra column family. Provide the name of the stream whose data you want to archive.
Version	Version of the stream. Specifies which version to archive when there are multiple versions under the same stream name (as recommended).
No of days	Keeps only the last n days of data in the column family. For example, with the above configuration, the system retains data from the last 90 days and archives the older data.
Cron expression	The cron expression that schedules the archive process. For example, with the above configuration, the archive job runs every day at midnight.
External Cassandra cluster	Connection URL: the connection details of the Cassandra cluster. For example, 10.100.60.150:9160,10.100.60.151:9160
Username/Password	Cassandra username and password (same as the BAM credentials).
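
To make the No of days and Cron expression parameters concrete, the sketch below computes the cutoff before which data would be archived and breaks a daily-at-midnight cron expression into its fields. Two assumptions are made here that the text above does not confirm: that the cutoff is measured back from the time the job runs, and that the expression uses Quartz-style syntax (seconds, minutes, hours, day of month, month, day of week). The names archive_cutoff, describe_cron, and MIDNIGHT_DAILY are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed Quartz-style expression for "every day at midnight";
# check the cron syntax your BAM version expects before using it.
MIDNIGHT_DAILY = "0 0 0 * * ?"
QUARTZ_FIELDS = ("seconds", "minutes", "hours",
                 "day_of_month", "month", "day_of_week")


def archive_cutoff(run_time, days_to_keep):
    """Return the instant before which data would be archived.

    Assumes the retention window is counted back from run_time.
    """
    if days_to_keep < 0:
        raise ValueError("days_to_keep must not be negative")
    return run_time - timedelta(days=days_to_keep)


def describe_cron(expression):
    """Pair each field of a Quartz-style expression with its name."""
    return dict(zip(QUARTZ_FIELDS, expression.split()))


cutoff = archive_cutoff(datetime(2013, 4, 1), 90)
fields = describe_cron(MIDNIGHT_DAILY)
```

With a 90-day window and a run on 1 April 2013, cutoff falls on 1 January 2013, so anything older than that would be moved to the archive under these assumptions.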

The name of the archive column family is <original column family name> + _arch.
Cassandra column families for streams are generated with underscores (_) in place of dots. When archiving, enter the stream name with dots (.) restored. For example, if the column family is org_wso2_bam_phone_retail_store_kpi, enter it as org.wso2.bam.phone.retail_store.kpi. Note that the underscore in retail_store belongs to the original stream name and is kept.
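
The two naming rules above can be sketched as follows. The function names are hypothetical, and the second function is deliberately naive: it shows that blindly replacing every underscore does not reproduce the stream name from the example, so the original stream name has to be known rather than derived.

```python
ARCHIVE_SUFFIX = "_arch"


def archive_column_family(column_family):
    """Name of the archive column family: original name plus _arch."""
    return column_family + ARCHIVE_SUFFIX


def naive_stream_name(column_family):
    """Replace every underscore with a dot (too aggressive, see below)."""
    return column_family.replace("_", ".")


column_family = "org_wso2_bam_phone_retail_store_kpi"
expected_stream = "org.wso2.bam.phone.retail_store.kpi"  # from the example above

archive_name = archive_column_family(column_family)
naive_name = naive_stream_name(column_family)

# The naive result also converts the underscore in retail_store,
# so it does not match the stream name given in the example.
matches = naive_name == expected_stream
```

Because matches is False, the safer approach is to copy the stream name from the stream definition instead of converting the column family name mechanically.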

Click Submit once you are done.
Once you submit a scheduled archive, the system creates a Hive script and executes it.
Click Main, and then click List under the Analytics menu.

Note that step 6 does not apply to the manual archiving process, which only executes the Hive query and does not save it.
Select your script, and click the Schedule Script link associated with it to change the schedule time of your script.

Business Activity Monitor 2.4.1
