This functionality can be used to archive your Cassandra data. The archiving process uses MapReduce jobs, so it can archive large amounts of data using a cluster of Hadoop nodes. You can run the archive process manually or schedule it with a cron expression. Follow the instructions below to use it.

 

  • Log in to BAM's management console and select "Home -> Configure -> Archive Data".
  • Manually archive data within a specified date range.

Configuration parameters 

    • Stream name - In the BAM data model, a stream name maps to a Cassandra column family. Provide the name of the stream whose stored data you want to archive.
    • Version - Version of the stream. If you have several versions under the same stream name (which we don't encourage), use this to archive only the data of a specific version.
    • Username - Cassandra username (same as the BAM username)
    • Password - Cassandra password (same as the BAM password)
    • Date range
      • From - Start date, e.g. 25/01/2013 00:00:00 AM
      • To - End date, e.g. 03/02/2013 00:00:00 AM
    • External Cassandra cluster
      • Connection URL - Connection details of the Cassandra cluster, e.g. 10.100.60.150:9160,10.100.60.151:9160
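The following is a minimal Python sketch of how the manual-archive inputs fit together: parsing the dates in the day/month/year form used in the examples, splitting the comma-separated connection URL into hosts, and testing whether a timestamp falls in the range. The class `ManualArchiveConfig` and the functions `parse_form_date` and `in_date_range` are hypothetical and not part of BAM. Two further points are assumptions: that the hour field may be 00 to 23 with an optional AM/PM suffix, and that the From date is inclusive while the To date is exclusive.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ManualArchiveConfig:
    """Hypothetical container for the manual-archive form fields."""
    stream_name: str
    version: str
    username: str
    password: str = field(repr=False)  # keep the password out of reprs/logs
    date_from: datetime
    date_to: datetime
    connection_url: Optional[str] = None  # e.g. "host1:9160,host2:9160"

    def hosts(self) -> List[str]:
        """Split the comma-separated connection URL into host:port entries."""
        if not self.connection_url:
            return []
        return [h.strip() for h in self.connection_url.split(",") if h.strip()]


def parse_form_date(text: str) -> datetime:
    """Parse a date such as '25/01/2013 00:00:00 AM' (day/month/year).

    Assumption: the hour may be written 00-23; a trailing AM/PM suffix is
    applied 12-hour style only where it changes the result.
    """
    value = text.strip()
    suffix = value[-2:].upper()
    if suffix in ("AM", "PM"):
        value = value[:-2].strip()
    parsed = datetime.strptime(value, "%d/%m/%Y %H:%M:%S")
    if suffix == "PM" and parsed.hour < 12:
        parsed += timedelta(hours=12)
    elif suffix == "AM" and parsed.hour == 12:
        parsed -= timedelta(hours=12)
    return parsed


def in_date_range(ts: datetime, cfg: ManualArchiveConfig) -> bool:
    """Assumed semantics: From is inclusive, To is exclusive."""
    return cfg.date_from <= ts < cfg.date_to
```

With the example values above, the range covers 25 January 2013 up to (but, under the stated assumption, not including) 3 February 2013, and the connection URL yields two hosts.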

  • Schedule archive process 

 

Additional configuration parameters

    • No of days - Keeps only the last n days of data in the column family; older data is archived. For example, in the example configuration only the last 90 days of data are kept, and data older than 90 days is archived.

    • Cron expression - The archive process is scheduled using this cron expression. For example, in the example configuration the archive job runs every day at midnight.
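A minimal Python sketch of the retention rule behind "No of days", paired with a cron expression that fires daily at midnight. It assumes the scheduler accepts Quartz-style cron syntax (seconds, minutes, hours, day-of-month, month, day-of-week), and it treats "n days" as n × 24 hours back from the run time; BAM may instead round to day boundaries. The names `DAILY_AT_MIDNIGHT`, `archive_cutoff` and `should_archive` are hypothetical.

```python
from datetime import datetime, timedelta

# Quartz-style fields: seconds minutes hours day-of-month month day-of-week.
# Assumption: the scheduler accepts Quartz syntax; this fires daily at midnight.
DAILY_AT_MIDNIGHT = "0 0 0 * * ?"


def archive_cutoff(now: datetime, days_to_keep: int) -> datetime:
    """Instant before which data falls outside the retention window.

    Assumption: 'n days' means n x 24 hours back from the run time.
    """
    if days_to_keep < 0:
        raise ValueError("days_to_keep must be non-negative")
    return now - timedelta(days=days_to_keep)


def should_archive(record_time: datetime, now: datetime, days_to_keep: int) -> bool:
    """True if the record is older than the last `days_to_keep` days."""
    return record_time < archive_cutoff(now, days_to_keep)


if __name__ == "__main__":
    run_time = datetime(2013, 5, 1)  # a midnight run, as DAILY_AT_MIDNIGHT would trigger
    print(archive_cutoff(run_time, 90))  # 2013-01-31 00:00:00
```

With a 90-day window, a run at midnight on 1 May 2013 would archive everything timestamped before 31 January 2013 and keep the rest.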


Archive Column Family details 

The name of the archive column family is the original column family name followed by _arch.
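The naming rule, and the effect of archiving on the two column families, can be modeled in memory with a short Python sketch. This models the outcome only and is not how BAM implements it (BAM runs a Hive script). The row shape `(row key, timestamp)`, the dictionary standing in for Cassandra, and the function `archive_rows` are hypothetical. The sketch also assumes archived rows are removed from the original column family, as the "keeps only the last n days" description implies.

```python
from datetime import datetime
from typing import Dict, List, Tuple

ARCHIVE_SUFFIX = "_arch"

# Hypothetical row shape for illustration: (row key, event timestamp).
Row = Tuple[str, datetime]


def archive_column_family(column_family: str) -> str:
    """Archive column family name: original name + '_arch'."""
    return column_family + ARCHIVE_SUFFIX


def archive_rows(store: Dict[str, List[Row]], column_family: str, cutoff: datetime) -> int:
    """Move rows older than `cutoff` into the archive column family.

    In-memory model only; returns the number of rows moved.
    """
    rows = store.get(column_family, [])
    kept = [row for row in rows if row[1] >= cutoff]
    moved = [row for row in rows if row[1] < cutoff]
    store[column_family] = kept
    store.setdefault(archive_column_family(column_family), []).extend(moved)
    return len(moved)
```

For a column family named `kpi`, older rows end up in `kpi_arch`, and repeated runs keep appending to that same archive column family.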

...

 

 

Change schedule time later

...

 

When you submit the configuration to run the archive process, BAM creates a Hive script and executes it. You can see the Hive script under "Analytics -> List". (For a manual archive, the Hive query is only executed; the script is not saved.)


You can now change the schedule time using the "schedule script" option.