
Quick Start Guide

You can use WSO2 BAM to analyze and display published API invocation calls sent to it from external API clients. This guide walks you through the following sections to simulate this use case. In this guide, a cURL command simulates sending API invocation calls as events to BAM 2.5.0 through an HTTP input event adaptor. It also describes how the published events are analyzed using Hive queries and summarized either on a time basis (per minute, per hour, and per day) or through a one-to-one mapping that is synchronized with publishing. Statistics of the summarized data are displayed using a WSO2 BAM dashboard.

Starting WSO2 BAM 

Follow the steps below to start WSO2 BAM server.

  1. Download WSO2 Business Activity Monitor.
  2. Install Oracle Java SE Development Kit (JDK) version 1.6.24 or later, or 1.7.*.
  3. Set the JAVA_HOME environment variable.
  4. Using the command line, navigate to <BAM_HOME>/bin, and execute the following command: wso2server.bat (for Windows) or wso2server.sh (for Linux).
  5. Wait until you see the message "WSO2 Carbon started in 'n' seconds." 
    This indicates that the server started successfully. To stop WSO2 BAM, simply hit Ctrl-C in the command window.
  6. Copy the URL of the BAM management console from the terminal output, which will be displayed as follows, and access it using your web browser.

    Mgt Console URL: https://10.100.5.72:9444/carbon/

Creating an HTTP input event adaptor

Follow the steps below to create an HTTP input event adaptor to send events to BAM.

  1. Log in to BAM Management Console using admin/admin as username and password.
  2. Click Configure, and then click Input Event Adaptors.
  3. Click Add Input Event Adaptor to create a new adaptor.
  4. Enter HttpEventAdaptor for Event Adaptor Name, and select http for Event Adaptor Type.

  5. Click Add Event Adaptor.

Adding the event stream

Follow the steps below to add an event stream in WSO2 BAM.

  1. Click Main, and then click Event Streams.
  2. Click Add Event Stream.
  3. Enter HttpEventStream for Event Stream Name and 1.0.0 for Event Stream Version. Also add a meaningful description and a nickname.

Adding stream attributes

Follow the steps below to add attributes to the event stream you are creating.

  1. Under Meta Data Attributes, enter server for Attribute Name, and select string for Attribute Type. Then click Add to add the attribute.
  2. Under Payload Data Attributes, enter request-path for Attribute Name, and select string for Attribute Type. Then click Add.
  3. Under Payload Data Attributes, also enter time for Attribute Name, and select string for Attribute Type. Then click Add.
  4. Click Add Event Stream, and click OK in the pop-up message.
  5. Click Create Later in the next pop-up message, in which Default WSO2Event Builder is selected by default.

Adding the event builder

Follow the steps below to add the event builder.

  1. Click Main, and then click Event Streams.
  2. In the Available Event Streams list, click In-Flows of the newly added HttpEventStream stream.
  3. Click Receive from External Event Stream (via Event Builder).

  4. Enter HttpEventBuilder for Event Builder Name, select HttpEventAdaptor for Input Event Adaptor Name, and enter requestInfo for Topic.

  5. Click Add Event Builder.

Publishing data to WSO2 BAM

Now the WSO2 BAM server is ready to accept HTTP events. It expects a payload as shown below when events are published to it. 

When running this sample, use the same date and time format shown in the payload below (for example, 2014-09-18 12:24) as the value of the time parameter, because the Hive queries later in this guide extract the year, month, day, hour, and minute using fixed character offsets.

<events>
    <event>
        <metaData>
            <server>localhost</server>
        </metaData>
        <payloadData>
            <request-path>/api/userinfo/1</request-path>
            <time>2014-09-18 12:24</time>
        </payloadData>
    </event>
</events>

Create a file named event.xml with the above payload, and use a sample cURL command as follows to send it to WSO2 BAM. If you have set a port offset, change the port of the server URL in the below command accordingly.

curl -v  http://localhost:9763/endpoints/HttpEventAdaptor/requestInfo -d @event.xml

You can also use a script that repeatedly invokes the above cURL command, to automate and simplify the task of publishing more events to WSO2 BAM using the above payload configuration. If you have set a port offset, change the port of the server URL in the cURL command in the script accordingly.

Summarizing the published data

This section explains how to summarize the published data and persist it to an RDBMS. 

  1. Click List under Analytics in the Main menu.
  2. Click Add Script to create the Hive query.
  3. To create a virtual Hive table corresponding to the actual data residing in the Cassandra server, execute the below Hive script.

    CREATE EXTERNAL TABLE IF NOT EXISTS mappingRequestInfoTable (key STRING, request_path STRING, time STRING, server STRING, payload_timestamp BIGINT) STORED BY
    'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES (
    "wso2.carbon.datasource.name" = "WSO2BAM_CASSANDRA_DATASOURCE",
    "cassandra.cf.name" = "HttpEventStream",
    "cassandra.columns.mapping" = ":key, payload_request-path, payload_time, meta_server, Timestamp");

    The mappingRequestInfoTable, which is created by executing the above Hive script, is the Hive representation of the Cassandra column family of the HttpEventStream stream. Apart from the data you send, the WSO2 BAM server adds additional columns such as the timestamp and the row key. The WSO2BAM_CASSANDRA_DATASOURCE datasource, which is defined in the bam-datasources.xml configuration file, is used to connect to the Cassandra server.

  4. Click Execute to check whether the query executes successfully.

  5. Click Save, and click No in the pop-up message.
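
To confirm that the mapping works before writing the summarization scripts, you can optionally preview a few mapped rows. The below query is an illustrative addition, not part of the original guide.

    -- Preview a few rows mapped from the Cassandra column family (illustrative check).
    SELECT key, server, request_path, time FROM mappingRequestInfoTable LIMIT 10;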

Using a time-based summarization

Data will be summarized and saved in three separate RDBMS tables: per minute, per hour, and per day. Follow the steps below to summarize the published events by executing Hive scripts.

  1. Execute the below query, to create a virtual Hive table and a corresponding RDBMS table (REQUEST_INFO_SUMMARY_PER_MINUTE) that stores the minute-wise summary.

    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMinute (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day,hour,minute',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_MINUTE (server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)');
  2. Execute the below query, to get data from the Cassandra column family, group it (by server, request_path, year, month, day, hour, and minute), and insert the aggregated values into the REQUEST_INFO_SUMMARY_PER_MINUTE table.

    -- The substr offsets below assume the fixed time format yyyy-MM-dd HH:mm
    -- (e.g., 2014-09-18 12:24): characters 1-4 = year, 6-7 = month,
    -- 9-10 = day, 12-13 = hour, 15-16 = minute.
    insert overwrite table requestInfoStatsPerMinute 
    select server, request_path, substr(time,1,4) as year, substr(time,6,2) as month, substr(time,9,2) as day, substr(time,12,2) as hour, substr(time,15,2) as minute, count(*) as total_hit
    from mappingRequestInfoTable 
    group by server, request_path, substr(time,1,4), substr(time,6,2), substr(time,9,2), substr(time,12,2), substr(time,15,2);
  3. Execute the below query, to create a new virtual Hive table that fetches data from the already summarized REQUEST_INFO_SUMMARY_PER_MINUTE table.

    When summarizing data on an hourly basis, you do not need to fetch data from the Cassandra column family again, because the minute-wise summary already contains the required data.

    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMinuteDataFetcher (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_MINUTE');
  4. Execute the below query, to create the hourly summarized data table in the RDBMS.

    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerHour (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day,hour',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_HOUR ( server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, total_hit INT)' );
  5. Execute the below query, to get data from the minute-wise summary table, aggregate it, and insert it into the hourly table.

    insert overwrite table requestInfoStatsPerHour 
    select server, request_path, year, month, day, hour, sum(total_hit) as total_hit
    from requestInfoStatsPerMinuteDataFetcher 
    group by server,request_path,year,month,day,hour;

  6. Similarly, execute the below queries to create a virtual Hive table that reads from the REQUEST_INFO_SUMMARY_PER_HOUR table, to create the daily summary table, and to insert the aggregated daily values into it.

    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerHourDataFetcher (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT,  total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_HOUR');
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerDay (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_DAY ( server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT, day SMALLINT, total_hit INT)' );
    
    insert overwrite table requestInfoStatsPerDay 
    select server, request_path, year, month, day, sum(total_hit) as total_hit
    from requestInfoStatsPerHourDataFetcher 
    group by server,request_path,year,month,day;
  7. You can similarly summarize data based on months, by adding another virtual Hive table that reads from the daily summary table and a corresponding monthly summary table, as in the sketch below.
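
For example, the following is a minimal sketch of such a monthly summarization, following the same pattern as the hourly and daily steps above. The names requestInfoStatsPerDayDataFetcher, requestInfoStatsPerMonth, and REQUEST_INFO_SUMMARY_PER_MONTH are illustrative and do not appear in the original guide.

    -- Virtual Hive table reading the daily summary back from the RDBMS (illustrative).
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerDayDataFetcher (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_DAY');
    
    -- Monthly summary table in the RDBMS, keyed by server, request_path, year, and month.
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMonth (server STRING, request_path STRING, year SMALLINT, month SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_MONTH ( server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT, total_hit INT)' );
    
    -- Aggregate the daily summary into monthly totals.
    insert overwrite table requestInfoStatsPerMonth 
    select server, request_path, year, month, sum(total_hit) as total_hit
    from requestInfoStatsPerDayDataFetcher 
    group by server,request_path,year,month;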

Using a direct mapping 

Optionally, you can summarize data based on a one-to-one mapping that is synchronized with publishing. Follow the steps below to persist data based on the time at which they were sent to the WSO2 BAM server. 

  1. To create a new RDBMS table named REQUEST_INFO_TABLE, and a corresponding virtual Hive table named targetRequestInfoTable to represent it, execute the below Hive script.

    The WSO2BAM_DATASOURCE datasource is used to connect to the RDBMS.

    CREATE EXTERNAL TABLE IF NOT EXISTS targetRequestInfoTable (key STRING, request_path STRING, time STRING, server STRING)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'key,server,request_path,time',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_TABLE ( key VARCHAR(100) NOT NULL, request_path VARCHAR(150), time VARCHAR(30), server VARCHAR(50))'); 
  2. Execute the below query, to fetch data from the Cassandra column family and insert it into the RDBMS table.

    insert overwrite table targetRequestInfoTable select key, request_path, time, server from mappingRequestInfoTable;
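
If you want to verify the persisted rows, you can run a simple SQL query directly against the database behind WSO2BAM_DATASOURCE (by default an H2 database). This check is illustrative and not part of the original guide.

    -- Run against the RDBMS behind WSO2BAM_DATASOURCE, not in the Hive script editor.
    SELECT key, server, request_path, time FROM REQUEST_INFO_TABLE;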

Creating a dashboard

Follow the steps below to create a dashboard to display the summarized data using the Gadget Generation Tool. 

For more information on the Gadget Generation Tool, see Gadget Generation Tool.

  1. Log in to BAM management console, and click Tools.
  2. Click Gadget Gen Tool in the Cassandra Explorer menu.
  3. Enter the details of the JDBC datasource, and click Next. 

    Get the values for these fields from the elements of the WSO2BAM_DATASOURCE in the <BAM_HOME>/repository/conf/datasources/bam-datasources.xml file.

  4. Enter the SQL query, which will fetch the required data for your gadget, and click Next. A sample query is shown after these steps.
  5. Pick the UI element for visualizing your data, and click Next.
  6. Enter the details of the gadget, and click Generate!.
  7. Copy the URL of the created gadget, which is displayed on the screen.
  8. Click Go to Dashboard, and click Add Gadget.
  9. Click Enter Gadget Location, and enter the URL you copied.
  10. Click Add Gadget. The gadget will now be displayed in the dashboard.
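
For example, the following is a hypothetical query for a gadget that plots the minute-wise hit counts created earlier in this guide; adjust the table and column names to match the summary table you actually created.

    -- Sample gadget query against the minute-wise summary table (illustrative).
    SELECT request_path, total_hit
    FROM REQUEST_INFO_SUMMARY_PER_MINUTE
    ORDER BY year, month, day, hour, minute;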

Configuring data purging 

When your system runs for a long time, data gradually accumulates in Cassandra. Therefore, you need to configure data purging tasks to control the growth of the Cassandra server. You can schedule a purging task either for data in a particular date range or for data older than a particular time period. For more information on configuring data purging tasks, see Purging Data.

In this guide, purging data will not affect your final results. However, if you are using monthly summarization, you should be cautious when selecting dates, because a scheduled purging task can only be configured to keep data collected within the last n days (for example, the last 90 days). Therefore, the aggregated results for monthly stats can become inaccurate, because some of the data of the month being considered may have been deleted.

However, you can use the BAM Management Console to purge data manually by specifying proper date ranges. In that case, the start date has to be the first day of a month, and the end date has to be the last day of a month (and cannot be in the current month). For example: 01/02/2014 to 30/04/2014.

Follow the steps below to schedule a data deletion task.

  1. Log in to BAM management console, and click Tools.
  2. Click Add in the Archive/Purge Data menu.
  3. Enter HttpEventStream for Stream Name and 1.0.0 for Version.
  4. Click Date Range to manually archive data between a specific date range.
  5. In the From field, select the start date of the date range to be specified. For example, if you are running the sample today, select yesterday's date as the start date.
  6. In the To field, select the end date of the date range to be specified. For example, if you are running the sample today, select tomorrow's date as the end date.
  7. Select Deletion, and click Submit.