You can use WSO2 BAM to analyze and display API invocation calls published to it from external API clients. This guide walks you through the following sections to simulate this use case. A curl command simulates sending API invocation calls as events to BAM 2.5.0 through an HTTP input event adaptor. The guide then describes how the published events are analyzed using Hive queries, either by summarizing them by time on a per-minute, per-hour, and per-day basis, or by persisting them through a direct one-to-one mapping that is synchronized with publishing. Statistics of the summarized data are displayed on a WSO2 BAM dashboard.

...

Follow the steps below to create an HTTP input event adaptor to send events to BAM.

  1. Log in to the BAM Management Console using admin/admin as the username and password.
  2. Click Configure, and then click Input Event Adaptors.
  3. Click Add Input Event Adaptor to create a new adaptor.
  4. Enter HttpEventAdaptor for the Event Adaptor Name, and select http for the Event Adaptor Type.

  5. Click Add Event Adaptor.

...

  1. Click Main, and then click Event Streams.
  2. Click Add Event Stream.
  3. Enter HttpEventStream for the Event Stream Name and 1.0.0 for the Event Stream Version. Also add a meaningful description and a nickname as follows.

...

  1. Under Meta Data Attributes, enter the Attribute Name as server, and select the Attribute Type as string. Then click Add to add the attribute.
  2. Under Payload Data Attributes, enter the Attribute Name as request-path, and select the Attribute Type as string. Then click Add.
  3. Under Payload Data Attributes, also enter the Attribute Name as time, and select the Attribute Type as string. Then click Add.
  4. Click Add Event Stream, and click OK in the pop-up message.

Adding the event builder

Follow the steps below to add the event builder.

  1. Click Main, and then click Event Streams.
  2. In the Available Event Streams list, click In-Flows of the newly added HttpEventStream stream.
  3. Click Receive from External Event Stream(via Event Builder).

  4. Enter HttpEventBuilder for the Event Builder Name, select HttpEventAdaptor for the Input Event Adaptor Name, and enter requestInfo as the Topic.

  5. Click Add Event Builder.

Publishing data to WSO2 BAM

Now the WSO2 BAM server is ready to accept HTTP events. It expects a payload as shown below when events are published to it.

Info

When running this sample, use the date and time format shown in the payload configuration below for the value of the time parameter.

Code Block
languagexml
<events>
    <event>
        <metaData>
            <server>localhost</server>
        </metaData>
        <payloadData>
            <request-path>/api/userinfo/1</request-path>
            <time>2014-09-18 12:24</time>
        </payloadData>
    </event>
</events>

Create a file named event.xml with the above payload, and use a curl command such as the following to send it to WSO2 BAM. If you have set a port offset, change the port of the server URL in the command below accordingly.

Code Block
languagebash
curl -v http://localhost:9763/endpoints/HttpEventAdaptor/requestInfo -d @event.xml

Note

You can also use this script to automate and simplify publishing more events to WSO2 BAM using the above payload configuration. If you have set a port offset, change the port of the server URL in the curl command in the script accordingly.

Summarizing the published data

This section explains how to summarize the published data and persist it to an RDBMS.

  1. Click List under Analytics in the Main menu.
  2. Click Add Script to create the Hive query.
  3. To create a virtual Hive table corresponding to the actual data residing in the Cassandra server, execute the Hive script below.

    Code Block
    languagesql
    CREATE EXTERNAL TABLE IF NOT EXISTS mappingRequestInfoTable (key STRING, request_path STRING, time STRING, server STRING, payload_timestamp BIGINT) STORED BY
    'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES (
    "wso2.carbon.datasource.name" = "WSO2BAM_CASSANDRA_DATASOURCE",
    "cassandra.cf.name" = "HttpEventStream",
    "cassandra.columns.mapping" = ":key, payload_request-path, payload_time, meta_server, Timestamp");
    Info

    The mappingRequestInfoTable, which is created by executing the above Hive script, is the Hive representation of the Cassandra column family values of the HttpEventStream. In addition to the data you send, the WSO2 BAM server adds extra columns such as the Timestamp and the row key. The WSO2BAM_CASSANDRA_DATASOURCE datasource, which is defined in the bam-datasources.xml configuration file, is used to connect to the Cassandra server.

  4. Click Execute to check whether the query executes successfully (an optional sanity check is sketched after these steps).

  5. Click Save, and click No in the pop-up message.
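
Before moving on, you can optionally sanity-check the Hive-to-Cassandra mapping with a simple query. This is only a sketch; the LIMIT value is arbitrary, and it assumes you have already published at least one event as described above.

Code Block
languagesql
-- Preview a few rows mapped from the Cassandra column family
SELECT key, server, request_path, time FROM mappingRequestInfoTable LIMIT 10;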

Using a time-based summarization

The data will be summarized and saved separately in three RDBMS tables, on a per-minute, per-hour, and per-day basis. Follow the steps below to summarize the published events by executing Hive scripts.

  1. Execute the query below to create a new RDBMS table in which the minute-wise summarized data will be stored.

    Code Block
    languagesql
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMinute (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day,hour,minute',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_MINUTE (server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)');
  2. Execute the query below to fetch data from the Cassandra column family, group it by server, request_path, year, month, day, hour, and minute, and insert the results into the REQUEST_INFO_SUMMARY_PER_MINUTE table.

    Code Block
    languagesql
    -- time has the format 'yyyy-MM-dd HH:mm', so substr() extracts each date/time field
    insert overwrite table requestInfoStatsPerMinute
    select server, request_path, substr(time,1,4) as year, substr(time,6,2) as month, substr(time,9,2) as day, substr(time,12,2) as hour, substr(time,15,2) as minute, count(*) as total_hit
    from mappingRequestInfoTable
    group by server,request_path,substr(time,1,4), substr(time,6,2),substr(time,9,2),substr(time,12,2),substr(time,15,2);
  3. Execute the query below to create a new virtual Hive table that reads the already summarized data from the REQUEST_INFO_SUMMARY_PER_MINUTE table.

    Info

    When summarizing data on an hourly basis, you do not need to fetch the data from the Cassandra column family again, because you already have the minute-wise summarized data.

    Code Block
    languagesql
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMinuteDataFetcher (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_MINUTE');
  4. Execute the query below to create the hourly summarized data table in the RDBMS.

    Code Block
    languagesql
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerHour (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES ( 
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day,hour',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_HOUR ( server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, total_hit INT)' );
  5. Execute the query below to fetch data from the minute-wise summary table and insert the processed results into the hourly table. The same script also creates the daily summary table and rolls the hourly data up into it, as the inline comments indicate.

    Code Block
    languagesql
    -- Roll the minute-wise summary up to hourly totals
    insert overwrite table requestInfoStatsPerHour
    select server, request_path, year, month, day, hour, sum(total_hit) as total_hit
    from requestInfoStatsPerMinuteDataFetcher
    group by server,request_path,year,month,day,hour;
    
    -- Read the hourly summary back from the RDBMS
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerHourDataFetcher (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, hour SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_HOUR');
    
    -- Daily summary table in the RDBMS
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerDay (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_DAY (server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT, day SMALLINT, total_hit INT)');
    
    -- Roll the hourly summary up to daily totals
    insert overwrite table requestInfoStatsPerDay
    select server, request_path, year, month, day, sum(total_hit) as total_hit
    from requestInfoStatsPerHourDataFetcher
    group by server,request_path,year,month,day;
  6. You can add another two virtual Hive tables (following the pattern of the requestInfoStatsPerHourDataFetcher and requestInfoStatsPerDay tables above) to summarize the data by month; a minimal sketch is shown below.
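
    This sketch is illustrative rather than part of the sample: the requestInfoStatsPerDayDataFetcher and requestInfoStatsPerMonth Hive tables and the REQUEST_INFO_SUMMARY_PER_MONTH RDBMS table are assumed names, so adjust them to your own conventions.

    Code Block
    languagesql
    -- Read the daily summary back from the RDBMS
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerDayDataFetcher (server STRING, request_path STRING, year SMALLINT, month SMALLINT, day SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_DAY');
    
    -- Monthly summary table in the RDBMS
    CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMonth (server STRING, request_path STRING, year SMALLINT, month SMALLINT, total_hit INT)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_MONTH (server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT, total_hit INT)');
    
    -- Roll the daily summary up to monthly totals
    insert overwrite table requestInfoStatsPerMonth
    select server, request_path, year, month, sum(total_hit) as total_hit
    from requestInfoStatsPerDayDataFetcher
    group by server,request_path,year,month;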

Using a direct mapping 

Optionally, you can summarize data based on a one-to-one mapping that is synchronized with publishing. Follow the steps below to persist data based on the time at which it was sent to the WSO2 BAM server.

  1. To create a new RDBMS table named REQUEST_INFO_TABLE, and a corresponding virtual Hive table named targetRequestInfoTable to represent it, execute the Hive script below.

    Info

    The WSO2BAM_DATASOURCE datasource is used to connect to the RDBMS.

    Code Block
    languagesql
    CREATE EXTERNAL TABLE IF NOT EXISTS targetRequestInfoTable (key STRING, request_path STRING, time STRING, server STRING)
    STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'key,server,request_path,time',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_TABLE ( key VARCHAR(100) NOT NULL, request_path VARCHAR(150), time VARCHAR(30), server VARCHAR(50))'); 
  2. Execute the query below to fetch data from the Cassandra column family, and insert it into the RDBMS table (an optional verification is sketched after these steps).

    Code Block
    languagesql
    insert overwrite table targetRequestInfoTable select key, request_path, time, server from mappingRequestInfoTable;
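
Optionally, you can verify the one-to-one mapping with a quick row-count comparison. This is only a sketch; the two counts should match if every published event was copied across.

Code Block
languagesql
-- Row counts of the Cassandra-backed source and the RDBMS-backed target
SELECT count(*) FROM mappingRequestInfoTable;
SELECT count(*) FROM targetRequestInfoTable;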

Creating a dashboard

Follow the steps below to create a dashboard to display the summarized data using the Gadget Generation Tool. 

...

  1. Log in to the BAM Management Console, and click Tools.
  2. Click Gadget Gen Tool in the Cassandra Explorer menu.
  3. Enter the details of the JDBC datasource, and click Next.

    Info

    Get the values for the fields shown below from the WSO2BAM_DATASOURCE element in the <BAM_HOME>/repository/conf/datasources/bam-datasources.xml file.

    JDBC Datasource Details

  4. Enter the SQL query that fetches the required data for your gadget, and click Next (a sample query is sketched after these steps).
    The SQL Query
  5. Pick the UI element for visualizing your data, and click Next.
    pick UI element
  6. Enter the details of the gadget, and click Generate!.
    gadget details
  7. Copy the URL of the created gadget, which is displayed in the screen.
    gadget URL
  8. Click Go to Dashboard, and click Add Gadget.
  9. Click Enter Gadget Location, and enter the URL you copied as shown below.
  10. Click Add Gadget.
    Add gadget from URL
    Now the gadget will be displayed in the dashboard as shown in the screen below.
    Gadget on the dashboard
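
For step 4 above, a minimal query such as the following could drive a gadget. It assumes the REQUEST_INFO_SUMMARY_PER_MINUTE table created in the summarization section; adapt the table and columns to whichever summary you want to visualize.

Code Block
languagesql
-- Total hits per request path across the minute-wise summary
SELECT request_path, SUM(total_hit) AS total_hits
FROM REQUEST_INFO_SUMMARY_PER_MINUTE
GROUP BY request_path;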

Configuring data purging 

When your system runs for a long time, data gradually accumulates in Cassandra. Therefore, you need to configure data purging tasks to keep the growth of the Cassandra server in check. You can schedule a purging task either for data in a particular date range or for data that is older than a particular time period. For more information on configuring data purging tasks, see Purging Data.

Info

In this guide, purging data will not affect your final results. However, if you use a monthly summarization, be cautious when selecting dates: with a scheduled purging task you can only configure BAM to keep the data collected within the last n days (for example, the last 90 days). Therefore, the aggregated results for monthly statistics can be inaccurate, because some of the data for the month being considered may already have been deleted.

However, you can use the BAM Management Console to purge data manually by specifying proper date ranges. In that case, the start date has to be the first day of a month, and the end date has to be the last day of a month (which cannot be in the current month); for example, 01/02/2014 to 30/04/2014.

Follow the steps below to schedule a data deletion task, as shown in the screen below.

data deletion

  1. Log in to the BAM Management Console, and click Tools.
  2. Click Add in the Archive/Purge Data menu.
  3. Enter HttpEventStream for Stream Name and 1.0.0 for Version.
  4. Click Date Range to manually select data within a specific date range.
  5. In the From field, select the start date of the date range to be specified. For example, if you are running the sample today, select yesterday's date as the start date.
  6. In the To field, select the end date of the date range to be specified. For example, if you are running the sample today, select tomorrow's date as the end date.
  7. Select Deletion, and click Submit.