You can use WSO2 BAM to analyze and display published API invocation calls sent to it from external API clients. This guide walks you through the following sections to simulate this use case. In this guide, a cURL command simulates sending API invocation calls as events to BAM 2.5.0 through an HTTP input event adaptor. It also describes how the published events are analyzed using Hive queries, summarizing them either on a time basis (per minute, per hour, and per day) or through a one-to-one mapping that is synchronized with publishing. Statistics of the summarized data are displayed on a WSO2 BAM dashboard.
...
Follow the steps below to create an HTTP input event adaptor through which events are sent to BAM.
- Log in to the BAM Management Console using admin/admin as the username and password.
- Click Configure, and then click Input Event Adaptors.
- Click Add Input Event Adaptor to create a new adaptor.
Enter HttpEventAdaptor for Event Adaptor Name, and select http for Event Adaptor Type as follows.
- Click Add Event Adaptor.
...
- Click Main, and then click Event Streams.
- Click Add Event Stream.
Enter HttpEventStream for Event Stream Name and 1.0.0 for Event Stream Version. Also add a meaningful description and a nickname as follows.
...
- Under Meta Data Attributes, enter server for Attribute Name, and select string for Attribute Type. Then, click Add to add the attribute.
- Under Payload Data Attributes, enter request-path for Attribute Name, and select string for Attribute Type. Then, click Add.
- Similarly, under Payload Data Attributes, enter time for Attribute Name, and select string for Attribute Type. Then, click Add.
- Click Add Event Stream, and click OK in the pop-up message.
Adding the event builder
Follow the steps below to add the event builder.
- Click Main, and then click Event Streams.
- In the Available Event Streams list, click In-Flows of the newly added HttpEventStream stream.
Click Receive from External Event Stream (via Event Builder).
Enter HttpEventBuilder for Event Builder Name.
Click Add Event Builder.
Publishing data to WSO2 BAM
Now the WSO2 BAM server is ready to accept HTTP events. It expects a payload as shown below when events are published to it.
Info: When running this sample, use the same date and time format shown in the payload configuration below (yyyy-MM-dd HH:mm) for the value of the <time> element.
```xml
<events>
   <event>
      <metaData>
         <server>localhost</server>
      </metaData>
      <payloadData>
         <request-path>/api/userinfo/1</request-path>
         <time>2014-09-18 12:24</time>
      </payloadData>
   </event>
</events>
```
Create a file named event.xml with a payload as above, and use a sample cURL command as follows to send the payload to WSO2 BAM. If you have set a port offset, change the port of the server URL in the command below accordingly.

```
curl -v http://localhost:9763/endpoints/HttpEventAdaptor/requestInfo -d @event.xml
```
Note: You can also use this script to automate and simplify the task of publishing more events to WSO2 BAM using the above payload configuration. If you have set a port offset, change the port of the server URL in the cURL command in the script accordingly.
Summarizing the published data
This section explains how to summarize the published data and persist it to an RDBMS.
- Click List under Analytics in the Main menu.
- Click Add Script to create the Hive query.
To create a virtual Hive table corresponding to the actual data residing in the Cassandra server, execute the Hive script below.
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS mappingRequestInfoTable
    (key STRING, request_path STRING, time STRING, server STRING, payload_timestamp BIGINT)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
    "wso2.carbon.datasource.name" = "WSO2BAM_CASSANDRA_DATASOURCE",
    "cassandra.cf.name" = "HttpEventStream",
    "cassandra.columns.mapping" = ":key, payload_request-path, payload_time, meta_server, Timestamp");
```
Info: The mappingRequestInfoTable, which is created by executing the above Hive script, is the Hive representation of the HttpEventStream Cassandra column family. Apart from the data you send, the WSO2 BAM server adds additional columns such as Timestamp and the row key. The WSO2BAM_CASSANDRA_DATASOURCE datasource, which is defined in the bam-datasources.xml configuration file, is used to connect to the Cassandra server.

- Click Execute to check whether the query executes successfully.
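Optionally, before saving, you can preview the mapped data with a quick test query. The following is a minimal sketch that simply reads a few rows through the virtual table:

```sql
-- Optional sanity check: preview rows mapped from the HttpEventStream
-- Cassandra column family through the Hive virtual table.
SELECT key, server, request_path, time
FROM mappingRequestInfoTable
LIMIT 10;
```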
- Click Save, and click No in the pop-up message.
Using a time-based summarization
Data will be summarized and saved separately in three different RDBMS tables: minute-wise, hourly, and daily. Follow the steps below to summarize the published events by executing Hive scripts.
Execute the query below to create a new RDBMS table that stores the minute-wise summarized data.
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMinute
    (server STRING, request_path STRING, year SMALLINT, month SMALLINT,
     day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day,hour,minute',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_MINUTE
        (server VARCHAR(50), request_path VARCHAR(150), year SMALLINT, month SMALLINT,
         day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)');
```
Execute the query below to fetch data from the Cassandra column family, group it by server, request_path, year, month, day, hour, and minute, and insert the resulting values into the REQUEST_INFO_SUMMARY_PER_MINUTE table.

```sql
insert overwrite table requestInfoStatsPerMinute
    select server, request_path,
           substr(time,1,4) as year, substr(time,6,2) as month, substr(time,9,2) as day,
           substr(time,12,2) as hour, substr(time,15,2) as minute,
           count(*) as total_hit
    from mappingRequestInfoTable
    group by server, request_path, substr(time,1,4), substr(time,6,2),
             substr(time,9,2), substr(time,12,2), substr(time,15,2);
```
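Info: The substr offsets follow the payload time format. For example, with the sample value 2014-09-18 12:24, substr(time,1,4) returns 2014 (year), substr(time,6,2) returns 09 (month), substr(time,9,2) returns 18 (day), substr(time,12,2) returns 12 (hour), and substr(time,15,2) returns 24 (minute).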
Execute the query below to create a new virtual Hive table that fetches data from the already summarized REQUEST_INFO_SUMMARY_PER_MINUTE table.

Info: When summarizing data on an hourly basis, you do not need to fetch data from the Cassandra column family again, since you already have the data summarized by minute.
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMinuteDataFetcher
    (server STRING, request_path STRING, year SMALLINT, month SMALLINT,
     day SMALLINT, hour SMALLINT, minute SMALLINT, total_hit INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_MINUTE');
```
Execute the query below to create the hourly summarized data table in the RDBMS.
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerHour
    (server STRING, request_path STRING, year SMALLINT, month SMALLINT,
     day SMALLINT, hour SMALLINT, total_hit INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day,hour',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_HOUR
        (server VARCHAR(50), request_path VARCHAR(150), year SMALLINT,
         month SMALLINT, day SMALLINT, hour SMALLINT, total_hit INT)');
```
Execute the script below to fetch data from the minute-wise summarized table and insert it into the hourly table after processing. The same script then repeats the pattern for daily summarization: it creates a fetcher table over REQUEST_INFO_SUMMARY_PER_HOUR, creates the daily summary table, and inserts the aggregated daily totals.
```sql
insert overwrite table requestInfoStatsPerHour
    select server, request_path, year, month, day, hour, sum(total_hit) as total_hit
    from requestInfoStatsPerMinuteDataFetcher
    group by server, request_path, year, month, day, hour;

CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerHourDataFetcher
    (server STRING, request_path STRING, year SMALLINT, month SMALLINT,
     day SMALLINT, hour SMALLINT, total_hit INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_HOUR');

CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerDay
    (server STRING, request_path STRING, year SMALLINT, month SMALLINT,
     day SMALLINT, total_hit INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month,day',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_DAY
        (server VARCHAR(50), request_path VARCHAR(150), year SMALLINT,
         month SMALLINT, day SMALLINT, total_hit INT)');

insert overwrite table requestInfoStatsPerDay
    select server, request_path, year, month, day, sum(total_hit) as total_hit
    from requestInfoStatsPerHourDataFetcher
    group by server, request_path, year, month, day;
```
You can add another two virtual Hive tables (by adapting the configurations of the requestInfoStatsPerDay and requestInfoStatsPerHourDataFetcher tables) to summarize data by month, using the above instructions, as sketched below.
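As a rough sketch of that extension (the names requestInfoStatsPerDayDataFetcher, requestInfoStatsPerMonth, and REQUEST_INFO_SUMMARY_PER_MONTH are illustrative assumptions that simply follow the naming pattern established above):

```sql
-- Sketch of month-wise summarization; all table names below are
-- assumptions following the per-hour/per-day pattern in this guide.
CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerDayDataFetcher
    (server STRING, request_path STRING, year SMALLINT, month SMALLINT,
     day SMALLINT, total_hit INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'mapred.jdbc.input.table.name' = 'REQUEST_INFO_SUMMARY_PER_DAY');

CREATE EXTERNAL TABLE IF NOT EXISTS requestInfoStatsPerMonth
    (server STRING, request_path STRING, year SMALLINT, month SMALLINT, total_hit INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'server,request_path,year,month',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_SUMMARY_PER_MONTH
        (server VARCHAR(50), request_path VARCHAR(150),
         year SMALLINT, month SMALLINT, total_hit INT)');

insert overwrite table requestInfoStatsPerMonth
    select server, request_path, year, month, sum(total_hit) as total_hit
    from requestInfoStatsPerDayDataFetcher
    group by server, request_path, year, month;
```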
Using a direct mapping
Optionally, you can summarize data based on a one-to-one mapping that is synchronized with publishing. Follow the steps below to persist data based on the time at which it was sent to the WSO2 BAM server.
To create a new RDBMS table named REQUEST_INFO_TABLE, and a corresponding virtual Hive table named targetRequestInfoTable to represent it, execute the Hive script below.

Info: The WSO2BAM_DATASOURCE datasource is used to connect to the RDBMS.

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS targetRequestInfoTable
    (key STRING, request_path STRING, time STRING, server STRING)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'key,server,request_path,time',
    'hive.jdbc.table.create.query' = 'CREATE TABLE REQUEST_INFO_TABLE
        (key VARCHAR(100) NOT NULL, request_path VARCHAR(150),
         time VARCHAR(30), server VARCHAR(50))');
```
Execute the query below to fetch data from the Cassandra column family and insert it into the RDBMS table.

```sql
insert overwrite table targetRequestInfoTable
    select key, request_path, time, server
    from mappingRequestInfoTable;
```
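Optionally, you can verify the one-to-one mapping. The sketch below counts the source events through the Cassandra-backed virtual table; after the insert completes, the REQUEST_INFO_TABLE in your RDBMS should hold the same number of rows (checking the RDBMS side with your own database client is an assumption beyond this guide):

```sql
-- Optional check: count the source events; REQUEST_INFO_TABLE in the
-- RDBMS should contain the same number of rows after the insert runs.
SELECT COUNT(*) FROM mappingRequestInfoTable;
```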
Creating a dashboard
Follow the steps below to create a dashboard to display the summarized data using the Gadget Generation Tool.
...
- Log in to the BAM Management Console, and click Tools.
- Click Gadget Gen Tool in the Cassandra Explorer menu.
- Enter the details of the JDBC datasource, and click Next.

Info: Get the values for the fields on this screen from the elements of the WSO2BAM_DATASOURCE in the <BAM_HOME>/repository/conf/datasources/bam-datasources.xml file.

- Enter the SQL query that fetches the required data for your gadget, and click Next. An example query is sketched below.
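For instance, if you followed the time-based summarization above, a query along the following lines could drive a per-minute hit-count gadget. This is a minimal sketch using the REQUEST_INFO_SUMMARY_PER_MINUTE table created earlier; adjust the filter and columns to your own data:

```sql
-- Illustrative gadget query against the minute-wise summary table
-- created earlier in this guide; the request path filter is the
-- sample value from the event payload.
SELECT minute, total_hit
FROM REQUEST_INFO_SUMMARY_PER_MINUTE
WHERE request_path = '/api/userinfo/1';
```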
- Pick the UI element for visualizing your data, and click Next.
- Enter the details of the gadget, and click Generate!.
- Copy the URL of the created gadget, which is displayed on the screen.
- Click Go to Dashboard, and click Add Gadget.
- Click Enter Gadget Location, and enter the URL you copied as shown below.
- Click Add Gadget.
Now the gadget will be displayed in the dashboard as shown in the screen below.
Configuring data purging
When your system runs for a long time, data gradually accumulates in Cassandra. Therefore, you need to configure data purging tasks to keep the growth of the Cassandra server in check. You can schedule a purging task either for data in a particular date range or for data older than a particular time period. For more information on configuring data purging tasks, see Purging Data.
Info: In this guide, purging data will not affect your final results. However, if you use month-wise summarization, be cautious when selecting dates, because with that method you can only configure the task to keep data collected within a limited recent period. Alternatively, you can use the BAM Management Console to purge data manually by specifying proper date ranges: the start date must be the first day of a month, and the end date must be the last day of a month (and cannot fall in the current month). For example: 01/02/2014 to 30/04/2014.
Follow the steps below to schedule a data deletion task.
- Log in to the BAM Management Console, and click Tools.
- Click Add in the Archive/Purge Data menu.
- Enter HttpEventStream for Stream Name and 1.0.0 for Version.
- Click Date Range to manually select data within a specific date range.
- In the From field, select the start date of the date range to be specified. For example, if you are running the sample today, select yesterday's date as the start date.
- In the To field, select the end date of the date range to be specified. For example, if you are running the sample today, select tomorrow's date as the end date.
- Select Deletion, and click Submit.