Publishing API Runtime Statistics Using WSO2 BAM
In this section, we explain how to set up WSO2 Business Activity Monitor (version 2.5.0 is used here) to collect and analyze runtime statistics from the API Manager. To publish data from the API Manager to BAM, the Thrift protocol is used. Information processed in BAM is stored in a database from which the API Publisher retrieves information before displaying it in the corresponding UI screens.
By default, org.wso2.carbon.apimgt.usage.publisher.APIMgtUsageDataPublisher is configured to push data events to WSO2 BAM. If you use a product other than WSO2 BAM to collect and analyze runtime statistics, you can write a new data publishing agent by extending APIMgtUsageDataPublisher. Find the API templates inside <APIM_HOME>/repository/resources/api_templates. When writing a new data publishing agent, make sure the data publishing logic has a minimal impact on API invocation.
Tip: The datasource and database names used here are just examples. They may vary depending on your configurations.
Prerequisites
- JDK 1.6.* or 1.7
  If you install JDK in the Program Files folder in Windows, avoid the space by using PROGRA~1 in the JAVA_HOME and PATH environment variables. Otherwise, the server throws an exception.
- Cygwin (http://www.cygwin.com): Required only if you are using Windows. The WSO2 BAM analytics framework depends on Apache Hadoop, which requires Cygwin in order to run on Windows. Install at least the basic net (OpenSSH, tcp_wrapper packages) and security-related Cygwin packages. After installing Cygwin, update the PATH variable with C:/cygwin/bin and restart BAM.
Next, prepare BAM to collect and analyze statistics from the API Manager.
Configure WSO2 BAM
- Download WSO2 BAM 2.5.0 from http://wso2.com/products/business-activity-monitor.
- Apply an offset of 3 to the default BAM port by editing the <BAM_HOME>/repository/conf/carbon.xml file:
    <Offset>3</Offset>
  This increments all ports used by the server by 3, so the BAM server runs on port 9446 instead of the default 9443. A port offset avoids port conflicts when multiple WSO2 products run on the same host.
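The offset arithmetic can be sketched in a few lines of Python. This is an illustration only and not part of BAM: the function name apply_offset and the dictionary keys are hypothetical, and the default ports (9443 for the server, 7611 for the Thrift receiver) are inferred from the offset ports 9446 and 7614 used in this guide.

```python
# Default ports assumed from this guide: 9443 (BAM server, HTTPS) and
# 7611 (Thrift data receiver). The key names are illustrative only.
DEFAULT_PORTS = {"https_console": 9443, "thrift_receiver": 7611}


def apply_offset(ports, offset):
    """Return a copy of `ports` with `offset` added to every port number."""
    return {name: port + offset for name, port in ports.items()}


if __name__ == "__main__":
    # With <Offset>3</Offset>, every default port moves up by 3.
    for name, port in apply_offset(DEFAULT_PORTS, 3).items():
        print(f"{name}: {port}")
```

The same calculation gives the Thrift port that the API Manager must publish to once the offset is in place.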
- Install MySQL server and a suitable client such as MySQL Workbench. For installation instructions, see http://dev.mysql.com/doc/.
- Go to the command line and issue the following commands to connect to the MySQL server and create a database (e.g., TestStatsDB). This database is used to save the statistical data collected by BAM. You do not need to create any tables in it. The -p option prompts for the password.
    mysql -u <username> -p -h <host_name or IP>
    CREATE DATABASE TestStatsDB;
- Save the MySQL connector JAR inside both the <APIM_HOME>/repository/components/lib and <BAM_HOME>/repository/components/lib folders.
- Give the datasource definition under the <datasource> element in the <BAM_HOME>/repository/conf/datasources/master-datasources.xml file. The tables are created automatically when the Hive script runs. You just need to create the schema. WSO2AM_STATS_DB is used to fetch analytical data from the database. The example below connects to a MySQL instance:
    <datasource>
      <name>WSO2AM_STATS_DB</name>
      <description>The datasource used for getting statistics to API Manager</description>
      <jndiConfig>
        <name>jdbc/WSO2AM_STATS_DB</name>
      </jndiConfig>
      <definition type="RDBMS">
        <configuration>
          <url>jdbc:mysql://localhost:3306/TestStatsDB</url>
          <username>db_username</username>
          <password>db_password</password>
          <driverClassName>com.mysql.jdbc.Driver</driverClassName>
          <maxActive>50</maxActive>
          <maxWait>60000</maxWait>
          <testOnBorrow>true</testOnBorrow>
          <validationQuery>SELECT 1</validationQuery>
          <validationInterval>30000</validationInterval>
        </configuration>
      </definition>
    </datasource>
  Tip: If you are using BAM 2.4.1, be sure to uncomment the <thriftDataReceiver><hostName> element in the <BAM_HOME>/repository/conf/data-bridge/data-bridge-config.xml file and give the BAM host IP there.
- Restart the BAM server by running <BAM_HOME>/bin/wso2server.[sh/bat].
Configure WSO2 API Manager
- Start the API Manager and log in to its Admin Dashboard Web application (https://<Server Host>:9443/admin-dashboard). Click the Configure Analytics menu.
- Select the check-box to enable statistical data publishing and do the rest of the configurations using the information given below:
  Event Receiver Configurations: Set the URL to tcp://<BAM_HOST_IP>:7614/ where <BAM_HOST_IP> is the machine's IP address. Do not use localhost unless you're in a disconnected mode.
  You can define multiple event receiver groups, each with one or more receivers separated by commas. For example: tcp://localhost:7612/, tcp://localhost:7613/. This helps you manage failover. If the BAM server in the first URL fails, the request is routed to the second one.
  Event receivers refer to the endpoint to which events are published from the API Gateway. Because you applied an offset of 3 to the default BAM port earlier in this guide, you need to apply the same offset to the default Thrift port (7611). The API Manager then pushes the data to BAM through port 7614, using the Thrift protocol.
  Data Analyser Configurations: The URL and the credentials of the event analyzer node. As this URL is used to deploy the toolbox, make sure that the BAM server is up and running at the given URL.
  Statistic Summary Datasource: Give the datasource definition that is used to store summarized statistical data. The tables are created automatically when the Hive script runs. You just need to create the schema. The same configuration is done in the BAM server.
  - URL: The connection URL for the RDBMS datasource
  - JDBC Driver Class: The fully qualified Java class name of the JDBC driver
  - Username/Password: Credentials to be passed to the JDBC driver to establish a connection
  Tip: To edit the datasource connection pool parameters, click the Show More Options link.
- Click Save when you are done. This deploys the Analytics toolbox, which describes the information collected, how to analyze the data, and the location of the database where the analyzed data is stored.
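Before saving, it can help to sanity-check the event receiver URL list entered above. The following Python sketch splits a comma-separated receiver list and verifies that each entry has the tcp://host:port/ form. It is a hypothetical helper for illustration and not part of the API Manager; the function name parse_receivers is an assumption.

```python
from urllib.parse import urlparse


def parse_receivers(url_field):
    """Split a comma-separated receiver list into (host, port) pairs.

    Raises ValueError if an entry is not of the form tcp://host:port/.
    """
    receivers = []
    for raw in url_field.split(","):
        parsed = urlparse(raw.strip())
        if parsed.scheme != "tcp" or parsed.hostname is None or parsed.port is None:
            raise ValueError(f"not a tcp://host:port/ URL: {raw!r}")
        receivers.append((parsed.hostname, parsed.port))
    return receivers


if __name__ == "__main__":
    # The failover example from this guide: two receivers in one group.
    for host, port in parse_receivers("tcp://localhost:7612/, tcp://localhost:7613/"):
        print(host, port)
```

The order of the returned pairs matches the order in the field, which is the order in which failover is attempted according to the description above.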
  Tip: Are you working with an API Manager cluster or a distributed setup? If so:
  - If your registry is shared, do the above configurations in one node (e.g., the API Publisher) and restart the other nodes.
  - If your registry is not shared, do the same configuration in all API Gateway nodes, the API Publisher node, and the API Store nodes by logging in to the admin-dashboard.
  - Change the API Publisher node to get response-based statistics such as destination-based usage tracking.
- You can change the stream names, versions, and publisher class by editing the <APIM_HOME>/repository/conf/api-manager.xml file as given in the example below.
  Tip: Please read the code comments for details. If you change the default values under streams, the <APIM_HOME>/statistics/API_Manager_Analytics.tbox must also be changed accordingly.
    <APIUsageTracking>
      <!-- Below property is used to skip trying to connect to event receiver nodes when publishing events even if the stats enabled flag is set to true. -->
      <SkipEventReceiverConnection>false</SkipEventReceiverConnection>
      <!-- API Usage Data Publisher. -->
      <PublisherClass>org.wso2.carbon.apimgt.usage.publisher.APIMgtUsageDataBridgeDataPublisher</PublisherClass>
      <!-- If below property set to true,then the response message size will be calculated and publish with each successful API invocation event. -->
      <PublishResponseMessageSize>false</PublishResponseMessageSize>
      <!-- Data publishing stream names and versions of API requests, responses and faults. If the default values are changed, the toolbox also needs to be changed accordingly. -->
      <Streams>
        <Request>
          <Name>org.wso2.apimgt.statistics.request</Name>
          <Version>1.0.0</Version>
        </Request>
        <Response>
          <Name>org.wso2.apimgt.statistics.response</Name>
          <Version>1.0.0</Version>
        </Response>
        <Fault>
          <Name>org.wso2.apimgt.statistics.fault</Name>
          <Version>1.0.0</Version>
        </Fault>
        <Destination>
          <Name>org_wso2_apimgt_statistics_destination</Name>
          <Version>1.0.0</Version>
          <BAMProfileName>bam-profile</BAMProfileName>
        </Destination>
        <Throttle>
          <Name>org.wso2.apimgt.statistics.throttle</Name>
          <Version>1.0.0</Version>
        </Throttle>
        <Workflow>
          <Name>org.wso2.apimgt.statistics.workflow</Name>
          <Version>1.0.0</Version>
        </Workflow>
      </Streams>
    </APIUsageTracking>
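When stream names or versions are changed, it is easy to lose track of what the toolbox has to match. The sketch below reads the Streams element with Python's standard xml.etree.ElementTree module and lists each stream's name and version. It is an illustrative helper, not a WSO2 tool; the function name list_streams and the shortened SAMPLE fragment are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened fragment of the <APIUsageTracking> element shown above.
SAMPLE = """
<APIUsageTracking>
  <Streams>
    <Request>
      <Name>org.wso2.apimgt.statistics.request</Name>
      <Version>1.0.0</Version>
    </Request>
    <Fault>
      <Name>org.wso2.apimgt.statistics.fault</Name>
      <Version>1.0.0</Version>
    </Fault>
  </Streams>
</APIUsageTracking>
"""


def list_streams(xml_text):
    """Return {stream type: (name, version)} for the first <Streams> element."""
    root = ET.fromstring(xml_text)
    streams_element = next(root.iter("Streams"))
    return {
        stream.tag: (stream.findtext("Name"), stream.findtext("Version"))
        for stream in streams_element
    }


if __name__ == "__main__":
    for kind, (name, version) in list_streams(SAMPLE).items():
        print(f"{kind}: {name} {version}")
```

Because the function searches for the first Streams element anywhere under the root, the same call should also work on the text of a complete api-manager.xml file, assuming that file is well-formed XML.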
After configuring WSO2 BAM to collect and analyze statistics of APIs hosted and managed by the API Manager, you can view them through various statistical dashboards in the API Publisher and Store, depending on your permission levels. For information, see Viewing API Statistics.
Change the statistics database
To use a database other than the default H2 for statistical publishing, change the properties of the datasource element. If the Hive script has already run, also delete the metadata tables that its previous executions created.
To delete the metadata tables:
- Log in to the BAM management console and select Add in the Analytics menu.
- Go to the Script Editor in the window that opens.
- Execute the following script:
    drop TABLE APIRequestData;
    drop TABLE APIRequestSummaryData;
    drop TABLE APIVersionUsageSummaryData;
    drop TABLE APIResourcePathUsageSummaryData;
    drop TABLE APIResponseData;
    drop TABLE APIResponseSummaryData;
    drop TABLE APIFaultData;
    drop TABLE APIFaultSummaryData;
    drop TABLE APIDestinationData;
    drop TABLE APIDestinationDataSummaryData;
    drop TABLE APIRequestDataMinimal;
    drop TABLE APIRequestSummaryDataMinimal;
    drop TABLE APIThrottleData;
    drop TABLE APIThrottleSummaryData;
If there are previous executions of the Hive scripts, manually execute them again by going to Main > Analytics > List in the management console of BAM. Alternatively, you can wait until the periodical execution time occurs.
RDBMS summarized tables
The following are the summarized tables that exist in API Manager.
Note that the MySQL database has been used here as an example.
API_DESTINATION_SUMMARY
This table contains the summarized data of the API destinations and is derived from the destination event stream.
CREATE TABLE IF NOT EXISTS `API_DESTINATION_SUMMARY` (
  `api` varchar(100) NOT NULL DEFAULT '',
  `version` varchar(100) NOT NULL DEFAULT '',
  `apiPublisher` varchar(100) NOT NULL DEFAULT '',
  `context` varchar(100) NOT NULL DEFAULT '',
  `destination` varchar(100) NOT NULL DEFAULT '',
  `total_request_count` int(11) DEFAULT NULL,
  `hostName` varchar(100) NOT NULL DEFAULT '',
  `year` smallint(6) DEFAULT NULL,
  `month` smallint(6) DEFAULT NULL,
  `day` smallint(6) DEFAULT NULL,
  `time` varchar(30) NOT NULL DEFAULT '',
  PRIMARY KEY (`api`,`version`,`apiPublisher`,`context`,`destination`,`hostName`,`time`)
);
API_FAULT_SUMMARY
This table contains the summarized data of faulty API invocations and is derived from the fault event stream.
CREATE TABLE IF NOT EXISTS `API_FAULT_SUMMARY` (
  `api` varchar(100) NOT NULL DEFAULT '',
  `version` varchar(100) NOT NULL DEFAULT '',
  `apiPublisher` varchar(100) NOT NULL DEFAULT '',
  `consumerKey` varchar(100) DEFAULT NULL,
  `context` varchar(100) NOT NULL DEFAULT '',
  `total_fault_count` int(11) DEFAULT NULL,
  `hostName` varchar(100) NOT NULL DEFAULT '',
  `year` smallint(6) DEFAULT NULL,
  `month` smallint(6) DEFAULT NULL,
  `day` smallint(6) DEFAULT NULL,
  `time` varchar(30) NOT NULL DEFAULT '',
  PRIMARY KEY (`api`,`version`,`apiPublisher`,`context`,`hostName`,`time`)
);
API_REQUEST_SUMMARY
This table contains the summary data of the request event stream.
CREATE TABLE IF NOT EXISTS `API_REQUEST_SUMMARY` (
  `api` varchar(100) NOT NULL DEFAULT '',
  `api_version` varchar(100) NOT NULL DEFAULT '',
  `version` varchar(100) NOT NULL DEFAULT '',
  `apiPublisher` varchar(100) NOT NULL DEFAULT '',
  `consumerKey` varchar(100) NOT NULL DEFAULT '',
  `userId` varchar(100) NOT NULL DEFAULT '',
  `context` varchar(100) NOT NULL DEFAULT '',
  `max_request_time` bigint(20) DEFAULT NULL,
  `total_request_count` int(11) DEFAULT NULL,
  `hostName` varchar(100) NOT NULL DEFAULT '',
  `year` smallint(6) DEFAULT NULL,
  `month` smallint(6) DEFAULT NULL,
  `day` smallint(6) DEFAULT NULL,
  `time` varchar(30) NOT NULL DEFAULT '',
  PRIMARY KEY (`api`,`api_version`,`version`,`apiPublisher`,`consumerKey`,`userId`,`context`,`hostName`,`time`)
);
API_Resource_USAGE_SUMMARY
This table contains the summarized data for API Manager usage by resources and is derived from the request event table.
CREATE TABLE IF NOT EXISTS `API_Resource_USAGE_SUMMARY` (
  `api` varchar(100) NOT NULL DEFAULT '',
  `version` varchar(100) NOT NULL DEFAULT '',
  `apiPublisher` varchar(100) NOT NULL DEFAULT '',
  `consumerKey` varchar(100) NOT NULL DEFAULT '',
  `resourcePath` varchar(100) NOT NULL DEFAULT '',
  `context` varchar(100) NOT NULL DEFAULT '',
  `method` varchar(100) NOT NULL DEFAULT '',
  `total_request_count` int(11) DEFAULT NULL,
  `hostName` varchar(100) NOT NULL DEFAULT '',
  `year` smallint(6) DEFAULT NULL,
  `month` smallint(6) DEFAULT NULL,
  `day` smallint(6) DEFAULT NULL,
  `time` varchar(30) NOT NULL DEFAULT '',
  PRIMARY KEY (`api`,`version`,`apiPublisher`,`consumerKey`,`context`,`resourcePath`,`method`,`hostName`,`time`)
);
API_RESPONSE_SUMMARY
This table contains the summarized data from API responses. It is derived from the response event table.
CREATE TABLE IF NOT EXISTS `API_RESPONSE_SUMMARY` (
  `api_version` varchar(100) NOT NULL DEFAULT '',
  `apiPublisher` varchar(100) NOT NULL DEFAULT '',
  `context` varchar(100) NOT NULL DEFAULT '',
  `serviceTime` int(11) DEFAULT NULL,
  `total_response_count` int(11) DEFAULT NULL,
  `hostName` varchar(100) NOT NULL DEFAULT '',
  `year` smallint(6) DEFAULT NULL,
  `month` smallint(6) DEFAULT NULL,
  `day` smallint(6) DEFAULT NULL,
  `time` varchar(30) NOT NULL DEFAULT '',
  PRIMARY KEY (`api_version`,`apiPublisher`,`context`,`hostName`,`time`)
);
API_VERSION_USAGE_SUMMARY
This table contains the summary data for API Manager usage. It is also derived from the request event table.
CREATE TABLE IF NOT EXISTS `API_VERSION_USAGE_SUMMARY` (
  `api` varchar(100) NOT NULL DEFAULT '',
  `version` varchar(100) NOT NULL DEFAULT '',
  `apiPublisher` varchar(100) NOT NULL DEFAULT '',
  `context` varchar(100) NOT NULL DEFAULT '',
  `total_request_count` int(11) DEFAULT NULL,
  `hostName` varchar(100) NOT NULL DEFAULT '',
  `year` smallint(6) DEFAULT NULL,
  `month` smallint(6) DEFAULT NULL,
  `day` smallint(6) DEFAULT NULL,
  `time` varchar(30) NOT NULL DEFAULT '',
  PRIMARY KEY (`api`,`version`,`apiPublisher`,`context`,`hostName`,`time`)
);
API_THROTTLED_OUT_SUMMARY
This table contains the summary of the throttle out API invocation data. It is derived from the throttle out event table and request table.
CREATE TABLE IF NOT EXISTS `API_THROTTLED_OUT_SUMMARY` (
  `api` varchar(100) NOT NULL DEFAULT '',
  `api_version` varchar(100) NOT NULL DEFAULT '',
  `context` varchar(100) NOT NULL DEFAULT '',
  `apiPublisher` varchar(100) NOT NULL DEFAULT '',
  `applicationName` varchar(100) NOT NULL DEFAULT '',
  `tenantDomain` varchar(100) NOT NULL DEFAULT '',
  `year` smallint(6) DEFAULT NULL,
  `month` smallint(6) DEFAULT NULL,
  `day` smallint(6) DEFAULT NULL,
  `week` int(11) DEFAULT NULL,
  `time` varchar(30) NOT NULL DEFAULT '',
  `success_request_count` int(11) DEFAULT NULL,
  `throttleout_count` int(11) DEFAULT NULL,
  PRIMARY KEY (`api`,`api_version`,`context`,`apiPublisher`,`applicationName`,`tenantDomain`,`year`,`month`,`day`,`time`)
);
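As an illustration of how the summarized tables above can be read, the following Python sketch totals total_request_count per API and version from API_VERSION_USAGE_SUMMARY. To stay self-contained it uses an in-memory SQLite database as a stand-in for MySQL, with a simplified copy of the schema. The sample rows (PhoneVerify, admin, host1) are invented for the example, and the function names are hypothetical.

```python
import sqlite3


def demo_connection():
    """Create an in-memory stand-in for the statistics database (SQLite, not MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE API_VERSION_USAGE_SUMMARY ("
        " api TEXT, version TEXT, apiPublisher TEXT, context TEXT,"
        " total_request_count INTEGER, hostName TEXT,"
        " year INTEGER, month INTEGER, day INTEGER, time TEXT,"
        " PRIMARY KEY (api, version, apiPublisher, context, hostName, time))"
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO API_VERSION_USAGE_SUMMARY VALUES (?,?,?,?,?,?,?,?,?,?)",
        [
            ("PhoneVerify", "1.0.0", "admin", "/phone", 5, "host1", 2015, 1, 1, "2015-01-01 10:00"),
            ("PhoneVerify", "1.0.0", "admin", "/phone", 7, "host1", 2015, 1, 1, "2015-01-01 11:00"),
            ("PhoneVerify", "2.0.0", "admin", "/phone", 3, "host1", 2015, 1, 1, "2015-01-01 10:00"),
        ],
    )
    return conn


def requests_per_version(conn):
    """Sum total_request_count for each (api, version) pair."""
    cursor = conn.execute(
        "SELECT api, version, SUM(total_request_count)"
        " FROM API_VERSION_USAGE_SUMMARY"
        " GROUP BY api, version ORDER BY api, version"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for api, version, total in requests_per_version(demo_connection()):
        print(api, version, total)
```

The SELECT statement itself uses only standard SQL, so it should run unchanged against the MySQL schema shown above when issued through a MySQL client.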
Troubleshoot common issues
The following are solutions to some common issues that users come across:
- Do you get an out of memory issue?
  See the performance tuning guide for recommendations to tune the server for optimal performance.
- Do you get an exception saying that the server is unable to connect to Cassandra?
  Check if you changed the Cassandra port according to the port offset applied to the default BAM port. See Step 3 under the Configure WSO2 BAM section.
- Do you get a connection refused exception on the BAM console?
  This happens when you execute Hive scripts prior to changing the default port. Add the following line at the beginning of the Hive scripts and rerun:
    drop table <hive_cassandra_table_name>;
  You can find the Hive scripts deployed with the toolbox file, which is inside the <BAM_HOME>/repository/deployment/server/bam-toolbox folder. For information, see Editing an Analytic Script in the WSO2 BAM documentation.