WSO2 API-M Performance and Capacity Planning

The following sections analyze the results of WSO2 API Manager performance tests done in the Amazon EC2 environment.

Summary

The performance of WSO2 API Manager was measured using the following APIs, which invoke a simple “Netty HTTP Echo Service”. As the name suggests, the Netty service echoes back any request posted to the service.

Echo API: This is a secured API, which directly invokes the back-end service.
Mediation API: This is also a secured API, which has a “sequence” as a mediation extension to modify the message.

Tests were done using 100, 200, 300, 1000, and 2000 concurrent users. Concurrent Users mean that there are multiple users accessing the API Gateway at the same time. Different Message Sizes (Payload) were used for the tests with different back-end service delays. The message sizes used are 50B, 1KiB, 10KiB, and 100KiB. The back-end delays were 0ms, 30ms, 500ms, and 1s.

100KiB message size was not tested with Mediation API. The back-end delay will also be referred as “Sleep Time” in this document.

Two key performance metrics were used to measure the performance of each test.

Throughput: This measures the number of API invocations that the API Manager Gateway server processed during a specific time interval (e.g. per second).
Response Time: This measures end-to-end processing time for an operation (of invoking an API using HTTPS protocol). The complete distribution of response times were recorded.

The heap size of WSO2 API Manager was increased to 4GB from 2GB, which is the default. Except for increasing the heap size of API Manager, there were no other specific configurations used to optimise the performance of WSO2 API Manager.

With WSO2 API Manager, an average user use ~1KiB messages and most of the back-ends usually responds in ~30ms. Therefore, let’s look at some charts to understand performance test results for above APIs when using 1KiB messages with 30ms backend delay.

The deployment used was All-in-one API Manager.

The following figure shows how the Throughput changes for different number of concurrent users.

Key observations:

More concurrent users mean more requests to the API Manager Gateway. Therefore, the Throughput of the API Manager Gateway increases as the number of concurrent users accessing the APIs increases. The maximum throughput is obtained for 1000 concurrent users for both “Echo API” and “Mediation API” and the throughput degrades after 1000 concurrent users due to resource contentions in the system. The degradation point mainly depends on the hardware resources.
Echo API throughput is much better than the Mediation API. Main reason is that the Mediation API has a mediation extension, which uses a “Payload Factory” Mediator. This mediation in the sequence does a JSON to JSON message transformation. That means, the Mediation API reads the message (payload) to do the message transformation and it has a considerable overhead than the “Echo API”, which is similar to a “Direct Proxy”. A “Direct Proxy” does not perform any processing on the messages that pass through it.

The following figure shows how the Average Response Time changes for different number of concurrent users.

Key observations:

The Average Response Time increases with the number of concurrent users. Since the number of requests to serve increases with more users, there are more resource contentions. Therefore, the number of concurrent users served by the API Gateway needs to be decided on the required response time limits. For example, in order to keep average response time below 100ms for APIs with 30ms backend delay, the maximum concurrent users accessing the API Gateway should be limited to 300.
The Mediation API response times are higher than Echo API due to the performance overhead of mediation extension.

Let’s look at the 90th, 95th, and 99th Response Time percentiles. This is useful to measure the percentage of requests exceeded the response time value for a given percentile. A percentile can also tell the percentage of requests completed below the particular response time value.

For example, when there are 100 concurrent users, 90th response time percentile for Echo API is 36ms. This means that 10% of the requests have taken more than 36ms to respond. Similarly, 99th response time percentile for Echo API is 146ms, which means that 99% of the requests have completed within 146ms.

Key observations:

Mediation API is slower than the Echo API due to the performance overhead of mediation extension.
Response Times percentiles are less than 300ms up to 300 concurrent users.
For higher concurrent users, 99th percentile of Mediation API response times goes beyond 1 second, which means that 1% of the API invocations took 1 second to 1.6 seconds.

1000 to 2000 concurrent users mean a lot and it is not very common. To support more concurrent users with acceptable response times, it is recommended to scale horizontally or vertically. When scaling horizontally, two or more Gateway nodes need to be used with a load balancer. To measure the performance after scaling, another load test must be carried out.

In order to see the memory usage, the Garbage Collection (GC) logs in the API Manager was enabled and the GC log for each performance test was analyzed using the GCViewer.

The GC Throughput was calculated for each test to check whether GC operations are not impacting the performance of the server. The GC Throughput is the time percentage of the application, which was not busy with GC operations. For example, if the application ran for 10 minutes and 30 seconds were taken for GC operations, the GC Throughput is 1-301060100=95%. A GC Throughput over 90% is good and that means the allocated heap was enough to handle all concurrent requests, which allocate objects in the memory.

The following chart shows the GC Throughput (%) for different number of concurrent users.

Key observations:

GC Throughput decreases when the number of concurrent users increase. This means that the time spent for GC is increasing with concurrent users.
GC Throughput for Mediation API is better than Echo API. The Mediation API processes requests slower than the Echo API, which means that the object allocation rate is also less than the Echo API.

WSO2 API Manager All-in-one Deployment

In this deployment, the WSO2 API Manager is deployed in one EC2 instance and a RDS instance is used for databases. The back-end service, which is a simple Netty HTTP Echo Service is deployed in a separate EC2 instance. There are two JMeter servers with a JMeter client in three EC2 instances.

See: JMeter Remote Test. Two JMeter servers are used to simulate high number of concurrent users.

Name	EC2 Instance Type	vCPU	Mem (GiB)
Apache JMeter Client	c3.large	2	3.75
Apache JMeter Server 01	c3.xlarge	4	7.5
Apache JMeter Server 02	c3.xlarge	4	7.5
WSO2 API Manager	c3.xlarge	4	7.5
Netty HTTP Echo Service	c3.xlarge	4	7.5
MySQL	db.m3.medium (RDS)	1	3.75

See the following links for more details on Amazon Instance Types

The operating system is Ubuntu 16.04.2 LTS.

WSO2 API Manager version is 2.1.0 and Apache JMeter version is 3.2.

MySQL version in RDS instance was 5.7

Observations from all results

There are key observations for the average user scenario of accessing APIs with 1KiB messages and the back-end service having 30ms delay.

The following are the key observations from the all performance tests done with different message sizes and different backend delays. (See Comparison of results for all charts used to derive the pointed mentioned below)

Key observations related to throughput:

Throughput increases up to a certain limit when the number of concurrent users increase. Mediation API throughput increase rate is much lower than the Echo API.
Throughput decreases when the message sizes increase. The Mediation API throughput decrease rate is much higher than the Echo API.
Throughput decreases when the backend sleep time increase. This observation is similar to both APIs. This means that if the backend takes more time, the request processing rate at the API Manager gateway will be less.

Key observations related to response time:

Average response time increases when the number of concurrent users increase. The increasing rate of average response time for Mediation API is much higher than the Echo API.
Average response time increases considerably for Mediation API when the message sizes increase due to the message processing. The average response time of Echo API is not increasing as much as the Mediation API.
Average Response Time increases when the backend sleep time increases. This observation is similar to both APIs.

Key observations related to GC Throughput:

The GC throughput decreases when the number of concurrent users increase. When there are more concurrent users, the object allocation rate increases.
The GC throughput increases when the message sizes increases. The request processing rate slows down due to the time taken to process large messages. Therefore, the object allocation rate decreases when the message sizes increases.
The GC throughput increases when the backend sleep time increases. The object allocation rate will be low when the backend takes more time to respond.

Backend Service

The backend service used for tests were developed using Netty. Since there can be up to 2000 users, 2000 threads were used for Netty (to avoid the back-end becoming a bottleneck)

The Netty server also has a parameter to specify the number of seconds to sleep to simulate the delays in the backend service)

Performance Testing Tool

As mentioned above, Apache JMeter was used to run load tests. In JMeter, the number of concurrent users was specified and the following details were taken after each test.

# Samples - The number of requests sent with the given number of concurrent users.
Error Count - How many request errors were recorded.
Error % - Percent of requests with errors
Average - The average response time of a set of results
Min - The shortest time taken for a request
Max - The longest time taken for a request
90th Percentile - 90% of the requests took no more than this time. The remaining samples took at least as long as this
95th Percentile - 95% of the requests took no more than this time. The remaining samples took at least as long as this
99th Percentile - 99% of the requests took no more than this time. The remaining samples took at least as long as this
Throughput - The Throughput is measured in requests per second.
Received KB/sec - The throughput measured in received Kilobytes per second
Sent KB/sec - The throughput measured in sent Kilobytes per second