Performance Tuning

This section describes some recommended performance tuning configurations to optimize the performance of WSO2 DAS. It assumes that you have set up WSO2 DAS on a server running Unix/Linux, which is recommended for a production deployment.

Important

Performance tuning requires you to modify important system files, which affect all programs running on the server. We recommend that you familiarize yourself with these files using the Unix/Linux documentation before editing them.
The parameter values we discuss below are just examples. They might not be the optimal values for the specific hardware configurations in your environment. We recommend that you carry out load tests on your environment to tune the product accordingly.

OS-Level Settings

To optimize network and OS performance, configure the following settings in the /etc/sysctl.conf file of Linux. These settings specify a larger port range, a more effective TCP connection timeout value, and a number of other important OS-level parameters.
```
net.ipv4.tcp_fin_timeout = 30
fs.file-max = 2097152
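# Note: tcp_tw_recycle was removed in Linux 4.12 and can break connections from clients behind NAT; omit it on newer kernels.
net.ipv4.tcp_tw_recycle = 1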
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_default = 524288
net.core.wmem_default = 524288
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.ip_local_port_range = 1024 65535
```
If the lower bound of the local port range (net.ipv4.ip_local_port_range) is set to 1024, other processes may be assigned ports that are already used by WSO2 servers. Therefore, for production, increase the lower bound to a sufficiently high value, e.g., 10000.
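The reasoning above can be checked with a short Python sketch: it parses the ip_local_port_range line from sysctl.conf and reports which server ports fall inside the ephemeral range, where the kernel may hand them out to other processes. The port numbers are assumptions (commonly used WSO2 Carbon and DAS defaults with no port offset), and the helper names are hypothetical; replace the ports with the ones your deployment actually uses.

```python
from typing import Dict, Tuple

# Assumed default ports (no port offset); adjust to match your deployment.
ASSUMED_DAS_PORTS: Dict[str, int] = {
    "HTTPS (management console)": 9443,
    "HTTP": 9763,
    "Thrift receiver": 7611,
    "Thrift receiver (SSL)": 7711,
    "Binary receiver": 9611,
    "Binary receiver (SSL)": 9711,
}


def parse_port_range(line: str) -> Tuple[int, int]:
    """Parse a line such as 'net.ipv4.ip_local_port_range = 1024 65535'."""
    key, _, value = line.partition("=")
    if key.strip() != "net.ipv4.ip_local_port_range":
        raise ValueError(f"not an ip_local_port_range line: {line!r}")
    low, high = (int(part) for part in value.split())
    return low, high


def ports_in_ephemeral_range(port_range: Tuple[int, int],
                             ports: Dict[str, int]) -> Dict[str, int]:
    """Return the ports that lie inside the (inclusive) ephemeral range."""
    low, high = port_range
    return {name: port for name, port in ports.items() if low <= port <= high}


if __name__ == "__main__":
    for line in ("net.ipv4.ip_local_port_range = 1024 65535",
                 "net.ipv4.ip_local_port_range = 10000 65535"):
        at_risk = ports_in_ephemeral_range(parse_port_range(line), ASSUMED_DAS_PORTS)
        print(line, "->", at_risk or "no overlap")
```

With a lower bound of 1024, every listed port overlaps the ephemeral range; with a lower bound of 10000, none of them does.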
To alter the number of allowed open files for system users, configure the following settings in the /etc/security/limits.conf file of Linux.
```
* soft nofile 4096
* hard nofile 65535
```
Optimal values for these parameters depend on the environment.
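To confirm that new limits are in effect, the following Python sketch reads the current process's open-file limits with the standard resource module and compares them with the example values above. Because limits.conf is applied by PAM (pam_limits) at login, run it from a fresh login session of the user that starts DAS. The helper names are hypothetical, and the resource module is Unix-only.

```python
import resource
from typing import Tuple

# Example values from the limits.conf snippet above.
EXAMPLE_SOFT_NOFILE = 4096
EXAMPLE_HARD_NOFILE = 65535


def current_nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def meets_example(soft: int, hard: int) -> bool:
    """True if both limits are at least the example values (unlimited counts)."""
    def at_least(value: int, wanted: int) -> bool:
        return value == resource.RLIM_INFINITY or value >= wanted
    return at_least(soft, EXAMPLE_SOFT_NOFILE) and at_least(hard, EXAMPLE_HARD_NOFILE)


if __name__ == "__main__":
    soft, hard = current_nofile_limits()
    print(f"nofile soft={soft} hard={hard}; meets example values: {meets_example(soft, hard)}")
```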
To alter the maximum number of processes your user is allowed to run at a given time, configure the following settings in the /etc/security/limits.conf file of Linux (be sure to include the leading * character). Each Carbon server instance you run requires up to 1024 threads (with the default thread pool configuration). Therefore, increase the nproc value (both hard and soft) by 1024 for each Carbon server.
```
* soft nproc 20000
* hard nproc 20000
```
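The sizing rule above reduces to simple arithmetic, sketched below in Python. On Linux, the nproc limit counts threads as well as processes, which is why each Carbon server's thread count matters. The baseline (an allowance for the user's other processes) and the helper names are hypothetical; pick a baseline that suits your environment.

```python
from typing import List

THREADS_PER_CARBON_SERVER = 1024  # from the text (default thread pool configuration)


def required_nproc(carbon_servers: int, baseline: int) -> int:
    """Return the nproc value (soft and hard) for the given number of Carbon servers."""
    if carbon_servers < 0 or baseline < 0:
        raise ValueError("counts must be non-negative")
    return baseline + THREADS_PER_CARBON_SERVER * carbon_servers


def limits_conf_lines(nproc: int, domain: str = "*") -> List[str]:
    """Render the limits.conf entries for the given nproc value."""
    return [f"{domain} soft nproc {nproc}", f"{domain} hard nproc {nproc}"]


if __name__ == "__main__":
    # Hypothetical baseline of 4096 plus three Carbon servers on one host.
    value = required_nproc(carbon_servers=3, baseline=4096)
    print("\n".join(limits_conf_lines(value)))
```

For three servers on top of a 4096 baseline, this yields 7168 for both the soft and hard limits.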

JVM settings

When an XML element has a large number of sub-elements and the system tries to process all of them, the system can become unstable due to memory overhead. This is a security risk.

To avoid this issue, you can define the maximum number of entity substitutions that the XML parser allows in the system. You do this using the entityExpansionLimit system property, which is passed to the JVM in the <DAS_HOME>/bin/wso2server.bat file (for Windows) or the <DAS_HOME>/bin/wso2server.sh file (for Linux/Solaris). The default entity expansion limit is 64000. The following example sets the limit to 100000:

-DentityExpansionLimit=100000

In a clustered environment, the entity expansion limit has no dependency on the number of worker nodes.

WSO2 Carbon platform-level settings

In multitenant mode, the WSO2 Carbon runtime limits the thread execution time. That is, if a thread is stuck or is taking a long time to process, Carbon detects it, and interrupts and stops it. Note that Carbon prints the current stack trace before interrupting the thread. This mechanism is implemented as an Apache Tomcat valve. Therefore, configure it in the <PRODUCT_HOME>/repository/conf/tomcat/catalina-server.xml file as shown below.

<Valve className="org.wso2.carbon.tomcat.ext.valves.CarbonStuckThreadDetectionValve" threshold="600"/>

The className is the Java class used for the implementation. Set it to org.wso2.carbon.tomcat.ext.valves.CarbonStuckThreadDetectionValve.
The threshold gives the minimum duration in seconds after which a thread is considered stuck. The default value is 600 seconds.
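To make the threshold semantics concrete, the following Python sketch shows threshold-based stuck-thread detection in simplified form. It illustrates the idea only and is not Carbon's or Tomcat's implementation; the ActiveRequest record, the find_stuck function, and the thread names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_THRESHOLD_SECONDS = 600  # default threshold from the text


@dataclass
class ActiveRequest:
    thread_name: str
    started_at: float  # time (in seconds) at which the thread began processing


def find_stuck(requests: List[ActiveRequest], now: float,
               threshold: float = DEFAULT_THRESHOLD_SECONDS) -> List[str]:
    """Return the threads that have been processing for at least `threshold` seconds."""
    return [r.thread_name for r in requests if now - r.started_at >= threshold]


if __name__ == "__main__":
    now = 10_000.0
    active = [
        ActiveRequest("worker-1", started_at=now - 30),   # 30 s: still normal
        ActiveRequest("worker-2", started_at=now - 900),  # 900 s: over the threshold
    ]
    # A real valve would log each stuck thread's stack trace and then interrupt it.
    print(find_stuck(active, now))  # ['worker-2']
```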

JDBC Pool Configuration

Within the WSO2 platform, we use Tomcat JDBC pooling as the default pooling framework due to its production-ready stability and high performance. The table below gives some recommendations on how to configure the JDBC pool using the <PRODUCT_HOME>/repository/conf/datasources/master-datasources.xml file. For more details about recommended JDBC configurations, see The Tomcat JDBC Connection Pool.

Property	Description	Recommendations
maxActive	The maximum number of active connections that can be allocated from the connection pool at the same time. The default value is `100`.	The maximum latency is approximately (P / M) * T, where M = maxActive value, P = peak concurrency value, and T = average time taken to process a query. Therefore, increasing the maxActive value (up to the expected peak concurrency) decreases the time that requests wait in the queue for a connection to be released. However, consult the database administrator before increasing the maxActive value, because each node can create up to maxActive connections during peak times, and the DBMS may not be able to handle the accumulated count of these active connections. Note that this value should not exceed the maximum number of connections allowed by your database.
maxWait	The maximum time (in milliseconds) that requests are expected to wait in the queue for a connection to be released. This property comes into effect when the maximum number of active connections allowed in the connection pool (see the maxActive property) is used up.	Set this to a value slightly higher than the maximum latency for a request, so that a buffer time is added to the maximum latency. That is, if the maximum latency is approximately (P / M) * T, where M = maxActive value, P = peak concurrency value, and T = average time taken to process a query, then maxWait = (P / M) * T + buffer time.
minIdle	The minimum number of connections that can remain idle in the pool without extra ones being created. The connection pool can shrink below this number if validation queries fail. The default value is 0.	This value should be close to the average number of requests that the server receives concurrently. With this setting, you can avoid opening and closing a new connection every time the server receives a request.
maxIdle	The maximum number of connections that can remain idle in the pool.	The value should be less than the maxActive value. For high performance, tune maxIdle to match the average number of concurrent requests to the pool. If this value is set too high, the pool will contain unnecessary idle connections.
testOnBorrow	Indicates whether connection objects are validated before they are borrowed from the pool. If validation fails, the connection is dropped from the pool, and there is an attempt to borrow another connection.	When the connection to the database is broken, the connection pool does not know that the connection has been lost. As a result, the pool continues to hand out connections to the application until the application actually tries to use them. To resolve this problem, set `testOnBorrow` to `true` and make sure that the `validationQuery` property is set. To make connection validation more efficient and improve performance, also set the `validationInterval` property.
validationInterval	To avoid excess validation, run validation at most at this frequency (time in milliseconds). If a connection is due for validation but has been validated previously within this interval, it is not validated again. The default value is `30000` (30 seconds).	The value for validationInterval depends on the target application's behavior, so choosing it is a trade-off that ultimately depends on what is acceptable for the application. If a larger value is set, the validation query runs less frequently, which results in better performance. Note that this value can be as high as the time it takes for your DBMS to declare a connection as stale. For example, MySQL keeps an idle connection open for up to 8 hours by default, so the validation interval should be within that range. However, the validation query usually executes quickly. Therefore, even if this value is set to only a few seconds, the performance penalty is small, especially when database requests have a high throughput. For example, a single extra validation query run every 30 seconds is usually negligible. If a smaller value is set, a stale connection is identified quickly when it is presented. This may be important if you need connections repaired instantly, e.g., during a database server restart.
validationQuery	The SQL query used to validate connections from this pool before returning them to the caller. If specified, this query does not have to return any data, but it must not throw an SQLException. The default value is null. Example values are `SELECT 1` (MySQL), `SELECT 1 FROM DUAL` (Oracle), and `SELECT 1` (MS SQL Server).	Specify an SQL query that validates the availability of a connection in the pool. This query is necessary when the `testOnBorrow` property is `true`.
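MaxPermSize	The JVM memory settings for the WSO2 product: the initial heap size (-Xms), the maximum heap size (-Xmx), and the maximum permanent generation size (-XX:MaxPermSize).	The default memory allocated for the product via these parameters is as follows: `-Xms256m -Xmx512m -XX:MaxPermSize=256m`. You can increase performance by increasing these values in the `<PRODUCT_HOME>/bin/wso2server.sh` file as follows: `-Xms2048m -Xmx2048m -XX:MaxPermSize=1024m`. Note that Java 8 removed the permanent generation and ignores -XX:MaxPermSize (with a warning).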
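The maxActive and maxWait recommendations share one formula. The following Python sketch evaluates it; the workload numbers and the 100 ms buffer are hypothetical, the helper names are illustrative, and times are in milliseconds to match the unit of maxWait.

```python
import math


def max_latency_ms(peak_concurrency: int, max_active: int, avg_query_ms: float) -> float:
    """Approximate maximum wait for a connection: (P / M) * T."""
    if max_active <= 0:
        raise ValueError("maxActive must be positive")
    return (peak_concurrency / max_active) * avg_query_ms


def suggested_max_wait_ms(peak_concurrency: int, max_active: int,
                          avg_query_ms: float, buffer_ms: float) -> int:
    """maxWait = (P / M) * T + buffer, rounded up to whole milliseconds."""
    latency = max_latency_ms(peak_concurrency, max_active, avg_query_ms)
    return math.ceil(latency + buffer_ms)


if __name__ == "__main__":
    # Hypothetical workload: 400 concurrent requests, 50 ms average query time.
    for max_active in (100, 200):
        latency = max_latency_ms(400, max_active, 50)
        wait = suggested_max_wait_ms(400, max_active, 50, buffer_ms=100)
        print(f"maxActive={max_active}: latency ~{latency:.0f} ms, maxWait={wait} ms")
```

Doubling maxActive from 100 to 200 halves the estimated latency from about 200 ms to about 100 ms, which is the effect the table describes; whether the database can sustain the extra connections from every node is still a question for the database administrator.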

When it comes to web applications, users are free to experiment and package their own pooling framework, such as BoneCP.

If you are using an Oracle database, you may sometimes come across an error (ORA-04031) indicating that not enough memory is allocated to the shared pool. To overcome this, allocate more memory to the shared pool by adjusting the following parameters in the <ORACLE_HOME>/dbs/init<SID>.ora file of your Oracle database: SHARED_POOL_RESERVED_SIZE, SHARED_POOL_SIZE, and LARGE_POOL_SIZE.

DAS-Level settings

You can tune performance in the following areas at the DAS level.

Tuning performance for real-time analytics

Performance is measured in terms of throughput (transactions per second, TPS) and latency.

Receiving events

The following parameters, which affect the performance of receiving events, are configured in the <DAS_HOME>/repository/conf/data-bridge/data-bridge-config.xml file. These configurations are common to both the Thrift and binary protocols.

Property	Description	Default Value	Recommendation
`workerThreads`	The number of threads reserved to handle the load of events received.	10	This value should be increased if you want to increase the throughput by receiving a higher number of events at a given time. The number of available CPU cores should be considered when specifying this value. If the value specified exceeds the number of CPU cores, higher latency would occur as a result of context switching taking place more often.
`maxEventBufferCapacity`	The maximum size allowed for the event receiving buffer, in megabytes. The event receiving buffer temporarily stores the events received before they are forwarded to an event stream.	10	Increase this value when the receiving throughput increases. When you increase it, increase the heap memory size accordingly.
`eventBufferSize`	The number of messages that are allowed in the receiving queue at a given time.	2000	Increase this value when the receiving throughput increases.
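When latency matters, one simple way to apply the workerThreads guidance is to cap the value at the number of CPU cores. The Python sketch below does that; the helper name is hypothetical, and a hard cap is a simplification, since other work on the same host also competes for cores.

```python
import os
from typing import Optional

DEFAULT_WORKER_THREADS = 10  # default from the table above


def suggest_worker_threads(desired: int, cpu_cores: Optional[int] = None) -> int:
    """Cap a desired workerThreads value at the number of CPU cores.

    Threads beyond the core count mostly add context switching (latency)
    rather than throughput, as the table above notes.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, min(desired, cores))


if __name__ == "__main__":
    print("this host:", suggest_worker_threads(DEFAULT_WORKER_THREADS))
    print("8-core host, 16 requested:", suggest_worker_threads(16, cpu_cores=8))
```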

Publishing events

The following parameters, which affect the performance of publishing events, are configured in the <DAS_HOME>/repository/conf/data-bridge/data-agent-config.xml file. These configurations are common to both the Thrift and binary protocols.

Property	Description	Default Value	Recommendation
`QueueSize`	The size of the queue used by the event disruptor, which holds events before they are published to an application/data store.	32768	The value must always be a power of 2 (e.g., 32768 is 2^15). Specify a higher value when a higher throughput needs to be handled. However, increasing the load handled at a given time can reduce the speed at which events are processed. Therefore, specify a lower value if you want to reduce latency.
`BatchSize`	The maximum number of events in a batch sent to the queue event disruptor at a given time.	200	Assign this value in proportion to the throughput of events handled. The greater the batch size, the higher the number of events sent to the queue event disruptor at a given time.
`CorePoolSize`	The number of threads reserved to handle events when you start the server. The number of threads increases as the throughput of events handled increases, but it does not exceed the value specified for the `MaxPoolSize` parameter.	1	Consider the number of available CPU cores when specifying this value. Increasing the core pool size may improve throughput, but latency also increases due to context switching.
`MaxPoolSize`	The maximum number of threads that can be reserved at any given time to handle events.	1	Consider the number of available CPU cores when specifying this value. Increasing the maximum pool size may improve throughput, since more threads can be spawned to handle an increased number of events. However, latency also increases, since a higher number of threads causes context switching to take place more frequently.

For better throughput, you can configure the parameters as follows.

<QueueSize>32768</QueueSize>
<BatchSize>200</BatchSize>
<CorePoolSize>1</CorePoolSize>
<MaxPoolSize>1</MaxPoolSize>

For reduced latency, you can configure the parameters as follows.

<QueueSize>256</QueueSize>
<BatchSize>200</BatchSize>
<CorePoolSize>1</CorePoolSize>
<MaxPoolSize>1</MaxPoolSize>
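Because QueueSize must be a power of 2, it can help to validate a candidate value, or round a target size up, before editing data-agent-config.xml. A minimal Python sketch (the helper names are hypothetical):

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; a power of 2 has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of 2 that is >= n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    for size in (32768, 256, 30000):
        print(size, is_power_of_two(size), next_power_of_two(size))
```

Both QueueSize values shown above (32768 and 256) pass the check, while a target such as 30000 would be rounded up to 32768.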

Spark Cluster Tuning

In a DAS production environment, it is important to allocate the resources correctly to each node, in order to achieve optimum performance.

The following diagram depicts a typical DAS multi-node setup.

Allocate resources to each node according to the function of that node. The resource allocation is specified by configuring the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file. This file contains the following two categories of configurations.

Carbon related configurations: These are Carbon-specific properties that apply when running Spark in a WSO2 Carbon environment. They start with the carbon prefix.
Spark related configurations: These are the default properties shipped with Apache Spark.

Parameters to be configured are as follows.

Cores

Parameter	Default Value	Description
`spark.executor.cores`	All the available cores on the worker.	The number of cores to use on each executor. Setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application is run on each worker.
`spark.cores.max`	`Int.MAX_VALUE`	The maximum number of CPU cores to request for the application from across the cluster (not from each machine).
`spark.worker.cores`	`1`	The number of cores assigned for a worker.

Memory

Parameter	Default Value	Description
`spark.worker.memory`	`1g`	Amount of memory to use per worker, in the same format as JVM memory strings (e.g., 512m, 2g).
`spark.executor.memory`	`512m`	Amount of memory to use per executor process, in the same format as JVM memory strings (e.g., 512m, 2g).

The number of executors in a single worker for the Carbon application can be derived as follows:

number of executors in a single worker = FLOOR ( MIN (spark.worker.cores, spark.cores.max) / spark.executor.cores )

The amount of memory allocated to a worker should then satisfy the following:

spark.worker.memory ≥ spark.executor.memory × number of executors
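The two formulas above can be evaluated with a short Python sketch. INT_MAX stands in for the Int.MAX_VALUE default of spark.cores.max; when spark.executor.cores is not set, pass the worker's core count, since a single executor then takes all of them. The size parser handles only the m and g suffixes, and the helper names are hypothetical.

```python
# INT_MAX stands in for the Int.MAX_VALUE default of spark.cores.max.
INT_MAX = 2**31 - 1


def executors_per_worker(worker_cores: int, executor_cores: int,
                         cores_max: int = INT_MAX) -> int:
    """FLOOR(MIN(spark.worker.cores, spark.cores.max) / spark.executor.cores)."""
    if executor_cores <= 0:
        raise ValueError("spark.executor.cores must be positive")
    return min(worker_cores, cores_max) // executor_cores


def to_mb(size: str) -> int:
    """Convert a JVM-style size string such as '512m' or '12g' to megabytes."""
    unit, value = size[-1].lower(), int(size[:-1])
    if unit == "m":
        return value
    if unit == "g":
        return value * 1024
    raise ValueError(f"unsupported size: {size!r}")


def worker_memory_suffices(worker_memory: str, executor_memory: str, executors: int) -> bool:
    """spark.worker.memory >= spark.executor.memory x number of executors."""
    return to_mb(worker_memory) >= to_mb(executor_memory) * executors


if __name__ == "__main__":
    # A worker with 4 cores and 12g, matching the configuration patterns below.
    single = executors_per_worker(worker_cores=4, executor_cores=4)
    multi = executors_per_worker(worker_cores=4, executor_cores=1)
    print(single, worker_memory_suffices("12g", "12g", single))
    print(multi, worker_memory_suffices("12g", "3g", multi))
```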

Configuration patterns

By setting different values for each of the parameters above, we can have different configuration patterns.

For example, consider an instance with 8 vCPUs and 16 GB of memory. If you allocate 4 GB and 4 cores to the OS and the Carbon JVM (which takes only 1 GB of memory by default), you can set spark.worker.memory = 12g and spark.worker.cores = 4.

Single executor workers

If you specify neither the spark.cores.max nor the spark.executor.cores property, all the available cores are taken up by a single executor.

Executors = FLOOR ( MIN (4, Int.MAX_VALUE) / 4 ) = 1

Therefore, all the worker memory can be allocated to that executor as follows.

spark.executor.memory=12g

Allocating a large amount of memory to a single JVM is not advisable, because it can degrade garbage collection (GC) performance.

Multiple executor workers

If the spark.executor.cores property is set to 1, the number of executors can be derived as follows:

Executors = FLOOR ( MIN (4, Int.MAX_VALUE) / 1 ) = 4

Therefore, 12 GB / 4 = 3 GB can be allocated to each executor.

Resource limited workers

The number of cores used for the carbon application cluster-wide can be limited by specifying a value for the spark.cores.max property.

For example, in a cluster of 4 nodes, setting spark.cores.max = 12 (3 cores per node × 4 nodes) leaves 4 excess cores that can be used by some other application.

Let us consider the above multiple executor setup.

Here, the cluster has resources for 16 executors, with 16 cores and 48 GB of memory. With spark.cores.max = 12 (i.e., 3 × 4), 12 executors are assigned to the Carbon application, and the remaining 4 cores and 12 GB of memory are available to another Spark application, depending on its preference.
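The cluster-wide figures in this example can be reproduced with the following Python sketch. It assumes identical workers and that every executor fits in its worker's memory; ClusterShare and carbon_share are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ClusterShare:
    carbon_executors: int
    spare_cores: int
    spare_memory_gb: int


def carbon_share(workers: int, cores_per_worker: int, memory_gb_per_worker: int,
                 executor_cores: int, executor_memory_gb: int,
                 cores_max: int) -> ClusterShare:
    """Split cluster resources between the Carbon application and everything else."""
    total_cores = workers * cores_per_worker
    total_memory_gb = workers * memory_gb_per_worker
    # spark.cores.max caps the cores the Carbon application takes cluster-wide.
    executors = min(cores_max, total_cores) // executor_cores
    return ClusterShare(
        carbon_executors=executors,
        spare_cores=total_cores - executors * executor_cores,
        spare_memory_gb=total_memory_gb - executors * executor_memory_gb,
    )


if __name__ == "__main__":
    # 4 workers x (4 cores, 12 GB), 1 core and 3 GB per executor, spark.cores.max = 12.
    print(carbon_share(4, 4, 12, 1, 3, cores_max=12))
    # ClusterShare(carbon_executors=12, spare_cores=4, spare_memory_gb=12)
```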