
Demonstrating the Auto-Scaling Algorithm

Described below is the auto-scaling algorithm used in WSO2 ELB. It is called the requests in-flight based auto-scaling algorithm, and it makes scaling decisions per service domain.

WSO2 will implement a CPU-based auto-scaling algorithm in a future release.

Before proceeding to the algorithm, note the following points.

  1. All variables used here are defined in the section Auto-Scaling Configuration > Common Parameters. We recommend you go through it first.
  2. An auto-scaling task runs every t milliseconds, where t can be specified in the <ELB_HOME>/repository/conf/loadbalancer.conf file.

  3. For each service domain, we maintain a vector (say, requestTokenListLengths), whose size is rounds_to_average.

  4. For each service domain, we maintain a map (say, requestTokens), where each entry represents a request token id and its time stamp.

  5. For each incoming request (to load balancer), we generate a unique token id and add it to the requestTokens map of that particular service domain, along with the current time stamp.

  6. For each outgoing request, we remove the corresponding token id from the requestTokens map of the corresponding service domain.

  7. If a message reaches the message expiry time, its tokens are removed from the requestTokens map. A minimal sketch of this bookkeeping follows.
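
Put together, notes 3-7 describe per-domain bookkeeping that can be pictured with a short sketch. The following Java outline is illustrative only, not the actual WSO2 ELB source; the class name InFlightTracker and its method names are hypothetical.

import java.util.LinkedList;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the per-service-domain bookkeeping in notes 3-7.
public class InFlightTracker {

    // Note 4: token id -> time stamp of the incoming request.
    private final Map<String, Long> requestTokens = new ConcurrentHashMap<>();

    // Note 3: in-flight counts of the last rounds_to_average task runs
    // (updated only by the single auto-scaling task thread).
    private final LinkedList<Integer> requestTokenListLengths = new LinkedList<>();

    private final int roundsToAverage;
    private final long messageExpiryMillis;

    public InFlightTracker(int roundsToAverage, long messageExpiryMillis) {
        this.roundsToAverage = roundsToAverage;
        this.messageExpiryMillis = messageExpiryMillis;
    }

    // Note 5: each incoming request gets a unique token id and a time stamp.
    public String onRequestIn() {
        String tokenId = UUID.randomUUID().toString();
        requestTokens.put(tokenId, System.currentTimeMillis());
        return tokenId;
    }

    // Note 6: the token is removed when the response leaves the load balancer.
    public void onRequestOut(String tokenId) {
        requestTokens.remove(tokenId);
    }

    // Note 7: expired messages give up their tokens.
    public void removeExpiredTokens() {
        long now = System.currentTimeMillis();
        requestTokens.values().removeIf(ts -> now - ts > messageExpiryMillis);
    }

    // Called once per auto-scaling task run: record the current in-flight
    // count, keeping only the last rounds_to_average values.
    public void recordRound() {
        requestTokenListLengths.add(requestTokens.size());
        if (requestTokenListLengths.size() > roundsToAverage) {
            requestTokenListLengths.removeFirst();
        }
    }
}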

The Algorithm: How ELB Calculates the Requests In-Flight

  1. WSO2 ELB keeps track of the requests that come to it for various service clusters. For each incoming request, a token is added against the relevant service cluster. The corresponding token is removed when the message leaves the ELB or expires.
  2. The minimum and maximum numbers of instances of service clusters are always respected by the ELB. It ensures that the system maintains the minimum number of services and does not scale beyond its limit.
  3. The average requests in-flight for a particular service cluster (avg) = (sum of the in-flight counts recorded over the last r rounds) / r, where r represents rounds_to_average. You can avoid averaging in-flight requests over a period of time by setting the value of rounds_to_average (r) to 1.
  4. When scaling up,
    • the maximum number of requests that a service instance can withstand over an auto-scaler task interval (maxRpt) = (Rps) * (t/1000) * (AUR).
    • ELB decides to scale up if avg > maxRpt * (number of running instances of this service cluster).
  5. When scaling down,
    • the imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF).
    • ELB decides to scale down if avg < minRpt * (number of running instances of this service cluster - 1). Both decision rules are sketched in code after this list.
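
Putting steps 3-5 together, the decision logic can be condensed into a small sketch. Again, this is an illustration of the formulas above rather than the actual ELB code; the names ScalingDecision and decide are hypothetical, and the instance-count checks reflect step 2.

// Hypothetical sketch of the scale-up/scale-down decision in steps 3-5;
// parameter names mirror the loadbalancer.conf properties.
public final class ScalingDecision {

    static String decide(double avg,  // step 3: averaged in-flight requests
                         int runningInstances, int pendingInstances,
                         int minInstances, int maxInstances,
                         double rps, long taskIntervalMillis,
                         double alarmingUpperRate, double alarmingLowerRate,
                         double scaleDownFactor) {

        // Step 4: maxRpt = (Rps) * (t/1000) * (AUR)
        double maxRpt = rps * (taskIntervalMillis / 1000.0) * alarmingUpperRate;

        // Step 5: minRpt = (Rps) * (t/1000) * (ALR) * (SDF)
        double minRpt = rps * (taskIntervalMillis / 1000.0)
                * alarmingLowerRate * scaleDownFactor;

        // Scale up when the average load exceeds what the running instances
        // can withstand, no instance is already starting up, and the maximum
        // instance count (step 2) is respected.
        if (avg > maxRpt * runningInstances
                && pendingInstances == 0
                && runningInstances + pendingInstances < maxInstances) {
            return "SCALE_UP";
        }

        // Scale down when one less instance could still handle the average
        // load, respecting the minimum instance count (step 2).
        if (avg < minRpt * (runningInstances - 1)
                && runningInstances > minInstances) {
            return "SCALE_DOWN";
        }

        return "NO_CHANGE";
    }
}

With the example configuration used later on this page (Rps = 5, t = 60000, AUR = 0.7, ALR = 0.2, SDF = 0.25), maxRpt works out to 210 and minRpt to 15, the thresholds used throughout the scenario below.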

To print the requests in-flight for every task iteration in the wso2carbon.log file, add the following property to the <ELB_HOME>/repository/conf/log4j.properties file and restart the server.

log4j.logger.org.wso2.carbon.mediator.autoscale.lbautoscale.task.ServiceRequestsInFlightAutoscaler=DEBUG


Example Scenario of the Algorithm

Task iteration        1    2    3     4     5     6     7     8    9
Requests in-flight    10   1    250   190   350   400   160   15   0
Since the rounds_to_average value is 2, let's use a sliding window of two samples.

Iteration 1:

Vector → [10]

Vector is not full → we cannot take a scaling decision

Iteration 2:

Vector → [10, 1]

Vector is full → we can take a scaling decision

Average requests in flight → (10 + 1) / 2 = 5.5

Running Instances → 1

Handle-able requests → 1 * 210 = 210

5.5 < 210 → no need to scale

Iteration 3:

Vector → [1, 250]

Vector is full → we can take a scaling decision

Average requests in flight → (1 + 250) / 2 = 125.5

Running Instances → 1

Handle-able requests → 1 * 210 = 210

125.5 < 210 → no need to scale

Iteration 4:

Vector → [250, 190]

Vector is full → we can take a scaling decision

Average requests in flight → (250 + 190) / 2 = 220

Running Instances → 1

Handle-able requests → 1 * 210 = 210

220 > 210 and pending instances = 0 → scale up! → pending instances++


Iteration 5:

Vector → [190, 350]

Vector is full → we can take a scaling decision

Average requests in flight → (190 + 350) / 2 = 270

Running Instances → 1

Handle-able requests → 1 * 210 = 210

270 > 210 but pending instances = 1 → we don't scale up


Iteration 6:

Vector → [350, 400]

Vector is full → we can take a scaling decision

Average requests in flight → (350 + 400) / 2 = 375

Running Instances → 2

Handle-able requests → 2 * 210 = 420

375 < 420 and pending instances = 0 → no need to scale up

Iteration 7:

Vector → [400, 160]

Vector is full → we can take a scaling decision

Average requests in flight → (400 + 160) / 2 = 280

Running Instances → 2

Handle-able requests → 2 * 210 = 420

280 < 420 and pending instances = 0 → no need to scale up

Imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF) = 5 * 60 * 0.2 * 0.25 = 15

280 > 15 * 1 → we do not scale down, since we can't handle the current load with one less running instance

Iteration 8:

Vector → [160, 15]

Vector is full → we can take a scaling decision

Average requests in flight → (160 + 15) / 2 = 87.5

Running Instances → 2

Handle-able requests → 2 * 210 = 420

87.5 < 420 and pending instances = 0 → no need to scale up

Imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF) = 5 * 60 * 0.2 * 0.25 = 15

87.5 > 15 * 1 → we do not scale down, since we can't handle the current load with one less running instance

Iteration 9:

Vector → [15, 0]

Vector is full → we can take a scaling decision

Average requests in flight → (15 + 0) / 2 = 7.5

Running Instances → 2

Handle-able requests → 2 * 210 = 420

7.5 < 420 and pending instances = 0 → no need to scale up

Imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF) = 5 * 60 * 0.2 * 0.25 = 15

7.5 < 15 * 1 → we scale down, since the system no longer needs all of its running instances
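
The nine iterations above can be replayed with a short simulation. The sketch below assumes the configuration used in this example (t = 60000 ms, Rps = 5, AUR = 0.7, ALR = 0.2, SDF = 0.25, rounds_to_average = 2) and hardcodes the instance start-up at iteration 6, as in the walkthrough; it is not the ELB implementation itself.

// Hypothetical replay of the nine iterations: a two-slot window over the
// requests-in-flight samples, with maxRpt = 210 and minRpt = 15 as derived above.
public class AutoscaleWalkthrough {
    public static void main(String[] args) {
        int[] samples = {10, 1, 250, 190, 350, 400, 160, 15, 0};
        double maxRpt = 5 * (60000 / 1000.0) * 0.7;         // = 210
        double minRpt = 5 * (60000 / 1000.0) * 0.2 * 0.25;  // = 15
        int running = 1;
        int pending = 0;

        for (int i = 0; i < samples.length; i++) {
            if (i == 0) {
                System.out.println("Iteration 1: vector not full -> no decision");
                continue;
            }
            // The instance requested in iteration 4 is assumed to be up and
            // running by iteration 6, as in the walkthrough above.
            if (pending > 0 && i == 5) { running++; pending--; }

            double avg = (samples[i - 1] + samples[i]) / 2.0;
            if (avg > maxRpt * running && pending == 0) {
                pending++;
                System.out.printf("Iteration %d: avg=%.1f -> scale up%n", i + 1, avg);
            } else if (avg < minRpt * (running - 1) && running > 1) {  // min_app_instances = 1
                running--;
                System.out.printf("Iteration %d: avg=%.1f -> scale down%n", i + 1, avg);
            } else {
                System.out.printf("Iteration %d: avg=%.1f -> no change%n", i + 1, avg);
            }
        }
    }
}

Running this prints a scale-up at iteration 4, a scale-down at iteration 9, and no change elsewhere, matching the decisions traced above.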

In a production environment, the values set for the two properties max_requests_per_second and rounds_to_average are critical, since uncalibrated values lead to unnecessarily frequent scale-ups and scale-downs.

Testing the Auto-Scaling Algorithm

Since the auto-scaling algorithm is based on the number of requests in-flight, you first need a back-end service that takes a while (say, 30 seconds) to respond, so that requests accumulate in flight and the test environment scales easily.
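
As an illustration, such a slow back-end could be mocked with the JDK's built-in HTTP server, as in the following sketch; the port, path, and 300-thread pool are arbitrary choices for this test, not values mandated by the ELB.

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

// Minimal stand-in for a slow back-end: every request takes ~30 seconds, so
// requests pile up in-flight at the ELB and give the auto-scaler work to do.
public class SlowService {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8280), 0);
        server.createContext("/slow", exchange -> {
            try {
                Thread.sleep(30_000);  // simulate a 30-second back-end operation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "done".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        // A large pool so that well over 210 requests can be held open at once.
        server.setExecutor(Executors.newFixedThreadPool(300));
        server.start();
    }
}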

1. Define values for auto-scaling parameters. For example, assume the following dummy values for the service cluster whose clustering domain is 'wso2.as.domain' and sub domain is 'worker'.

autoscaler_task_interval 60000;
min_app_instances  1;
server_startup_delay 180000;
max_requests_per_second   5;
rounds_to_average       2;
alarming_upper_rate 0.7;
alarming_lower_rate 0.2;
scale_down_factor 0.25;

2. According to the defined values, the maximum number of requests that a service instance can withstand over an auto-scaler task interval (maxRpt) = (Rps) * (t/1000) * (AUR) = 5 * (60000/1000) * 0.7 = 210.

This means that an instance of this service cluster can serve 210 requests on average. In this example, we have configured 'rounds_to_average' as 2. That means, for the system to scale up, there should be more than 210 requests in-flight on average for this service cluster over a period of autoscaler_task_interval * rounds_to_average. 

3. Next, configure the embedded auto-scaler as described in the section Configuring the Embedded Auto-Scaler.

4. Start the ELB server. After a while, you should see a member joining the service cluster. Once the member has joined, start load testing by sending requests to the aforementioned service. Since an instance can handle 210 requests on average per task interval, configure your load test to send more than 210 concurrent requests.
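
Any load-testing tool (JMeter, for example) will do. As a sketch, a bare-bones Java 11+ client that holds more than 210 requests open concurrently could look like this; the endpoint URL is a placeholder for your actual service address.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CountDownLatch;

// Bare-bones load generator: opens 300 concurrent requests (> maxRpt = 210)
// so the average in-flight count stays above the scale-up threshold for
// several auto-scaler task intervals.
public class LoadTest {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8280/slow"))  // placeholder endpoint
                .build();

        CountDownLatch done = new CountDownLatch(300);
        for (int i = 0; i < 300; i++) {
            client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                  .whenComplete((response, error) -> done.countDown());
        }
        done.await();  // block until all 300 requests complete or fail
    }
}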
