This is the WSO2 Elastic Load Balancer documentation, version 2.0.1.

Described below is the auto-scaling algorithm used in the WSO2 ELB. It is called the "request in-flight based auto-scaling" algorithm, and its basic function is to auto-scale each service domain independently. Before proceeding to the algorithm, note the following points (illustrated in the sketch after the list).
  • The auto-scaling task runs every "t" milliseconds, where "t" can be specified in the <elb_home>/repository/conf/loadbalancer.conf file.
  • For each service domain, we keep a vector (say "requestTokenListLengths"), which holds at most "rounds_to_average" entries.
  • For each service domain, we keep a map (say "requestTokens"), where an entry represents a request token id and its time stamp.
  • For each incoming request (to the load balancer), we generate a unique token id and add it to the "requestTokens" map of that particular service domain, along with the current time stamp.
  • For each outgoing request, we remove the corresponding token id from the "requestTokens" map of the corresponding service domain.
  • If a message has reached the "message expiry time", its token is removed from the "requestTokens" map.
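
The token bookkeeping above can be sketched as follows. This is a minimal illustration, not the ELB's actual source; the ServiceDomainContext class and its method names are hypothetical.

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the per-domain request-token bookkeeping described
// above. Class and method names are illustrative, not the ELB's actual API.
public class ServiceDomainContext {

    // Token id -> time stamp of the corresponding incoming request.
    private final Map<String, Long> requestTokens = new ConcurrentHashMap<>();

    private final long messageExpiryTimeMillis;

    public ServiceDomainContext(long messageExpiryTimeMillis) {
        this.messageExpiryTimeMillis = messageExpiryTimeMillis;
    }

    // Incoming request: generate a unique token id and record the current time.
    public String addRequestToken() {
        String tokenId = UUID.randomUUID().toString();
        requestTokens.put(tokenId, System.currentTimeMillis());
        return tokenId;
    }

    // Outgoing request: the message is no longer in flight, so drop its token.
    public void removeRequestToken(String tokenId) {
        requestTokens.remove(tokenId);
    }

    // Drop tokens whose messages have reached the message expiry time.
    public void expireTokens() {
        long now = System.currentTimeMillis();
        requestTokens.values().removeIf(ts -> now - ts >= messageExpiryTimeMillis);
    }

    // Current number of requests in flight for this service domain.
    public int inFlightCount() {
        return requestTokens.size();
    }
}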

The Algorithm

In each task execution, and for a particular service domain, the following steps take place (a code sketch follows the list):
  • The size of the "requestTokens" map is added to the "requestTokenListLengths" vector. If the vector has already reached the size of "rounds_to_average", its first (oldest) entry is removed before the new one is added.
  • A scaling decision is taken only when the "requestTokenListLengths" vector has a size greater than or equal to "rounds_to_average".
  • If the above condition is satisfied, the average requests in flight is calculated by dividing the sum of the entries in the "requestTokenListLengths" vector by the size of the vector.
  • The handleable request capacity of this service domain is calculated by multiplying the number of running instances by "queue_length_per_node".
  • To determine whether the system needs to be scaled up, we check whether the calculated average requests in flight is greater than the handleable request capacity. If it is, a few more checks are performed before scaling up, such as:
    • whether the maximum number of instances specified in the loadbalancer.conf file has already been reached for this particular domain;
    • whether there are any instances in the pending state.
  • Otherwise, the handleable request capacity of one less than the current number of running instances is calculated, by multiplying "(running instances - 1)" by "queue_length_per_node". If this value is greater than the average requests in flight, the system should be scaled down. Before scaling down, it is ensured that the minimum instance count of this service domain is maintained.
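
Put together, one execution of the task for a single service domain can be sketched as below. Again, this is an illustrative sketch rather than the ELB's source: the AutoscalerTask class, its fields, and the stubbed scale-up/scale-down actions are assumptions.

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of one auto-scaling task execution for a single
// service domain, following the steps listed above.
public class AutoscalerTask {

    // Sliding window of recent "requestTokens" map sizes.
    private final Deque<Integer> requestTokenListLengths = new ArrayDeque<>();
    private final int roundsToAverage;
    private final int queueLengthPerNode;
    private final int minAppInstances;
    private final int maxAppInstances;

    private int runningInstances;
    private int pendingInstances;

    public AutoscalerTask(int roundsToAverage, int queueLengthPerNode,
                          int minAppInstances, int maxAppInstances) {
        this.roundsToAverage = roundsToAverage;
        this.queueLengthPerNode = queueLengthPerNode;
        this.minAppInstances = minAppInstances;
        this.maxAppInstances = maxAppInstances;
    }

    // Runs every "t" milliseconds with the current size of the requestTokens map.
    public void execute(int requestTokensSize) {
        // Keep at most roundsToAverage samples: drop the oldest before adding.
        if (requestTokenListLengths.size() >= roundsToAverage) {
            requestTokenListLengths.removeFirst();
        }
        requestTokenListLengths.addLast(requestTokensSize);

        if (requestTokenListLengths.size() < roundsToAverage) {
            return; // vector is not full -> no scaling decision yet
        }

        double averageRequestsInFlight = requestTokenListLengths.stream()
                .mapToInt(Integer::intValue).sum()
                / (double) requestTokenListLengths.size();

        int handleableCapacity = runningInstances * queueLengthPerNode;

        if (averageRequestsInFlight > handleableCapacity) {
            // Scale up only if the maximum instance count has not been reached
            // and no instance is already pending.
            if (pendingInstances == 0
                    && runningInstances + pendingInstances < maxAppInstances) {
                pendingInstances++; // spawning a new instance is stubbed here
            }
        } else if (averageRequestsInFlight
                       < (runningInstances - 1) * queueLengthPerNode
                   && runningInstances > minAppInstances) {
            // One less instance could still handle the load: scale down,
            // provided the minimum instance count is maintained.
            runningInstances--; // terminating an instance is stubbed here
        }
    }
}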

Example Scenario of the Algorithm

Task iteration          1    2    3    4    5    6    7    8    9
"requestTokens" size    0    0    5    7    4    5    3    1    0
Assume that for service domain "X", the following configuration is specified:
min_app_instances 0;
max_app_instances 5;
queue_length_per_node 3;
rounds_to_average 2;
Initially, pendingInstances = 0 and runningInstances = 0.
Iteration 1:

Vector contents → [0]
Vector is not full → we cannot take a scaling decision.

Iteration 2:

Vector contents → [0, 0]
Vector is full → we can take a scaling decision.
Average requests in flight → 0
Running Instances → 0
→ No scaling happens.
Iteration 3:

Vector contents → [0, 5]
Vector is full → we can take a scaling decision.
Average requests in flight (l) → 2.5
Running Instances (n) → 0
queue_length_per_node → 3
→ 2.5 > 0*3 and pendingInstances = 0 → scale up! → pendingInstances++
Iteration 4:

Vector contents → [5, 7]
Vector is full → we can take a scaling decision.
Average requests in flight (l) → 6
Running Instances (n) → 0
queue_length_per_node → 3
→ 6 > 0*3, but pendingInstances = 1 → we do not scale up!
Iteration 5:

Vector contents → [7, 4]
Vector is full → we can take a scaling decision.
Average requests in flight (l) → 5.5
Running Instances (n) → 1 (the pending instance has become active)
queue_length_per_node → 3
→ 5.5 > 1*3 and pendingInstances = 0 → scale up! → pendingInstances++
Iteration 6:

Vector contents → [4, 5]
Vector is full → we can take a scaling decision.
Average requests in flight (l) → 4.5
Running Instances (n) → 2
queue_length_per_node → 3
→ 4.5 < 2*3 → we do not scale up!
→ 4.5 > 1*3 → we do not scale down, since the current load cannot be handled with one less running instance!
Iteration 7:

Vector contents → [5, 3]
Vector is full → we can take a scaling decision.
Average requests in flight (l) → 4
Running Instances (n) → 2
queue_length_per_node → 3
→ 4 < 2*3 → we do not scale up!
→ 4 > 1*3 → we do not scale down, since the current load cannot be handled with one less running instance!
Iteration 8:

Vector contents → [3, 1]
Vector is full → we can take a scaling decision.
Average requests in flight (l) → 2
Running Instances (n) → 2
queue_length_per_node → 3
→ 2 < 2*3 → we do not scale up!
→ 2 < 1*3 → scale down, since the load has gone down and we could manage to handle the current load with one less instance!
Iteration 9:

Vector contents → [1, 0]
Vector is full → we can take a scaling decision.
Average requests in flight (l) → 0.5
Running Instances (n) → 1 (one instance was terminated after the scale down)
queue_length_per_node → 3
→ 0.5 < 1*3 → we do not scale up!
→ 0.5 > 0*3 → we do not scale down, since the current load cannot be handled with one less running instance!
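
The decisions in the walk-through above can be reproduced with a small, self-contained simulation of the same decision logic. Note that this is a hypothetical sketch: the per-iteration running and pending instance counts are fed in directly from the scenario, because instances start up and shut down asynchronously, outside the scaling task.

import java.util.ArrayDeque;
import java.util.Deque;

// Replays the example scenario for service domain "X" and prints the
// scaling decision taken at each task iteration.
public class ScalingScenario {
    public static void main(String[] args) {
        final int queueLengthPerNode = 3;
        final int roundsToAverage = 2;
        final int minAppInstances = 0;
        final int maxAppInstances = 5;

        // Values taken from the scenario table and walk-through above.
        int[] requestTokensSize = {0, 0, 5, 7, 4, 5, 3, 1, 0};
        int[] running           = {0, 0, 0, 0, 1, 2, 2, 2, 1};
        int[] pending           = {0, 0, 0, 1, 0, 0, 0, 0, 0};

        Deque<Integer> window = new ArrayDeque<>();

        for (int i = 0; i < requestTokensSize.length; i++) {
            if (window.size() >= roundsToAverage) {
                window.removeFirst(); // drop the oldest sample
            }
            window.addLast(requestTokensSize[i]);

            System.out.printf("Iteration %d: vector=%s%n", i + 1, window);
            if (window.size() < roundsToAverage) {
                System.out.println("  vector not full -> no decision");
                continue;
            }

            double avg = window.stream().mapToInt(Integer::intValue).sum()
                    / (double) window.size();
            int n = running[i];

            if (avg > n * queueLengthPerNode) {
                if (pending[i] == 0 && n + pending[i] < maxAppInstances) {
                    System.out.printf("  avg %.1f > %d -> scale up%n",
                            avg, n * queueLengthPerNode);
                } else {
                    System.out.printf("  avg %.1f > %d but pending=%d -> no scale up%n",
                            avg, n * queueLengthPerNode, pending[i]);
                }
            } else if (avg < (n - 1) * queueLengthPerNode && n > minAppInstances) {
                System.out.printf("  avg %.1f < %d -> scale down%n",
                        avg, (n - 1) * queueLengthPerNode);
            } else {
                System.out.printf("  avg %.1f -> no scaling%n", avg);
            }
        }
    }
}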


Note

In a production environment, the values set for the two properties "queue_length_per_node" and "rounds_to_average" are critical, since uncalibrated values lead to unnecessary, frequent scale-ups and scale-downs.
