Auto-Scaling Algorithm
The auto-scaling algorithm used in the WSO2 ELB is described below. It is called the "Request in-flight based auto-scaling" algorithm, and it makes a separate scaling decision for each service domain, based on the number of requests in flight for that domain. Before proceeding to the algorithm, note the following points.
- The auto-scaling task runs every "t" milliseconds. This interval can be specified in the <elb_home>\repository\conf\loadbalancer.conf file.
- For each service domain we keep a vector (say “requestTokenListLengths”) which has a size of “rounds_to_average”.
- For each service domain we keep a map (say “requestTokens”), where an entry represents a request token id and its timestamp.
- For each incoming request (to the load balancer), we generate a unique token id and add it to the “requestTokens” map of that particular service domain, along with the current timestamp.
- For each outgoing request, we remove the corresponding token id from the “requestTokens” map of the corresponding service domain.
- If a message has reached the “message expiry time”, the respective tokens are removed from the “requestTokens” map.
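The bookkeeping described in the points above can be sketched roughly as follows. This is an illustrative Python sketch, not the ELB implementation: the class name `RequestTokenTracker`, its method names, the `message_expiry_ms` parameter, and the injectable `clock` are all hypothetical names introduced here for the example.

```python
import time
import uuid


class RequestTokenTracker:
    """Tracks in-flight request tokens for ONE service domain (illustrative only)."""

    def __init__(self, message_expiry_ms, clock=time.monotonic):
        self.message_expiry_ms = message_expiry_ms  # assumed name for "message expiry time"
        self._clock = clock                         # injectable so the sketch is testable
        self.request_tokens = {}                    # token id -> timestamp in milliseconds

    def _now_ms(self):
        # The clock returns seconds; the map stores milliseconds.
        return self._clock() * 1000.0

    def on_incoming_request(self):
        """Generate a unique token id and record it with the current timestamp."""
        token_id = str(uuid.uuid4())
        self.request_tokens[token_id] = self._now_ms()
        return token_id

    def on_outgoing_request(self, token_id):
        """Remove the token once the corresponding request has been handled."""
        self.request_tokens.pop(token_id, None)

    def expire_stale_tokens(self):
        """Drop tokens that have reached the expiry time; return how many were dropped."""
        now = self._now_ms()
        stale = [token for token, stamp in self.request_tokens.items()
                 if now - stamp >= self.message_expiry_ms]
        for token in stale:
            del self.request_tokens[token]
        return len(stale)

    def in_flight(self):
        """Size of the map, i.e. the value sampled by each auto-scaling task run."""
        return len(self.request_tokens)
```

One tracker of this kind would exist per service domain, and the auto-scaling task would read `in_flight()` on every run.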
The Algorithm
In each task execution and for a particular service domain:
- The size of the “requestTokens” map is added to the “requestTokenListLengths” vector. If the vector has already reached “rounds_to_average” entries, its first entry is removed before the new one is added.
- A scaling decision is taken only when the size of the “requestTokenListLengths” vector is greater than or equal to “rounds_to_average”.
- If the above condition is satisfied, the average requests in flight is calculated by dividing the sum of the entries in the “requestTokenListLengths” vector by the size of the vector.
- The handleable request capacity of the instances of this service domain is calculated by multiplying “running instances” by “queue_length_per_node”.
- To determine whether the system needs to be scaled up, we check whether the calculated “average requests in flight” is greater than the “handleable request capacity”. If it is, a few more checks are performed before scaling up, such as:
  - Whether the “maximum number of instances” specified in the loadbalancer.conf file has been reached for this particular domain.
  - Whether there are any instances in the pending state, etc.
- Then, the handleable request capacity with one fewer running instance is calculated by multiplying “(running instances - 1)” by “queue_length_per_node”. Next, we check whether this value is greater than the average requests in flight. If so, the system should be scaled down. Before scaling down, we ensure that the minimum instance count of this service domain is maintained.
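The steps above can be sketched as a single decision function. This is a minimal Python sketch under stated assumptions, not the ELB source code: `DomainState` and `autoscale_step` are hypothetical names, the returned strings are illustrative, and counting pending instances towards the maximum is an assumption, since the text only says that the maximum and the pending state are checked.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DomainState:
    """Per-service-domain configuration and counters (hypothetical container)."""
    min_app_instances: int
    max_app_instances: int
    queue_length_per_node: int
    rounds_to_average: int
    running_instances: int = 0
    pending_instances: int = 0
    request_token_list_lengths: deque = field(default_factory=deque)


def autoscale_step(state, request_tokens_size):
    """Run one task execution; return 'wait', 'scale_up', 'scale_down' or 'none'."""
    lengths = state.request_token_list_lengths
    # Drop the oldest sample if the vector is already full, then add the new one.
    if len(lengths) >= state.rounds_to_average:
        lengths.popleft()
    lengths.append(request_tokens_size)

    # No decision until the vector holds rounds_to_average samples.
    if len(lengths) < state.rounds_to_average:
        return "wait"

    average_in_flight = sum(lengths) / len(lengths)
    capacity = state.running_instances * state.queue_length_per_node

    if average_in_flight > capacity:
        # Assumption: pending instances count towards the configured maximum.
        at_max = (state.running_instances + state.pending_instances
                  >= state.max_app_instances)
        if state.pending_instances > 0 or at_max:
            return "none"
        return "scale_up"

    # Capacity if one running instance were removed.
    capacity_one_less = (state.running_instances - 1) * state.queue_length_per_node
    if (capacity_one_less > average_in_flight
            and state.running_instances > state.min_app_instances):
        return "scale_down"
    return "none"
```

The function only reports a decision. In this sketch, the caller is expected to update `running_instances` and `pending_instances` as instances are started or terminated.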
Example Scenario of the Algorithm
Task iteration | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
“requestTokens” size | 0 | 0 | 5 | 7 | 4 | 5 | 3 | 1 | 0 |
Assume that for service domain “X”, the following configuration is specified.
min_app_instances 0; max_app_instances 5; queue_length_per_node 3; rounds_to_average 2;
Also, pendingInstances = 0 and runningInstances = 0.
Iteration 1:
0 |
Vector is not full → we cannot take a scaling decision
Iteration 2:
0 | 0 |
Vector is full → we can take a scaling decision
Average requests in flight → 0
Running Instances → 0
→ No scaling happens
Iteration 3:
0 | 5 |
Vector is full → we can take a scaling decision
Average requests in flight (l)→ 2.5
Running Instances (n)→ 0
queue_length_per_node → 3
→ 2.5 > 0*3 and pendingInstances=0→ scale up → pendingInstances++
Iteration 4:
5 | 7 |
Vector is full → we can take a scaling decision
Average requests in flight (l)→ 6
Running Instances (n)→ 0
queue_length_per_node → 3
→ 6 > 0*3 and pendingInstances=1 → we don't scale up
Iteration 5:
7 | 4 |
Vector is full → we can take a scaling decision
Average requests in flight (l)→ 5.5
Running Instances (n)→ 1
queue_length_per_node → 3
→ 5.5 > 1*3 and pendingInstances=0 → scale up → pendingInstances++
Iteration 6:
4 | 5 |
Vector is full → we can take a scaling decision
Average requests in flight (l)→ 4.5
Running Instances (n)→ 2
queue_length_per_node → 3
→ 4.5 < 2*3 → we do not scale up
→ 4.5 > 1*3 → we do not scale down, since we can't handle the current load with one fewer running instance
Iteration 7:
5 | 3 |
Vector is full → we can take a scaling decision
Average requests in flight (l)→ 4
Running Instances (n)→ 2
queue_length_per_node → 3
→ 4 < 2*3 → we do not scale up
→ 4 > 1*3 → we do not scale down, since we can't handle the current load with one fewer running instance
Iteration 8:
3 | 1 |
Vector is full → we can take a scaling decision
Average requests in flight (l)→ 2
Running Instances (n)→ 2
queue_length_per_node → 3
→ 2 < 2*3 → we do not scale up
→ 2 < 1*3 → scale down, since the load has gone down and we can handle the current load with one fewer running instance
Iteration 9:
1 | 0 |
Vector is full → Scaling decision can be taken.
Average requests in flight (l)→ 0.5
Running Instances (n)→ 1
queue_length_per_node → 3
→ 0.5 < 1*3 → we do not scale up.
→ 0.5 > 0*3 → we do not scale down, since we can't handle the current load with one fewer running instance.
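The nine iterations above can be replayed with a short script to check the arithmetic. This is an illustrative Python sketch only: the function name `replay` and the decision strings are hypothetical. The running and pending instance counts are fed in per iteration as they appear in the walkthrough. Where the walkthrough does not state a pending count (iterations 6 to 9), it is assumed to be 0.

```python
from collections import deque

# Configuration of service domain "X" from the example.
MIN_APP_INSTANCES = 0
MAX_APP_INSTANCES = 5
QUEUE_LENGTH_PER_NODE = 3
ROUNDS_TO_AVERAGE = 2

# (requestTokens size, running instances, pending instances) for iterations 1..9.
# Pending counts for iterations 6..9 are an assumption (not stated in the text).
ITERATIONS = [
    (0, 0, 0), (0, 0, 0), (5, 0, 0), (7, 0, 1), (4, 1, 0),
    (5, 2, 0), (3, 2, 0), (1, 2, 0), (0, 1, 0),
]


def replay(iterations):
    """Return the scaling decision taken in each iteration."""
    # A deque with maxlen discards its oldest entry automatically when full.
    window = deque(maxlen=ROUNDS_TO_AVERAGE)
    decisions = []
    for size, running, pending in iterations:
        window.append(size)
        if len(window) < ROUNDS_TO_AVERAGE:
            decisions.append("wait")  # vector is not full yet
            continue
        average = sum(window) / len(window)
        if average > running * QUEUE_LENGTH_PER_NODE:
            can_grow = pending == 0 and running < MAX_APP_INSTANCES
            decisions.append("scale_up" if can_grow else "none")
        elif ((running - 1) * QUEUE_LENGTH_PER_NODE > average
              and running > MIN_APP_INSTANCES):
            decisions.append("scale_down")
        else:
            decisions.append("none")
    return decisions


if __name__ == "__main__":
    for number, decision in enumerate(replay(ITERATIONS), start=1):
        print(f"Iteration {number}: {decision}")
```

With the inputs above, the sketch scales up in iterations 3 and 5, scales down in iteration 8, and takes no action in the remaining iterations, which matches the walkthrough.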
Note
In a production environment, the values set for the two properties "queue_length_per_node" and "rounds_to_average" are critical, since uncalibrated values lead to frequent, unnecessary scale-ups and scale-downs.