Demonstrating the Auto-Scaling Algorithm
Described below is the auto-scaling algorithm used in WSO2 ELB. It is a requests-in-flight-based auto-scaling algorithm, and it makes scaling decisions separately for each service domain.
Before proceeding to the algorithm, note the following points.
- All variables used here are defined in section Auto-Scaling Configuration > Common Parameters. We recommend you go through it.
- An auto-scaling task runs every t milliseconds; t can be specified in the <ELB_HOME>/repository/conf/loadbalancer.conf file.
- For each service domain, we maintain a vector (say requestTokenListLengths), which has a size of rounds_to_average.
- For each service domain, we maintain a map (say requestTokens) where an entry represents a request token ID and its time stamp.
- For each incoming request (to the load balancer), we generate a unique token ID and add it to the requestTokens map of that particular service domain, along with the current time stamp.
- For each outgoing request, we remove the corresponding token ID from the requestTokens map of the corresponding service domain.
- If a message has reached the message expiry time, the respective tokens are removed from the requestTokens map.
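The token bookkeeping described above can be sketched as follows. This is an illustrative model, not the actual ELB internals; names such as request_tokens and MESSAGE_EXPIRY_MS are assumptions.

```python
import time
import uuid
from collections import defaultdict

# Assumed expiry time for illustration; the real value comes from the
# ELB/synapse message expiry configuration.
MESSAGE_EXPIRY_MS = 60_000

# service domain -> {token ID: arrival time stamp in milliseconds}
request_tokens = defaultdict(dict)

def on_request_in(domain):
    """For each incoming request, record a unique token with its time stamp."""
    token = uuid.uuid4().hex
    request_tokens[domain][token] = time.time() * 1000
    return token

def on_request_out(domain, token):
    """For each outgoing request, remove the corresponding token."""
    request_tokens[domain].pop(token, None)

def expire_tokens(domain, now_ms):
    """Remove tokens whose messages have reached the expiry time."""
    expired = [t for t, ts in request_tokens[domain].items()
               if now_ms - ts >= MESSAGE_EXPIRY_MS]
    for t in expired:
        del request_tokens[domain][t]

def requests_in_flight(domain):
    """The current requests in-flight is simply the number of live tokens."""
    return len(request_tokens[domain])
```

With this model, the size of the requestTokens map at the end of each task interval is the requests-in-flight sample that gets appended to requestTokenListLengths.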
The Algorithm: How ELB Calculates the Requests In-Flight
- WSO2 ELB keeps track of the requests that come to it for various service clusters. For each incoming request, a token is added against the relevant service cluster. The corresponding token is removed when the message leaves the ELB or expires.
- The minimum and maximum numbers of instances of service clusters are always respected by the ELB. This ensures that the system maintains the minimum number of service instances and does not scale beyond its upper limit.
- The average requests in-flight for a particular service cluster (avg) = the sum of the requests in-flight recorded over the last r task iterations * (1/r), where r represents rounds_to_average. You can avoid averaging in-flight requests over a period of time by setting the value of rounds_to_average (r) to 1.
- When scaling up,
- the maximum number of requests that a service instance can withstand over an auto-scaler task interval (maxRpt) = (Rps) * (t/1000) * (AUR).
- ELB decides to scale up if avg > maxRpt * (number of running instances of this service cluster).
- When scaling down,
- the imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF).
- ELB decides to scale down if avg < minRpt * (number of running instances of this service cluster - 1).
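The two decision rules above can be expressed as a short sketch. The function and parameter names are illustrative (not ELB source code); the default values are the dummy values used in the testing section below.

```python
# Hedged sketch of the ELB scaling rules; names and defaults are assumptions.

def max_rpt(rps, t_ms, aur):
    """maxRpt = (Rps) * (t/1000) * (AUR): max requests one instance can
    withstand over one auto-scaler task interval."""
    return rps * (t_ms / 1000) * aur

def min_rpt(rps, t_ms, alr, sdf):
    """minRpt = (Rps) * (t/1000) * (ALR) * (SDF): the imaginary lower bound."""
    return rps * (t_ms / 1000) * alr * sdf

def should_scale_up(avg, running, rps=5, t_ms=60_000, aur=0.7):
    """Scale up if avg exceeds what the running instances can handle."""
    return avg > max_rpt(rps, t_ms, aur) * running

def should_scale_down(avg, running, rps=5, t_ms=60_000, alr=0.2, sdf=0.25):
    """Scale down if one less instance could still handle the load."""
    return avg < min_rpt(rps, t_ms, alr, sdf) * (running - 1)
```

For example, with the defaults above, maxRpt evaluates to 210 and minRpt to 15, matching the worked example below.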
To print the requests in-flight for every task iteration in the wso2carbon.log file, add the following property to the <ELB_HOME>/repository/conf/log4j.properties file and restart the server.
log4j.logger.org.wso2.carbon.mediator.autoscale.lbautoscale.task.ServiceRequestsInFlightAutoscaler=DEBUG
Example Scenario of the Algorithm
The requests in-flight recorded over nine task iterations (with rounds_to_average = 2, maxRpt = 210 and minRpt = 15) are as follows:

| Task iteration | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Requests in-flight | 10 | 1 | 250 | 190 | 350 | 400 | 160 | 15 | 0 |

- Iteration 1: Vector → [10]. The vector is not yet full, so no scaling decision is taken.
- Iteration 2: Vector → [10, 1]. Vector is full → we can take a scaling decision. Average requests in-flight → (10 + 1) / 2 = 5.5. Running instances → 1. Handleable requests → 1 * 210 = 210. 5.5 < 210 → no need to scale.
- Iteration 3: Vector → [1, 250]. Vector is full → we can take a scaling decision. Average requests in-flight → (1 + 250) / 2 = 125.5. Running instances → 1. Handleable requests → 1 * 210 = 210. 125.5 < 210 → no need to scale.
- Iteration 4: Vector → [250, 190]. Vector is full → we can take a scaling decision. Average requests in-flight → (250 + 190) / 2 = 220. Running instances → 1. Handleable requests → 1 * 210 = 210. 220 > 210 and pending instances = 0 → scale up! → pending instances++.
- Iteration 5: Vector → [190, 350]. Vector is full → we can take a scaling decision. Average requests in-flight → (190 + 350) / 2 = 270. Running instances → 1. Handleable requests → 1 * 210 = 210. 270 > 210, but pending instances = 1 → we do not scale up.
- Iteration 6: Vector → [350, 400]. Vector is full → we can take a scaling decision. Average requests in-flight → (350 + 400) / 2 = 375. Running instances → 2. Handleable requests → 2 * 210 = 420. 375 < 420 and pending instances = 0 → no need to scale up.
- Iteration 7: Vector → [400, 160]. Vector is full → we can take a scaling decision. Average requests in-flight → (400 + 160) / 2 = 280. Running instances → 2. Handleable requests → 2 * 210 = 420. 280 < 420 and pending instances = 0 → no need to scale up. 280 > 15 * 1 → we do not scale down, since we cannot handle the current load with one less running instance.
- Iteration 8: Vector → [160, 15]. Vector is full → we can take a scaling decision. Average requests in-flight → (160 + 15) / 2 = 87.5. Running instances → 2. Handleable requests → 2 * 210 = 420. 87.5 < 420 and pending instances = 0 → no need to scale up. minRpt = (Rps) * (t/1000) * (ALR) * (SDF) = 5 * 60 * 0.2 * 0.25 = 15. 87.5 > 15 * 1 → we do not scale down, since we cannot handle the current load with one less running instance.
- Iteration 9: Vector → [15, 0]. Vector is full → we can take a scaling decision. Average requests in-flight → (15 + 0) / 2 = 7.5. Running instances → 2. Handleable requests → 2 * 210 = 420. 7.5 < 420 and pending instances = 0 → no need to scale up. minRpt = 15. 7.5 < 15 * 1 → we scale down, since there are instances running that the system does not require.
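The walk-through above can be replayed as a small, self-contained sketch. This is a simplified model, not ELB source code: the server start-up delay is not simulated, and the pending member is assumed to join one iteration after the scale-up decision, as in the example.

```python
# Assumed example values: maxRpt = 210, minRpt = 15, rounds_to_average = 2.
MAX_RPT, MIN_RPT, ROUNDS = 210, 15, 2
samples = [10, 1, 250, 190, 350, 400, 160, 15, 0]  # requests in-flight

decisions = []                        # one entry per task iteration
window, running, pending = [], 1, 0
for requests_in_flight in samples:
    window.append(requests_in_flight)
    if len(window) < ROUNDS:
        decisions.append(None)        # vector not full yet: no decision
        continue
    avg = sum(window) / ROUNDS
    if avg > MAX_RPT * running and pending == 0:
        decisions.append("up")        # scale up: pending instances++
        pending += 1
    elif avg < MIN_RPT * (running - 1):
        decisions.append("down")      # one less instance can handle the load
        running -= 1
    else:
        decisions.append(None)
        if pending:                   # pending member has joined by now
            running += pending
            pending = 0
    window.pop(0)                     # slide the averaging window
```

Running this reproduces the decisions in the walk-through: a scale-up at iteration 4 and a scale-down at iteration 9, with no action in the other iterations.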
In a production environment, the values set for the max_requests_per_second and rounds_to_average properties are critical, since uncalibrated values lead to unnecessary, frequent scale-ups and scale-downs.
Testing the Auto-Scaling Algorithm
Since the auto-scaling algorithm is based on the number of requests in-flight, you first need a back-end service that takes some time (say, 30 seconds) to respond, so that you can easily build a test environment that scales.
1. Define values for auto-scaling parameters. For example, assume the following dummy values for the service cluster whose clustering domain is 'wso2.as.domain' and sub domain is 'worker'.
autoscaler_task_interval   60000;
min_app_instances          1;
server_startup_delay       180000;
max_requests_per_second    5;
rounds_to_average          2;
alarming_upper_rate        0.7;
alarming_lower_rate        0.2;
scale_down_factor          0.25;
2. According to the defined values, the number of maximum requests that a service instance can withstand over an autoscaler task interval (maxRpt) = (Rps) * (t/1000) * (AUR) = 5 * (60000/1000) * 0.7 = 210.
This means that an instance of this service cluster can serve 210 requests on average. In this example, we have configured 'rounds_to_average' as 2. That means, for the system to scale up, there should be more than 210 requests in-flight on average for this service cluster over a period of autoscaler_task_interval * rounds_to_average.
3. Next, configure the embedded auto-scaler as described in section Configuring the Embedded Auto-Scaler .
4. Start the ELB server. After a while, you should see a member joining the service cluster. Once the member has joined, start load testing by sending requests to the aforementioned service. Since your server can handle 210 requests on average, configure your load test to send more than 210 concurrent requests.