In production environments, services are clustered to scale the application, to achieve high availability, or both. Scaling lets the application support a larger number of user requests, while high availability keeps the service reachable even when some servers are down. A load balancer distributes the incoming requests among the nodes in the cluster. The nodes that receive this traffic are the backend worker nodes, whether or not the cluster separates worker and manager roles. This set of worker nodes can be either statically configured or dynamically discovered: in static mode, new nodes cannot be added to the predefined set of worker nodes at runtime, whereas dynamic load balancers support adding and removing worker nodes at runtime, without having to know the IP addresses and other connection details of the backend nodes beforehand.
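The contrast between the two modes can be sketched as follows. This is a minimal illustration, assuming a hypothetical WorkerRegistry class; the names are illustrative and not taken from any particular load balancer.

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Hypothetical registry of backend worker nodes. A static load balancer is
    // constructed once with a fixed list; a dynamic one also uses addWorker and
    // removeWorker to track nodes that are discovered or lost at runtime.
    class WorkerRegistry {
        private final List<String> workers = new CopyOnWriteArrayList<>();

        WorkerRegistry(List<String> initialWorkers) {
            workers.addAll(initialWorkers);        // the statically configured set
        }

        void addWorker(String address)    { workers.add(address); }    // dynamic join
        void removeWorker(String address) { workers.remove(address); } // dynamic leave

        List<String> currentWorkers()     { return workers; }
    }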
Load balancers come in a wide variety of forms, among them hardware load balancers, DNS load balancers, transport-level load balancers that operate at the HTTP level (such as Apache or Tomcat), and application-level load balancers (such as Synapse). Higher-level load balancers, like the application-level ones, operate with more information about the messages they route and hence provide more flexibility, but they also incur more overhead. The choice of a load balancer is therefore a trade-off between performance and flexibility.
There is also a wide variety of algorithms for distributing the load between servers. Random and round-robin distribution are simple approaches, while more sophisticated algorithms take runtime properties of the system, such as machine load or the number of pending requests, into consideration. The distribution can further be constrained by application-specific requirements such as sticky sessions. However, it is worth noting that with a reasonably diverse set of users, the simple approaches tend to perform on par with the complex ones, and they should therefore be considered first.
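As a concrete illustration, here is a minimal sketch of the two simple strategies mentioned above. The interface and class names are hypothetical:

    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical strategy interface: given the current worker list, pick one.
    interface LoadBalanceAlgorithm {
        String nextWorker(List<String> workers);
    }

    // Round-robin: cycle through the workers in order.
    class RoundRobin implements LoadBalanceAlgorithm {
        private final AtomicInteger counter = new AtomicInteger();
        public String nextWorker(List<String> workers) {
            return workers.get(Math.floorMod(counter.getAndIncrement(), workers.size()));
        }
    }

    // Random: pick any worker with equal probability.
    class RandomChoice implements LoadBalanceAlgorithm {
        private final Random random = new Random();
        public String nextWorker(List<String> workers) {
            return workers.get(random.nextInt(workers.size()));
        }
    }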
WSO2 Carbon-based products use cluster messages, based on Axis2 clustering, to identify a node that is joining or leaving the cluster.
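Conceptually, these join and leave notifications are what keep a dynamic load balancer's worker list current. The sketch below is illustrative only; the actual Axis2 clustering interfaces differ:

    // Illustrative only: the real Axis2 clustering API differs. The sketch just
    // shows the idea of reacting to member-joined and member-left cluster messages.
    interface ClusterMembershipListener {
        void memberJoined(String memberAddress);
        void memberLeft(String memberAddress);
    }

    // A dynamic load balancer keeps its worker list in sync with the cluster.
    class WorkerListUpdater implements ClusterMembershipListener {
        private final java.util.List<String> workers;

        WorkerListUpdater(java.util.List<String> workers) { this.workers = workers; }

        public void memberJoined(String memberAddress) {
            workers.add(memberAddress);     // start routing traffic to the new node
        }

        public void memberLeft(String memberAddress) {
            workers.remove(memberAddress);  // stop routing traffic to the departed node
        }
    }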
The following are some key aspects of load balancing.
Session affinity
Stateful applications inherently do not scale well, because replicating state across the cluster induces a heavy performance overhead. Architects therefore minimize server-side state in order to gain better scalability. Where stateful applications must nevertheless be deployed in a cluster, session-affinity-based load balancing offers a solution.
Session affinity ensures that when a client sends a session ID, the load balancer forwards all requests carrying that session ID to the same backend worker node, irrespective of the configured load balancing algorithm. This may look like it defeats the purpose of load balancing, but the algorithm still governs the initial dispatch: before the session is created, the request is dispatched to the worker node that is next in line, and the session is then established with that worker node.
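A minimal sketch of this dispatch logic, with hypothetical names; a real load balancer would typically extract the session ID from a cookie such as JSESSIONID:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    // Sticky dispatch: requests with a known session ID go to the worker the
    // session is bound to; requests without one fall back to the configured
    // algorithm (round-robin here), and the resulting binding is remembered.
    class SessionAffinityDispatcher {
        private final Map<String, String> sessionToWorker = new ConcurrentHashMap<>();
        private final AtomicInteger counter = new AtomicInteger();
        private final List<String> workers;

        SessionAffinityDispatcher(List<String> workers) { this.workers = workers; }

        String dispatch(String sessionId) {
            if (sessionId != null && sessionToWorker.containsKey(sessionId)) {
                return sessionToWorker.get(sessionId);  // affinity overrides the algorithm
            }
            // No session yet: pick the next-in-line worker...
            String worker = workers.get(Math.floorMod(counter.getAndIncrement(), workers.size()));
            if (sessionId != null) {
                sessionToWorker.put(sessionId, worker); // ...and bind the session to it
            }
            return worker;
        }
    }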
Service-aware load balancing
Service-awareness is cost-effective not only in the cloud but also on-premise. In addition, a single load balancer can balance incoming requests across clusters of different services, such as Application Servers, Business Process Servers, and Mashup Servers.
In a real production environment, most of the processing happens not at the load balancer but at the backend worker nodes. As a result, a typical load balancer is designed to front a large number of worker nodes. In a traditional deployment, one load balancer may front a cluster of homogeneous worker nodes, and a single load balancer is generally capable of handling multiple such clusters, routing traffic to the correct cluster while balancing the load according to the algorithm specified for that cluster.
In Cloud deployments, a cluster of homogeneous worker nodes is called a Cloud service, and a load balancer that fronts multiple Cloud services is typically called a service-aware load balancer.
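The two-step routing this implies can be pictured with a small sketch. The names are hypothetical, and the per-cluster algorithm is simplified to round-robin:

    import java.util.List;
    import java.util.Map;

    // Service-aware routing: one load balancer fronting several Cloud services,
    // each backed by its own cluster of homogeneous worker nodes.
    class ServiceAwareLoadBalancer {
        // e.g. "appserver" -> [as1, as2], "bps" -> [bps1, bps2, bps3]
        private final Map<String, List<String>> serviceClusters;

        ServiceAwareLoadBalancer(Map<String, List<String>> serviceClusters) {
            this.serviceClusters = serviceClusters;
        }

        // First resolve the target service to its cluster, then balance within
        // that cluster (shown here as round-robin over a request counter).
        String route(String serviceName, int requestNumber) {
            List<String> cluster = serviceClusters.get(serviceName);
            if (cluster == null) {
                throw new IllegalArgumentException("Unknown service: " + serviceName);
            }
            return cluster.get(Math.floorMod(requestNumber, cluster.size()));
        }
    }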
Tenant-aware load balancing
Tenant-awareness gives the load balancer a scalable way to balance the load across a set of tenants sharing a collection of worker nodes. Tenants can be partitioned across those nodes in various ways.
As a typical Cloud deployment scales, tenant partitioning becomes necessary. A single Cloud service can consist of multiple clusters, each handling a subset of the tenants in the system. In such a tenant-partitioned deployment, the load balancers themselves need to be tenant-aware in order to route requests to the proper tenant clusters. In a Cloud environment, a tenant-aware load balancer should also be service-aware, since it is the service clusters that are partitioned according to the tenants.
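A minimal sketch of the lookup, assuming a per-tenant table; in practice tenants are usually grouped into partitions (for example, by ranges of tenant IDs) rather than listed individually:

    import java.util.List;
    import java.util.Map;

    // Tenant-aware (and hence service-aware) routing: for each service, tenants
    // are partitioned across clusters, and a request is routed by first resolving
    // (service, tenant) to the cluster that hosts that tenant.
    class TenantAwareLoadBalancer {
        // service -> (tenant domain -> cluster of worker nodes hosting that tenant)
        private final Map<String, Map<String, List<String>>> partitions;

        TenantAwareLoadBalancer(Map<String, Map<String, List<String>>> partitions) {
            this.partitions = partitions;
        }

        String route(String serviceName, String tenantDomain, int requestNumber) {
            // Lookup-failure handling omitted for brevity.
            List<String> cluster = partitions.get(serviceName).get(tenantDomain);
            return cluster.get(Math.floorMod(requestNumber, cluster.size()));
        }
    }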