Configure Task Scheduling


Follow the instructions given on this page to configure and set up the Task Scheduling component for your server. This component is used by the following WSO2 products:

- WSO2 Data Services Server (WSO2 DSS)
- WSO2 Enterprise Service Bus (WSO2 ESB)

You can find detailed use cases of this component in the relevant product documentation.

Scheduled tasks support several modes of operation and fully support load balancing and fail-over of tasks. Task scheduling is configured in the tasks-config.xml file, which is stored in the <PRODUCT_HOME>/repository/conf/etc/ directory. The default configuration is as follows:

```
<tasks-configuration xmlns:svns="http://org.wso2.securevault/configuration">
    <taskServerMode>AUTO</taskServerMode>
    <taskServerCount>2</taskServerCount>

    <!-- The default location resolver configuration -->
    <defaultLocationResolver>
        <locationResolverClass>org.wso2.carbon.ntask.core.impl.RoundRobinTaskLocationResolver</locationResolverClass>
    </defaultLocationResolver>
    <taskClientDispatchAddress>https://localhost:9448</taskClientDispatchAddress>
    <remoteServerAddress>https://localhost:9443</remoteServerAddress>
    <remoteServerUsername>admin</remoteServerUsername>
    <remoteServerPassword>admin</remoteServerPassword>
    <!--remoteServerPassword svns:secretAlias="remote.task.server.password"></remoteServerPassword-->
</tasks-configuration>
```

Given below are the task handling settings that you can configure for your server using the tasks-config.xml file.

<taskServerMode>: There are four task handling modes available for every WSO2 product as explained below.
- AUTO is the default task handling mode. This setting detects whether clustering is enabled in the server and, if it is, automatically switches to CLUSTERED task handling mode; otherwise, tasks are handled in STANDALONE mode.
- STANDALONE mode is used when the Carbon server runs as a single installation. That is, tasks are managed locally within the server.
- CLUSTERED mode is used when a cluster of Carbon servers is set up. With this setting, if one of the servers in the cluster fails, its tasks are rescheduled on one of the remaining server nodes. This requires Axis2 clustering to work.
- REMOTE mode is used when all tasks should be triggered by an independent task handling server such as WSO2 Task Server. That is, all Carbon servers that use such an external task handling server should run in REMOTE mode, while the task handling server itself can run in AUTO, STANDALONE or CLUSTERED mode. See the WSO2 Task Server documentation for how it can be used to trigger tasks for other servers.
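For example, a single server that should always manage its tasks locally could set the mode explicitly instead of relying on AUTO detection. The following is a minimal sketch of the relevant element only; the other elements of the default configuration are assumed to stay as they are.

```xml
<tasks-configuration xmlns:svns="http://org.wso2.securevault/configuration">
    <!-- Manage all tasks locally within this server -->
    <taskServerMode>STANDALONE</taskServerMode>
    <!-- Remaining elements from the default configuration omitted in this sketch -->
</tasks-configuration>
```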
<taskServerCount>: When <taskServerMode> is CLUSTERED, this value specifies the number of nodes in the task server cluster. Tasks are not scheduled until this number of servers is active.
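As an illustration, a three-node cluster that should start scheduling only once all three nodes are up might use settings along these lines. The count of 3 is an assumed value for this example, not a recommended default.

```xml
<tasks-configuration xmlns:svns="http://org.wso2.securevault/configuration">
    <!-- Share tasks across the cluster; Axis2 clustering must be enabled -->
    <taskServerMode>CLUSTERED</taskServerMode>
    <!-- Wait until three task server nodes are active before scheduling (example value) -->
    <taskServerCount>3</taskServerCount>
</tasks-configuration>
```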
<defaultLocationResolver>: This setting applies when you have a clustered environment. The default location resolver controls how scheduled tasks are allocated among the nodes of a cluster. The possible options are as follows:
- RoundRobinTaskLocationResolver: Tasks are allocated to cluster nodes on a round-robin basis.
- RandomTaskLocationResolver: Tasks are allocated to randomly selected cluster nodes.
- RuleBasedLocationResolver: This allows you to set criteria for selecting the cluster nodes to which tasks are allocated. The [task-type-pattern], [task-name-pattern] and [address-pattern of the server node] can be used as criteria. For example, with this setting, a scheduled task that matches a particular [task-type-pattern] and [task-name-pattern] is allocated to a server node that matches a particular [address-pattern]. If multiple server nodes in the cluster match the [address-pattern], the nodes are selected on a round-robin basis. Each set of criteria is specified in the configuration using a <property> element, so you can define multiple properties containing different criteria values.
  
  For example, see the details of the RuleBasedLocationResolver configuration given below.
```
<defaultLocationResolver>
        <locationResolverClass>org.wso2.carbon.ntask.core.impl.RuleBasedLocationResolver</locationResolverClass>
        <properties>
            <property name="rule-1">HIVE_TASK,HTTP_SCRIPT*,192.168.1.*</property>
            <property name="rule-2">HIVE_TASK,.*,192.168.2.*</property>
            <property name="rule-5">.*,.*,.*</property>
        </properties>
</defaultLocationResolver>
```
  As shown in this example, the property names (rule-1, rule-2 and rule-5) define the order in which the properties are evaluated. A scheduled task is first checked against the criteria in rule-1; if it does not match, it is checked against rule-2, and so on. Because rule-5 uses .* for all three patterns, it matches any task on any node and acts as a catch-all for tasks that match neither of the earlier rules.
  
  The RuleBasedLocationResolver allows you to address scenarios where tasks must be executed on specific server nodes first, and then fail over to another set of server nodes if the preferred nodes are not available.
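To allocate tasks to randomly selected nodes instead of using round robin, the <locationResolverClass> element could be switched as in the sketch below. The fully qualified class name is an assumption: it places RandomTaskLocationResolver in the same org.wso2.carbon.ntask.core.impl package as the other resolvers on this page, so verify it against the classes shipped with your product version.

```xml
<defaultLocationResolver>
    <!-- Allocate each task to a randomly selected cluster node (package name assumed) -->
    <locationResolverClass>org.wso2.carbon.ntask.core.impl.RandomTaskLocationResolver</locationResolverClass>
</defaultLocationResolver>
```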

<taskClientDispatchAddress>: The address to which the remote task server should dispatch trigger messages. Typically, this is a load balancer endpoint.
The <remoteServerAddress>, <remoteServerUsername> and <remoteServerPassword> elements specify the server address, user name and password of the remote task handling server. These settings apply only when <taskServerMode> is REMOTE. See the WSO2 Task Server documentation for an example of using it as a remote task server.
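A server that delegates its tasks to a remote task server could combine these settings roughly as follows. The host names lb.example.com and task-server.example.com, and their ports, are placeholders rather than defaults. The password is resolved from Secure Vault through the svns:secretAlias attribute, following the commented-out line in the default configuration, which assumes the remote.task.server.password alias is already defined in your Secure Vault setup.

```xml
<tasks-configuration xmlns:svns="http://org.wso2.securevault/configuration">
    <!-- Delegate all task triggering to an external task handling server -->
    <taskServerMode>REMOTE</taskServerMode>
    <!-- Where the remote task server sends trigger messages (placeholder load balancer) -->
    <taskClientDispatchAddress>https://lb.example.com:9443</taskClientDispatchAddress>
    <!-- Remote task server endpoint and credentials (placeholder host) -->
    <remoteServerAddress>https://task-server.example.com:9443</remoteServerAddress>
    <remoteServerUsername>admin</remoteServerUsername>
    <!-- Password read from Secure Vault instead of plain text -->
    <remoteServerPassword svns:secretAlias="remote.task.server.password"></remoteServerPassword>
</tasks-configuration>
```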

The default values in the tasks-config.xml file ensure that minimal changes are required when running in either standalone or clustered mode. The task server mode is set to "AUTO" by default, which automatically detects whether clustering is enabled in the server and, if so, switches to the clustered mode of scheduled tasks. The task server count is set to "2" by default, because a clustered setup has at least two nodes. This setting represents the number of servers that must be up before the scheduled tasks are shared among them at startup. For example, suppose 10 tasks were saved and scheduled earlier, and the cluster is later brought down for some reason. When the individual servers come back up, we do not want the first server to start to schedule all 10 tasks. Instead, we want several servers to come up and share the 10 tasks among them.

Task clustering is based on a peer-to-peer communication mechanism. In rare cases, fail-over can result in a split-brain scenario, where a task is scheduled on one node without that node knowing it is already scheduled on another. Task implementers should therefore make the task functionality idempotent wherever possible, or implement a mechanism to detect whether the current task is already running elsewhere.