Configuring Multiple Cassandra Environments
In an enterprise system, you typically have multiple environments, such as Development, QA, and Production, and each of these environments may use multiple Cassandra clusters. In such scenarios, WSO2 Storage Server can be configured to manage all the Cassandra clusters in all environments, as depicted by the following diagram:
When you have multiple environments with multiple Cassandra clusters, you can configure them using the cassandra-environments.xml file stored in the <SS_HOME>/repository/conf/etc/ directory. This file should be configured with the details of all the environments and clusters that should be managed by your Storage Server. You should also configure a datasource for each cluster in the master-datasources.xml file stored in the <SS_HOME>/repository/conf/datasources/ directory.
Embedded Cassandra and External Cassandra Nodes:
When a Cassandra cluster is set up, you have the option of using a Storage Server instance as a cluster node, because Cassandra is by default embedded in WSO2 Storage Server. Otherwise, you have to use external Cassandra nodes. Note that all the cluster nodes in a particular environment should be either embedded Cassandra instances or external Cassandra nodes for the configurations in the cassandra-environments.xml file to be effective.
The following topics explain how to set up the Cassandra clusters in your environments, and how to configure the environments:
Setting up Cassandra clusters
A Cassandra cluster is a collection of Cassandra nodes. When you use WSO2 Storage Server, you have the option of setting up Cassandra clusters for an environment using two methods:
- Using SS deployments as Cassandra nodes: WSO2 Storage Server is shipped with an embedded Cassandra instance, which gives you the option of using SS deployments as the Cassandra nodes in a cluster. In this scenario, one SS deployment is used as the provisioning SS instance, which has the UI enabled for users to log in. All the other SS deployments in the system are used as back-end Cassandra nodes connected to one another, thereby forming the Cassandra cluster.
- Using external Cassandra installations as Cassandra nodes: You can install external (Vanilla) Cassandra nodes for your clusters as explained here.
Note that all the Cassandra clusters in one environment should be of the same type for this configuration to work.
Configuring the cassandra-environments.xml file
After you set up all the clusters, you can partition them into separate environments by using the cassandra-environments.xml file. This file should be updated with the information of all environments in your system, including the default environment. The elements in this file are as follows:
You should point to the default environment from the cassandra-environments.xml as shown below. Note that the default environment should be separately configured as explained here.
<Environment>
    <Name>DEFAULT</Name>
    <IsExternal>false</IsExternal>
</Environment>
- The <Environment> section is used to add the details of environments. You can add any number of environments using this element.
- The <IsExternal> element is used to specify whether the clusters in an environment consist of embedded Cassandra nodes or external Cassandra nodes. Set this element to 'true' or 'false'. If this value is set to 'true', you can only have external Cassandra nodes in the environment, and if the value is 'false', you can only have embedded Cassandra nodes as explained above.
- The <Clusters> element contains the details of the clusters within the environment. You can add any number of clusters using the <Cluster> element under <Clusters>.
- The <DataSourceJndiName> element specifies the datasource information corresponding to a Cassandra cluster. Therefore, the datasources for each cluster should be defined in the master-datasources.xml file stored in the <SS_HOME>/repository/conf/datasources/ directory. See how datasources are configured for Cassandra clusters.
Shown below is the default configuration in the cassandra-environments.xml file.
<CassandraEnvironmentConfig>
    <CassandraEnvironments>
        <Environment>
            <Name>DEFAULT</Name>
            <IsExternal>false</IsExternal>
        </Environment>
        <Environment>
            <Name>DEV</Name>
            <IsExternal>true</IsExternal>
            <Clusters>
                <Cluster>
                    <Name>DevCluster</Name>
                    <DataSourceJndiName>DevDS</DataSourceJndiName>
                </Cluster>
                <Cluster>
                    <Name>ProdCluster</Name>
                    <DataSourceJndiName>ProdDS</DataSourceJndiName>
                </Cluster>
            </Clusters>
        </Environment>
    </CassandraEnvironments>
</CassandraEnvironmentConfig>
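As a rough illustration of how a further environment might be registered, the fragment below sketches one more <Environment> entry that would sit inside <CassandraEnvironments>. The names QA, QACluster, and QADS are hypothetical placeholders and are not shipped with the product. The sketch assumes external Cassandra nodes, so <IsExternal> is set to true.

```xml
<!-- Hypothetical additional environment; place it inside <CassandraEnvironments>. -->
<Environment>
    <Name>QA</Name>
    <!-- true: every node in this environment is an external Cassandra node. -->
    <IsExternal>true</IsExternal>
    <Clusters>
        <Cluster>
            <Name>QACluster</Name>
            <!-- Assumed name; it must match a jndiConfig/name in master-datasources.xml. -->
            <DataSourceJndiName>QADS</DataSourceJndiName>
        </Cluster>
    </Clusters>
</Environment>
```

A datasource whose jndiConfig/name is QADS would also have to exist in master-datasources.xml for this entry to resolve.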
Configuring datasources for Cassandra clusters
Datasources should be defined for all Cassandra clusters (except for the default Cassandra cluster) in order to allow clients to access the cluster. In the datasource configuration (the master-datasources.xml file), you can allow Hector-based clients as well as CQL-based clients to access a cluster by enabling the relevant datasource reader. Developers can then obtain the datasource object through a JNDI lookup.
Note that WSO2 Storage Server is a Hector-based client. Therefore, the Hector datasource reader should always be enabled in the datasource configuration to allow the provisioning storage server in your system to access the Cassandra cluster.
The instructions for defining datasources are explained below.
- Open the master-datasources.xml file from the <SS_HOME>/repository/conf/datasources/ directory.
- Enable the relevant datasource readers (Hector and CQL) in the file. The default configurations in the master-datasources.xml file for Hector and CQL are as follows:
Note that the DataSourceJndiName given for the cluster in the cassandra-environments.xml file should be the same as the "jndiConfig/name" in master-datasources.xml.
Enable the following for the Hector datasource reader:
<!-- Hector datasource -->
<provider>org.wso2.carbon.cassandra.datareader.hector.HectorBasedDataSourceReader</provider>
<!--datasource>
    <name>HectorDS</name>
    <description>The datasource used for RSS metadata repository</description>
    <jndiConfig>
        <name>CassandraRepo</name>
    </jndiConfig>
    <definition type="HECTOR">
        <configuration>
            <hosts>10.100.101.3:9160,10.100.101.4:9160,10.100.101.5:9160</hosts>
            <username>admin</username>
            <password>admin</password>
            <clusterName>TestCluster</clusterName>
            <maxActive>200</maxActive>
            <enableSecurity>false</enableSecurity>
        </configuration>
    </definition>
</datasource-->
Note the following:
- <jndiConfig>/<name> should be the same as the DataSourceJndiName given for the cluster in the cassandra-environments.xml file.
- The IP addresses of the Cassandra nodes should be listed using the <hosts> element.
- A separate datasource should be defined for each cluster by duplicating the <datasource> section shown above.
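The points above can be sketched as follows for the two clusters named in the cassandra-environments.xml sample, DevCluster (DevDS) and ProdCluster (ProdDS). This is a minimal sketch rather than a tested configuration: the datasource names, host addresses, and credentials are placeholders, and it assumes that <clusterName> carries the name of the target Cassandra cluster.

```xml
<!-- Hector datasource reader, enabled once for all Hector datasources. -->
<provider>org.wso2.carbon.cassandra.datareader.hector.HectorBasedDataSourceReader</provider>

<!-- Datasource for DevCluster; the JNDI name matches <DataSourceJndiName>DevDS</DataSourceJndiName>. -->
<datasource>
    <name>DevHectorDS</name> <!-- hypothetical name -->
    <description>Hector datasource for the DEV cluster</description>
    <jndiConfig>
        <name>DevDS</name>
    </jndiConfig>
    <definition type="HECTOR">
        <configuration>
            <!-- Placeholder addresses; list every node as host:port. -->
            <hosts>192.0.2.10:9160,192.0.2.11:9160</hosts>
            <username>admin</username>
            <password>admin</password>
            <clusterName>DevCluster</clusterName>
            <maxActive>200</maxActive>
            <enableSecurity>false</enableSecurity>
        </configuration>
    </definition>
</datasource>

<!-- Datasource for ProdCluster; a duplicate of the section above with its own JNDI name and hosts. -->
<datasource>
    <name>ProdHectorDS</name> <!-- hypothetical name -->
    <description>Hector datasource for the PROD cluster</description>
    <jndiConfig>
        <name>ProdDS</name>
    </jndiConfig>
    <definition type="HECTOR">
        <configuration>
            <hosts>192.0.2.20:9160,192.0.2.21:9160</hosts>
            <username>admin</username>
            <password>admin</password>
            <clusterName>ProdCluster</clusterName>
            <maxActive>200</maxActive>
            <enableSecurity>false</enableSecurity>
        </configuration>
    </definition>
</datasource>
```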
Enable the following for the CQL datasource reader:
<datasources>
<!-- external Cassandra data source. please enable either one of datasource (CQL or Hector) based on your preference -->
<!-- CQL datasource -->
<provider>org.wso2.carbon.cassandra.datareader.cql.CassandraDataSourceReader</provider>
<!--datasource>
    <name>WSO2_CASSANDRA_DB</name>
    <description>The datasource used for cassandra</description>
    <jndiConfig>
        <name>CassandraRepo</name>
    </jndiConfig>
    <definition type="CASSANDRA">
        <configuration>
            <async>false</async>
            <clusterName>TestCluster</clusterName>
            <compression>SNAPPY</compression>
            <concurrency>100</concurrency>
            <username>admin</username>
            <password encrypted="true">admin</password>
            <port>9042</port>
            <maxConnections>100</maxConnections>
            <hosts>
                <host>127.0.0.1</host>
            </hosts>
            <loadBalancePolicy>
                <exclusionThreshold>2.5</exclusionThreshold>
                <latencyAware>true</latencyAware>
                <minMeasure>100</minMeasure>
                <policyName>RoundRobinPolicy</policyName>
                <retryPeriod>10</retryPeriod>
                <scale>2</scale>
            </loadBalancePolicy>
            <poolOptions>
                <coreConnectionsForLocal>8</coreConnectionsForLocal>
                <coreConnectionsForRemote>2</coreConnectionsForRemote>
                <maxConnectionsForLocal>10</maxConnectionsForLocal>
                <maxConnectionsForRemote>10</maxConnectionsForRemote>
                <maxSimultaneousRequestsForLocal>10</maxSimultaneousRequestsForLocal>
                <maxSimultaneousRequestsForRemote>10</maxSimultaneousRequestsForRemote>
                <minSimultaneousRequestsForLocal>10</minSimultaneousRequestsForLocal>
                <minSimultaneousRequestsForRemote>10</minSimultaneousRequestsForRemote>
            </poolOptions>
            <reconnectPolicy>
                <baseDelayMs>100</baseDelayMs>
                <policyName>ConstantReconnectionPolicy</policyName>
            </reconnectPolicy>
            <socketOptions>
                <connectTimeoutMillis>10000</connectTimeoutMillis>
                <keepAlive>true</keepAlive>
                <readTimeoutMillis>15000</readTimeoutMillis>
                <tcpNoDelay>true</tcpNoDelay>
            </socketOptions>
        </configuration>
    </definition>
</datasource-->
Note the following:
- <jndiConfig>/<name> should be the same as the DataSourceJndiName given for the cluster in the cassandra-environments.xml file.
- The IP addresses of the Cassandra nodes should be listed using the <hosts> element.
- A separate datasource should be defined for each cluster by duplicating the <datasource> section shown above.
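For a CQL datasource, the host list might look like the fragment below. It is only a sketch: it assumes that <hosts> accepts one <host> child per node, which the default configuration suggests but does not state explicitly, and the addresses and the DevCluster name are placeholders. Only the elements relevant to the notes above are shown.

```xml
<definition type="CASSANDRA">
    <configuration>
        <!-- Remaining elements (async, compression, pool options, and so on)
             are assumed to stay as in the default configuration. -->
        <clusterName>DevCluster</clusterName>
        <port>9042</port>
        <hosts>
            <!-- Assumption: one <host> entry per Cassandra node, without a port. -->
            <host>192.0.2.10</host>
            <host>192.0.2.11</host>
            <host>192.0.2.12</host>
        </hosts>
    </configuration>
</definition>
```

Unlike the Hector configuration, where each host carries its own port, the CQL configuration takes the port from the separate <port> element.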