Deploying WSO2 API Manager with Multiple Datacenters
This topic provides you with instructions on how to set up an active-active deployment of WSO2 API Manager across multiple datacenters.
Before you begin...
Make sure that you have replicated databases and file systems. For instructions, see Changing the Default API-M Databases.
Deployment Architecture
The following diagram shows the deployment architecture of WSO2 API Manager with multiple datacenters.
Traffic Management
Runtime traffic
A global load balancer handles API traffic, and in this deployment the proposal is to partition traffic based on geography or IP ranges. Based on the load balancer rules, traffic flows simultaneously to both active datacenters through their datacenter-local load balancers. If one datacenter fails, its traffic is routed to the second datacenter. Within each datacenter, the Gateway has datacenter-local high availability.
Management traffic
Management activities, such as API creation and throttling policy creation, are routed to the designated Active-Master datacenter. Management traffic therefore has only datacenter-local high availability.
Throttling
Throttling data is published to the Traffic Managers of both datacenters. Each datacenter has a local Traffic Manager for throttling decision making; however, for higher accuracy, the Gateways publish events to both Traffic Managers, one residing locally and one in the other datacenter. Throttle-out event notifications do not occur simultaneously in both datacenters, because there is no shared Traffic Manager topology in place (for efficiency reasons); however, the deployment becomes eventually consistent as the throttling data is cross-published.
Analytics
Raw analytics data accumulates locally in each datacenter and is not replicated. The summarized data (STATS_DB) in each datacenter is replicated bidirectionally.
The exception is the API-M alerting use case, which does not work in such a deployment due to its file-based indexing storage.
Configure the datacenters
This section explains how to configure the datacenters with separate databases.
The following diagram shows the deployment.
Step 1 - Configure PostgreSQL Databases
In this setup, a shared Event Store DB, Processed Data Store DB, and Stats DB are used by the two Analytics nodes in each datacenter. The AM_DB, UM_DB, and REG_DB are also shared between the API-M node and the two Analytics nodes in the datacenter.
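As an illustration, a minimal sketch of creating these databases in PostgreSQL is shown below. The database and role names here are assumptions made for the example, not names mandated by WSO2; substitute your own.
-- Illustrative sketch only: database and role names are assumptions.
CREATE ROLE apim_user WITH LOGIN PASSWORD 'apim_password';
CREATE DATABASE event_store_db OWNER apim_user;    -- Analytics Event Store DB
CREATE DATABASE processed_data_db OWNER apim_user; -- Analytics Processed Data Store DB
CREATE DATABASE stats_db OWNER apim_user;          -- Stats DB (WSO2AM_STATS_DB)
CREATE DATABASE am_db OWNER apim_user;             -- AM_DB
CREATE DATABASE um_db OWNER apim_user;             -- UM_DB
CREATE DATABASE reg_db OWNER apim_user;            -- REG_DB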
Step 2 - Configure APIM-Analytics 2.1.0 clustered setup
- Configure the two API-M Analytics nodes as a clustered setup. Use API-M Analytics 2.1.0 instead of DAS 3.1.0.
- When configuring the databases, use the PostgreSQL databases configured in the previous step.
- If the two API-M Analytics nodes run in the same virtual machine, it is mandatory to have a port offset. Use port offsets 1 and 2 for the two Analytics servers.
Step 3 - Configure APIM 2.1.0 with API-M Analytics 2.1.0 clustered setup
- Configure APIM 2.1.0 and the two API-M Analytics 2.1.0 nodes. For instructions on how to configure these nodes, see Configuring APIM Analytics.
- When configuring databases, use the same set of databases used in Step 2.
- After enabling Analytics, open the <API-M_HOME>/repository/conf/api-manager.xml file. Add both Analytics server URLs under the DASServerURL section as a comma-separated list, as shown below.
<DASServerURL>{tcp://localhost:7612,tcp://localhost:7613}</DASServerURL>
Apply the solution to add the data center ID
Before you begin...
Make sure that you have configured the databases according to the instructions in the previous section.
To ensure that no primary key violations take place, you have to change the database schema by adding the data center ID as an extra column to the tables in the STATS_DB, and also add it to the primary key combination. This ensures that when database syncing happens, both Analytics clusters can write to their respective databases without conflicts. A custom Spark User Defined Function (UDF) reads the data center name from a system property, and it is used whenever data is inserted into the STATS_DB via the Spark script.
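For illustration only, the sketch below applies this kind of change to a hypothetical summary table. The table, column, and constraint names are assumptions and do not reflect the actual WSO2AM_STATS_DB schema; the downloadable script referenced in the steps below makes the real changes.
-- Illustrative sketch only: names are assumptions, not the real schema.
ALTER TABLE api_request_summary ADD COLUMN data_center VARCHAR(20) NOT NULL DEFAULT 'DC1';
ALTER TABLE api_request_summary DROP CONSTRAINT api_request_summary_pkey;
ALTER TABLE api_request_summary ADD CONSTRAINT api_request_summary_pkey
    PRIMARY KEY (api, version, api_publisher, context, time, data_center);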
Follow the steps below to apply the changes for each datacenter.
- Shut down the APIM 2.1.0 server and the API-M Analytics 2.1.0 servers in the clustered setup.
- Add the following parameter to the <Analytics_Home>/repository/conf/analytics/spark/spark-defaults.conf file in each Analytics server node. Use a different data center ID (e.g., DC2) in the second datacenter.
spark.executor.extraJavaOptions -Dcarbon.data.center=DC1
- Download and replace the analytics-apim.xml file in the <Analytics_Home>/repository/conf/template-manager/domain-template/ directory in each Analytics server node.
- Download and add the org.wso2.analytics.apim.spark_2.1.0.jar as a patch to each of the API-M Analytics server nodes. This file contains the newly written UDF that reads the data center ID from a system parameter.
- Copy and replace the <Analytics_Home>/repository/deployment/server/carbonapps/org_wso2_carbon_analytics_apim-1.0.0.car file with this CApp in each Analytics server node.
- Run the following PostgreSQL script against the WSO2AM_STATS_DB.
- Restart the API-M 2.1.0 server and the API-M Analytics 2.1.0 servers in the clustered setup.
Synchronize the databases
Why do we need to maintain the same data in the STATS_DB of both datacenters?
In the active-active datacenter architecture, a request may come to either datacenter and be fulfilled by that datacenter. The analytics-related details of that request are stored in the STATS_DB of the same datacenter. Therefore, when analytics-related details are requested, the two datacenters can return different details according to their own STATS_DBs. To avoid this, we need to maintain the same set of data in the STATS_DBs of both datacenters.
You can synchronize the databases by sharing the STATS_DB or by using a replication mechanism. After inserting the data center ID into all the tables in the STATS_DB and including it in the composite primary key, synchronization can be done using one of the following two methods.
- Using a bi-directional replication mechanism - This is master-master replication, where changes made in one node are replicated to the other nodes.
- Using a master-slave mechanism - The STATS_DB is shared among all the nodes. When the master node becomes unavailable, a slave node takes over as the master node.
Follow the steps below to synchronize the databases using the bi-directional replication (BDR) mechanism.
Note that these instructions have been tested on Ubuntu with PostgreSQL 9.4.
Before you begin...
Install and enable the PostgreSQL apt repository for PGDG. This repository is required by the BDR packages.
Create a file named pgdg.list in the /etc/apt/sources.list.d/ directory and add the following line.
deb http://apt.postgresql.org/pub/repos/apt/ codename-pgdg main
Replace codename according to your OS version. For example, Ubuntu 14.04 (trusty), 16.04 (xenial), 17.04 (zesty).
Example for Ubuntu 16.04:
deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main
Create a file named 2ndquadrant.list in the /etc/apt/sources.list.d/ directory with the repository URL given below. Change codename according to your OS version.
deb http://packages.2ndquadrant.com/bdr/apt/ codename-2ndquadrant main
Import the repository key and update the package lists.
wget --quiet -O - http://packages.2ndquadrant.com/bdr/apt/AA7A6805.asc | sudo apt-key add -
sudo apt-get update
Remove the postgresql-9.4 packages, if you have them installed already. BDR requires a patched version of PostgreSQL 9.4 that conflicts with the official packages. If you already have PostgreSQL 9.4 installed, either from apt.postgresql.org or from your distribution's official repository, you need to take a dump of all your databases and then uninstall the official PostgreSQL 9.4 packages before you install BDR.
To take the dump:
pg_dump database1 -f backup_stat_db.sql
To remove the postgresql-9.4 packages:
sudo apt-get remove postgresql-9.4
Install the BDR packages. Sample commands are given below.
sudo apt-get update
sudo apt-get install postgresql-bdr-9.4 postgresql-bdr-9.4-bdr-plugin
Make the following changes to the files in the /etc/postgresql/9.4/main/ directory in both nodes.
postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'bdr'
wal_level = 'logical'
track_commit_timestamp = on
max_connections = 100
max_wal_senders = 10
max_replication_slots = 10
max_worker_processes = 10
pg_hba.conf
# Add the following configs
hostssl all         all      x.x.x.x/32 trust    # Own IP address
hostssl all         all      z.z.z.z/32 trust    # Second node IP address
hostssl replication postgres x.x.x.x/32 trust    # Own IP address
hostssl replication postgres z.z.z.z/32 trust    # Second node IP address
Restart PostgreSQL in both nodes. Sample commands are given below.
systemctl unmask postgresql
systemctl restart postgresql
Create the STATS_DB database and user.
CREATE DATABASE stat_db;
CREATE ROLE stat_db_user WITH SUPERUSER LOGIN PASSWORD 'SuperPass';
GRANT ALL PRIVILEGES ON DATABASE stat_db TO stat_db_user;
Create the BDR extension on the STATS_DB in both nodes. Sample commands are given below.
\c stat_db
CREATE EXTENSION pgcrypto;
CREATE EXTENSION btree_gist;
CREATE EXTENSION bdr;
You can check the BDR extension as follows:
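For example, assuming you are connected to stat_db in psql, you can query the standard pg_extension catalog; the exact rows returned depend on your installation.
-- Lists the extensions created above, with their versions
SELECT extname, extversion FROM pg_extension WHERE extname IN ('pgcrypto', 'btree_gist', 'bdr');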
Create the first master node.
Do this step only in Node 1.
Creating the first master node:
SELECT bdr.bdr_group_create(local_node_name := 'node1', node_external_dsn := 'host=<OWN EXTERNAL IP> port=5432 dbname=stat_db');
You can verify this as shown below.
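For example, assuming BDR 1.0, the bdr.bdr_nodes catalog table shows the nodes in the group; a node_status of 'r' means the node is ready.
-- Run against stat_db; should list node1 once the group is created
SELECT node_name, node_status FROM bdr.bdr_nodes;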
Create the second master node.
Do this step only in Node 2.
Creating the second master node:
SELECT bdr.bdr_group_join(local_node_name := 'node2', node_external_dsn := 'host=<OWN EXTERNAL IP> port=5432 dbname=stat_db', join_using_dsn := 'host=<NODE1 EXTERNAL IP> port=5432 dbname=stat_db');
You can verify this with the same command given in the previous step.
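In addition, assuming BDR 1.0, you can block until the join completes before moving on; bdr.bdr_node_join_wait_for_ready() returns only once the local node reaches the ready state.
-- Run against stat_db on Node 2; returns when the join has completed
SELECT bdr.bdr_node_join_wait_for_ready();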
Restore the database data.
Do this step in only one of the two nodes.
psql stat_db < backup_stat_db.sql
You have now successfully set up an active-active multi-datacenter deployment.