Deploying WSO2 API Manager with Multiple Datacenters
This topic provides you with instructions on how to set up an active-active deployment of WSO2 API Manager across multiple datacenters.
Before you begin...
Make sure that you have replicated databases and file systems. For instructions, see Changing the Default API-M Databases.
Deployment Architecture
The following diagram shows the deployment architecture of WSO2 API Manager with multiple datacenters.
Traffic Management
Runtime traffic
A global load balancer handles API traffic, and in this deployment the proposal is to partition traffic based on geography or IP ranges. Based on the load balancer rules, traffic flows simultaneously to both active datacenters through their datacenter-local load balancers. If one datacenter fails, its traffic is routed to the second datacenter. Within each datacenter, the Gateway has datacenter-local high availability.
Management traffic
Management activities, such as API creation and throttling policy creation, are routed to the designated Active-Master datacenter. Management traffic therefore has only datacenter-local high availability.
Throttling
Throttling data is published to the Traffic Managers of both datacenters. Each datacenter has a local Traffic Manager for throttling decision making; however, for higher accuracy, the Gateways publish events to both Traffic Managers, one residing locally and one in the other datacenter. Throttle-out event notifications do not occur simultaneously in both datacenters, because there is no shared Traffic Manager topology in place (for efficiency reasons); however, the deployment becomes eventually consistent as the throttling data is cross-published.
Analytics
Raw analytics data accumulates locally in each datacenter and is not replicated. The summarized data (STATS_DB) in each datacenter is replicated bidirectionally.
The exception is the API-M alerting use case, which does not work in such a deployment due to its file-based indexing storage.
Configure the datacenters
This section explains how to configure the datacenters with separate databases.
The following diagram shows the deployment.
Step 1 - Configure PostgreSQL Databases
In this setup, a shared Event Store DB, Processed Data Store DB, and Stats DB are used by the two Analytics nodes in each datacenter. The AM_DB, UM_DB, and REG_DB are also shared between the API-M node and the two Analytics nodes in the datacenter.
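As an illustration, a minimal sketch of creating these databases in PostgreSQL is shown below. The database and role names here are assumptions made for the example, not names mandated by WSO2; substitute your own.
-- Illustrative sketch only: database and role names are assumptions.
CREATE ROLE apim_user WITH LOGIN PASSWORD 'apim_password';
CREATE DATABASE event_store_db OWNER apim_user;    -- Analytics Event Store DB
CREATE DATABASE processed_data_db OWNER apim_user; -- Analytics Processed Data Store DB
CREATE DATABASE stats_db OWNER apim_user;          -- Stats DB (WSO2AM_STATS_DB)
CREATE DATABASE am_db OWNER apim_user;             -- AM_DB
CREATE DATABASE um_db OWNER apim_user;             -- UM_DB
CREATE DATABASE reg_db OWNER apim_user;            -- REG_DB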
Step 2 - Configure APIM-Analytics 2.1.0 clustered setup
- Configure the two API-M Analytics nodes as a clustered setup. Use API-M Analytics 2.1.0 instead of DAS 3.1.0.
- When configuring the databases, use the PostgreSQL databases configured in the previous step.
- If the two API-M Analytics nodes run in the same virtual machine, it is mandatory to have a port offset. Use port offsets 1 and 2 for the two Analytics servers.
Step 3 - Configure APIM 2.1.0 with API-M Analytics 2.1.0 clustered setup
- Configure APIM 2.1.0 and the two API-M Analytics 2.1.0 nodes. For instructions on how to configure these nodes, see Configuring APIM Analytics.
- When configuring databases, use the same set of databases used in Step 2.
- After enabling Analytics, open the <API-M_HOME>/repository/conf/api-manager.xml file. Add both Analytics server URLs under the DASServerURL section as a comma-separated list, as shown below.
<DASServerURL>{tcp://localhost:7612,tcp://localhost:7613}</DASServerURL>
Apply the solution to add the data center ID
Before you begin...
Make sure that you have configured the databases according to the instructions in the previous section.
To ensure that no primary key violations take place, you have to change the database schema by adding the data center ID as an extra column to the tables in the STATS_DB, and also add it to the primary key combination. This ensures that when database syncing happens, both Analytics clusters can write to their respective databases without conflicts. A custom Spark User Defined Function (UDF) reads the data center name from a system property, and it is used whenever data is inserted into the STATS_DB via the Spark script.
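For illustration only, the sketch below applies this kind of change to a hypothetical summary table. The table, column, and constraint names are assumptions and do not reflect the actual WSO2AM_STATS_DB schema; the downloadable script referenced in the steps below makes the real changes.
-- Illustrative sketch only: names are assumptions, not the real schema.
ALTER TABLE api_request_summary ADD COLUMN data_center VARCHAR(20) NOT NULL DEFAULT 'DC1';
ALTER TABLE api_request_summary DROP CONSTRAINT api_request_summary_pkey;
ALTER TABLE api_request_summary ADD CONSTRAINT api_request_summary_pkey
    PRIMARY KEY (api, version, api_publisher, context, time, data_center);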
Follow the steps below to apply the changes for each datacenter.
- Shut down the APIM 2.1.0 server and the API-M Analytics 2.1.0 servers in the clustered setup.
- Add the following parameter to the <Analytics_Home>/repository/conf/analytics/spark/spark-defaults.conf file in each Analytics server node. Use a different data center ID (e.g., DC2) in the second datacenter.
spark.executor.extraJavaOptions -Dcarbon.data.center=DC1
- Download and replace the analytics-apim.xml file in the <Analytics_Home>/repository/conf/template-manager/domain-template/ directory in each Analytics server node.
- Download and add the org.wso2.analytics.apim.spark_2.1.0.jar as a patch to each of the API-M Analytics server nodes. This file contains the newly written UDF that reads the data center ID from a system parameter.
- Copy and replace the <Analytics_Home>/repository/deployment/server/carbonapps/org_wso2_carbon_analytics_apim-1.0.0.car file with this CApp in each Analytics server node.
- Run the following PostgreSQL script against the WSO2AM_STATS_DB.
- Restart the API-M 2.1.0 server and the API-M Analytics 2.1.0 servers in the clustered setup.
Synchronize the databases
Why do we need to maintain the same data in the STATS_DB of both datacenters?
In the active-active datacenter architecture, a request may come to either datacenter and be fulfilled by that datacenter. The analytics-related details of that request are stored in the STATS_DB of the same datacenter. Therefore, when analytics-related details are requested, the two datacenters can return different details according to their own STATS_DBs. To avoid this, we need to maintain the same set of data in the STATS_DBs of both datacenters.
You can synchronize the databases by sharing the STATS_DB or by using a replication mechanism. After inserting the data center ID into all the tables in the STATS_DB and including it in the composite primary key, synchronization can be done using one of the following two methods.
- Using a bi-directional replication mechanism - This is master-master replication, where changes made in one node are replicated to the other nodes.
- Using a master-slave mechanism - The STATS_DB is shared among all the nodes. When the master node becomes unavailable, a slave node takes over as the master node.
Follow the steps below to synchronize the databases using the bi-directional replication (BDR) mechanism.
Note that these instructions have been tested on Ubuntu with PostgreSQL 9.4.
Before you begin...
Install and enable the PostgreSQL apt repository for PGDG. This repository is required by the BDR packages.
Create a file named pgdg.list in the /etc/apt/sources.list.d/ directory and add the following line.
deb http://apt.postgresql.org/pub/repos/apt/ codename-pgdg main
Replace codename according to your OS version. For example, Ubuntu 14.04 (trusty), 16.04 (xenial), 17.04 (zesty).
Example for Ubuntu 16.04:
deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main
Create a file named 2ndquadrant.list in the /etc/apt/sources.list.d/ directory with the repository URL given below. Change codename according to your OS version.
deb http://packages.2ndquadrant.com/bdr/apt/ codename-2ndquadrant main
Import the repository key and update the package lists.
wget --quiet -O - http://packages.2ndquadrant.com/bdr/apt/AA7A6805.asc | sudo apt-key add -
sudo apt-get update
Remove the postgresql-9.4 packages, if you have them installed already. BDR requires a patched version of PostgreSQL 9.4 that conflicts with the official packages. If you already have PostgreSQL 9.4 installed, either from apt.postgresql.org or from your distribution's official repository, you need to take a dump of all your databases and then uninstall the official PostgreSQL 9.4 packages before you install BDR.
To take the dump:
pg_dump database1 -f backup_stat_db.sql
To remove the postgresql-9.4 packages:
sudo apt-get remove postgresql-9.4
Install the BDR packages. Sample commands are given below.
sudo apt-get update
sudo apt-get install postgresql-bdr-9.4 postgresql-bdr-9.4-bdr-plugin
Make the following changes to the files in the /etc/postgresql/9.4/main/ directory in both nodes.
postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'bdr'
wal_level = 'logical'
track_commit_timestamp = on
max_connections = 100
max_wal_senders = 10
max_replication_slots = 10
max_worker_processes = 10
pg_hba.conf
# Add the following configs
hostssl all         all      x.x.x.x/32 trust    # Own IP address
hostssl all         all      z.z.z.z/32 trust    # Second node IP address
hostssl replication postgres x.x.x.x/32 trust    # Own IP address
hostssl replication postgres z.z.z.z/32 trust    # Second node IP address
Restart PostgreSQL in both nodes. Sample commands are given below.
systemctl unmask postgresql
systemctl restart postgresql
Create the STATS_DB database and user.
CREATE DATABASE stat_db;
CREATE ROLE stat_db_user WITH SUPERUSER LOGIN PASSWORD 'SuperPass';
GRANT ALL PRIVILEGES ON DATABASE stat_db TO stat_db_user;
Create the BDR extension on the STATS_DB in both nodes. Sample commands are given below.
\c stat_db
CREATE EXTENSION pgcrypto;
CREATE EXTENSION btree_gist;
CREATE EXTENSION bdr;
You can check the BDR extension as follows:
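For example, assuming you are connected to stat_db in psql, you can query the standard pg_extension catalog; the exact rows returned depend on your installation.
-- Lists the extensions created above, with their versions
SELECT extname, extversion FROM pg_extension WHERE extname IN ('pgcrypto', 'btree_gist', 'bdr');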
Create the first master node.
Do this step only in Node 1.
Creating the first master node:
SELECT bdr.bdr_group_create(local_node_name := 'node1', node_external_dsn := 'host=<OWN EXTERNAL IP> port=5432 dbname=stat_db');
You can verify this as shown below.
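For example, assuming BDR 1.0, the bdr.bdr_nodes catalog table shows the nodes in the group; a node_status of 'r' means the node is ready.
-- Run against stat_db; should list node1 once the group is created
SELECT node_name, node_status FROM bdr.bdr_nodes;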
Create the second master node.
Do this step only in Node 2.
Creating the second master node:
SELECT bdr.bdr_group_join(local_node_name := 'node2', node_external_dsn := 'host=<OWN EXTERNAL IP> port=5432 dbname=stat_db', join_using_dsn := 'host=<NODE1 EXTERNAL IP> port=5432 dbname=stat_db');
You can verify this with the same command given in the previous step.
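In addition, assuming BDR 1.0, you can block until the join completes before moving on; bdr.bdr_node_join_wait_for_ready() returns only once the local node reaches the ready state.
-- Run against stat_db on Node 2; returns when the join has completed
SELECT bdr.bdr_node_join_wait_for_ready();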
Restore the database data.
Do this step in only one of the two nodes.
psql stat_db < backup_stat_db.sql
You have now successfully set up an active-active multi-datacenter deployment.