
WSO2 DAS Performance Analysis


Event Ingestion with Persistence

HBase Event Store

This test involved setting up a 10-node HBase cluster with HDFS as the underlying file system.

Versions

  • WSO2 DAS 3.1.0.
  • Apache Hadoop 2.7.2.
  • Apache HBase 1.2.1.
  • Oracle Java Development Kit (JDK) v1.7 update 51 (1.7.0_51-b13).

Infrastructure used

  • 3 DAS nodes (variable roles: publisher, receiver, analyzer and indexer): c4.2xlarge
  • 1 HBase master and Hadoop Namenode: c3.2xlarge
  • 9 HBase Regionservers and Hadoop Datanodes: c3.2xlarge


Scenario: Persisting 1 billion events from the Smart Home DAS Sample

This test was designed to exercise the data layer during sustained event publication. During testing, throughput stayed around the 150K TPS mark, but memstore flushes on the HBase cluster (which suspend all writes) and minor compaction operations brought it down in bursts. Overall, a mean of 96K TPS was achieved. The test ran without any flow control; with flow control in place, a steady rate of around 100-150K TPS should be achievable.

The published data took around 950GB on the Hadoop filesystem, taking the HDFS-level replication into account.


Events               1000000000
Time (in seconds)    10391.768
Mean TPS             96230.01591
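
The mean TPS figure is the event count divided by the elapsed time. The following minimal Python sketch reproduces that arithmetic from the numbers reported above; the function name `mean_tps` is illustrative and not part of DAS.

```python
def mean_tps(events: int, seconds: float) -> float:
    """Mean throughput in events per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return events / seconds


# Figures reported for the 1 billion event Smart Home run.
smart_home_tps = mean_tps(1_000_000_000, 10391.768)
print(f"Mean TPS: {smart_home_tps:.2f}")
```

Because the mean is taken over the whole run, it includes the periods in which memstore flushes and compactions stalled writes, which is why it sits below the ~150K TPS seen between stalls.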


Scenario: Persisting the entire Wikipedia corpus

This test involved publishing the entirety of the Wikipedia dataset, where a single event comprises one Wikipedia article (16.8M articles in total). Events vary greatly in size, with the mean being ~3.5KB. Here, a mean throughput of around 9K TPS was observed.

Events      16753779
Time (s)    1862.901
Mean TPS    8993.381291
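
Since Wikipedia events are much larger than typical sensor events, it can help to express this result as a data rate rather than an event rate. The sketch below is a rough estimate only: it assumes the ~3.5KB mean means 3,500 bytes (decimal units, which the source does not specify), and the function name is hypothetical.

```python
BYTES_PER_KB = 1000  # assumption: decimal kilobytes; the source does not say


def ingest_rate_mb_per_s(tps: float, mean_event_bytes: float) -> float:
    """Approximate payload rate in MB/s for a given event rate and mean event size."""
    return tps * mean_event_bytes / 1_000_000


wiki_tps = 16_753_779 / 1862.901  # events / elapsed seconds, from the table above
wiki_rate = ingest_rate_mb_per_s(wiki_tps, 3.5 * BYTES_PER_KB)
print(f"Approximate payload rate: {wiki_rate:.1f} MB/s")
```

This counts payload bytes only; it ignores serialization overhead and HDFS replication.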


Microsoft SQL Server Event Store

Infrastructure used

  • c4.2xlarge Amazon EC2 instances as the DAS node
    • Linux kernel 4.44, java version "1.8.0_131", JVM flags : -Xmx4g -Xms2g
  • db.m4.2xlarge Amazon RDS instance with MS SQL Server Enterprise Edition 2016 as the database node
  • Customized Thrift client as the data publisher (Thrift producer found in samples)

Scenario: Persisting 30 million process monitoring events on MS SQL

This test involved persisting process monitoring events of approximately 180 bytes each. The test injected 30 million events into DAS at an input rate of 40,000 events/second.
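
The stated event size, event count and input rate imply a payload volume and a minimum injection time for this run. The Python sketch below works these out; it assumes the input rate was sustained for the whole run and that 180 bytes is the payload size per event, neither of which the source confirms in detail.

```python
EVENT_BYTES = 180            # approximate event size stated above
INPUT_TPS = 40_000           # input rate stated above (events/second)
TOTAL_EVENTS = 30_000_000    # events injected

# Payload bytes arriving per second, in decimal megabytes.
payload_mb_per_s = EVENT_BYTES * INPUT_TPS / 1_000_000

# Total payload for the run, in decimal gigabytes.
total_payload_gb = EVENT_BYTES * TOTAL_EVENTS / 1_000_000_000

# Time needed to inject every event if the input rate is sustained.
injection_seconds = TOTAL_EVENTS / INPUT_TPS

print(f"{payload_mb_per_s:.1f} MB/s, {total_payload_gb:.1f} GB, {injection_seconds:.0f} s")
```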

MySQL Event Store

Infrastructure used

  • c4.2xlarge Amazon EC2 instances as the DAS node
    • Linux kernel 4.44, java version "1.8.0_131", JVM flags : -Xmx4g -Xms2g
  • db.m4.2xlarge Amazon RDS instance with MySQL Community Edition version 5.7 as the database node
  • Customized Thrift client as the data publisher (Thrift producer found in samples)

Scenario: Persisting 12 million process monitoring events on MySQL

This test involved persisting process monitoring events of approximately 180 bytes each. The test injected 12 million events into DAS at an input rate of 10,000 events/second.

MySQL Upper Limit

After around 12 million events have been published, receiver performance drops suddenly. This point can be considered the upper limit of the MySQL event store. To continue receiving events without major performance degradation, data has to be purged periodically before the table reaches this limit. See https://docs.wso2.com/display/DAS310/Purging+Data for more information on configuring data purging.

If data purging is not possible, an HBase event store should be used.
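
To plan how often purging must run, it is useful to estimate how long the store takes to reach the limit at a given input rate. The following Python sketch shows that estimate; the function and constant names are hypothetical, and the 12 million figure is the approximate limit observed in this test, not a fixed property of MySQL.

```python
MYSQL_LIMIT_EVENTS = 12_000_000  # approximate upper limit observed in this test


def seconds_until_limit(current_events: int, input_tps: float,
                        limit_events: int = MYSQL_LIMIT_EVENTS) -> float:
    """Time left before the event table reaches the observed limit."""
    if input_tps <= 0:
        raise ValueError("input rate must be positive")
    remaining = max(limit_events - current_events, 0)
    return remaining / input_tps


# Starting from an empty table at the 10,000 events/second used in this test.
time_to_limit = seconds_until_limit(0, 10_000)
print(f"Limit reached after about {time_to_limit / 60:.0f} minutes")
```

At lower production input rates the window is proportionally longer, so the purge schedule should be derived from the expected rate rather than from the rate used in this test.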



Batch Analytics

The following topics describe the analyzer performance of WSO2 DAS. 

Scenario: Running Spark queries on the 1 billion smart home published events

Spark queries from the Smart Home DAS sample were executed against the published data in two separate tests, with 2 and 3 analyzer nodes respectively. The Spark JVMs were provided with the following resources during the tests.

  • 1.36 processor cores
  • 12GB of dedicated memory

The following results were observed.

  • Over 1 million TPS on Spark for 2 analyzers
  • About 1.3 million TPS for 3 analyzers.

The DAS GET operations (on HBase) take advantage of HBase data locality, which can make them faster than random access.

Query: INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage, min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area

  • 2 analyzers: time 958.80 s, mean TPS 1042968.20
  • 3 analyzers: time 741.15 s, mean TPS 1349250.90

Query: INSERT OVERWRITE TABLE ct SELECT count(*) FROM smartHomeData

  • 2 analyzers: time 953.46 s, mean TPS 1048806.20
  • 3 analyzers: time 734.99 s, mean TPS 1360570.13

Query: INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id, (max(power_reading) - min(power_reading)) AS usage_range FROM smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY house_id

  • 2 analyzers: time 975.06 s, mean TPS 1025581.77
  • 3 analyzers: time 751.27 s, mean TPS 1331073.47

Query: INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData GROUP BY state

  • 2 analyzers: time 991.08 s, mean TPS 1009003.34
  • 3 analyzers: time 783.54 s, mean TPS 1276265.545
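
The results for 2 and 3 analyzers can be compared by computing the speedup and the scaling efficiency relative to ideal linear scaling (1.5x when going from 2 to 3 nodes). The Python sketch below does this with the execution times reported above; the `scaling` helper and the short query labels are illustrative names only.

```python
# (seconds with 2 analyzers, seconds with 3 analyzers), from the results above.
timings = {
    "cityUsage": (958.80, 741.15),
    "ct": (953.46, 734.99),
    "peakDeviceUsageRange": (975.06, 751.27),
    "stateAvgUsage": (991.08, 783.54),
}


def scaling(t_small: float, t_large: float, n_small: int = 2, n_large: int = 3):
    """Return (speedup, efficiency) when growing from n_small to n_large nodes."""
    speedup = t_small / t_large
    ideal = n_large / n_small  # linear scaling would give this speedup
    return speedup, speedup / ideal


results = {name: scaling(t2, t3) for name, (t2, t3) in timings.items()}
for name, (speedup, efficiency) in results.items():
    print(f"{name}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency below 100% means the third analyzer added less than a proportional share of throughput for that query.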


Scenario: Running Spark queries on the Wikipedia corpus

Query: INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki

  • 2 analyzers: time 222.70 s, mean TPS 75234.03
  • 3 analyzers: time 167.27 s, mean TPS 100164.18

Query: INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki

  • 2 analyzers: time 221.74 s, mean TPS 75554.76
  • 3 analyzers: time 166.92 s, mean TPS 100373.80

Query: INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki

  • 2 analyzers: time 221.80 s, mean TPS 75536.05
  • 3 analyzers: time 166.14 s, mean TPS 100842.18

Query: INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username

  • 2 analyzers: time 236.11 s, mean TPS 70958.52
  • 3 analyzers: time 181.42 s, mean TPS 92350.26


Scenario: Running Spark queries on 1 million smart home and Wikipedia events on MySQL Event Store

Spark analyzing performance (time to complete execution) was measured using a 2 node DAS analyzer cluster with MySQL database.

Time taken for each type of Spark query is given below.

Data set: Smart Home (event count: 10000000)

  • INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage, min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area
    • Time taken: 26 sec
  • INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id, (max(power_reading) - min(power_reading)) AS usage_range FROM smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY house_id
    • Time taken: 22 sec
  • INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData
    • Time taken: 21 sec
  • INSERT OVERWRITE TABLE stateUsageDifference SELECT a2.state, (a2.state_avg_usage-a1.overall_avg) AS avg_usage_difference FROM (select avg(state_avg_usage) as overall_avg from stateAvgUsage) as a1 join stateAvgUsage as a2
    • Time taken: 1 sec

Data set: Wikipedia (event count: 10000000)

  • INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki
    • Time taken: 48 min
  • INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username
    • Time taken: 1 hour 45 min
  • INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki
    • Time taken: 44 min
  • INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki
    • Time taken: 1 hour 17 min
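
The times above are reported in mixed units (seconds, minutes, hours), which makes the Smart Home and Wikipedia results hard to compare directly. The Python sketch below normalizes them to seconds and derives an approximate events-per-second figure. The parser is a hypothetical helper written for the duration formats used in this section, and the event count is the value listed in the results above.

```python
import re

_UNIT_SECONDS = {"hour": 3600, "min": 60, "sec": 1}


def parse_duration(text: str) -> int:
    """Convert strings such as '26 sec' or '1 hour 45 min' to seconds."""
    parts = re.findall(r"(\d+)\s*(hour|min|sec)", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


EVENT_COUNT = 10_000_000  # event count listed for each data set above

samples = [
    ("cityUsage (Smart Home)", "26 sec"),
    ("wikiAvgArticleLength (Wikipedia)", "48 min"),
    ("wikiContributorSummary (Wikipedia)", "1 hour 45 min"),
]
for label, taken in samples:
    seconds = parse_duration(taken)
    print(f"{label}: {seconds} s, about {EVENT_COUNT / seconds:,.0f} events/s")
```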


Retrieving Results

Scenario: Retrieving Process Monitoring Data via REST API

Infrastructure used

  • c4.xlarge Amazon EC2 instances as the DAS node
    • Linux kernel 4.44, java version "1.8.0_131", JVM flags : -Xmx4g -Xms2g
  • c4.xlarge Amazon EC2 instance running MySQL Community Edition version 5.7 as the database node

This test was conducted on the test setup shown in the following figure.


The DAS search REST API was invoked using JMeter. Eighty JMeter users sent requests in a tight loop, with each request querying a single record from an event table via the DAS search API. In this experiment, the MySQL server's event table held data that had been loaded in previous experiments. The experiment ran for 45 minutes. An average throughput of 2839 events/second and an average latency of 29 ms were measured at JMeter.

API invocations per second    2839
Average latency (ms)          29
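
As a cross-check that is not part of the original test report, Little's law (mean requests in flight = throughput × latency) can be applied to these two figures. With 80 users sending requests in a tight loop, the implied concurrency should be close to 80. The Python sketch below shows the calculation; the function name is illustrative.

```python
def implied_concurrency(throughput_per_s: float, latency_ms: float) -> float:
    """Little's law: mean number of requests in flight."""
    return throughput_per_s * latency_ms / 1000.0


JMETER_USERS = 80  # number of JMeter users stated above

in_flight = implied_concurrency(2839, 29)
deviation = abs(in_flight - JMETER_USERS) / JMETER_USERS
print(f"Implied concurrency: {in_flight:.1f} (deviation from 80 users: {deviation:.1%})")
```

The implied concurrency is within a few percent of the configured user count, so the reported throughput and latency are consistent with each other.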