This section summarizes the results of performance tests carried out on the minimum fully distributed DAS deployment, using RDBMS (MySQL) and HBase event stores, each tested separately.
...
Event Ingestion with Persistence
HBase Event Store
This test involved setting up a 10-node HBase cluster with HDFS as the underlying file system.
Infrastructure used
- 3 DAS nodes (variable roles: publisher, receiver, analyzer and indexer): c4.2xlarge
- 1 HBase master and Hadoop Namenode: c3.2xlarge
- 9 HBase Regionservers and Hadoop Datanodes: c3.2xlarge
Scenario: Persisting 1 billion events from the Smart Home DAS Sample
This test was designed to exercise the data layer under sustained event publication. During the test, throughput held at around 150K TPS, but memstore flushes in the HBase cluster (which suspend all writes) and minor compaction operations caused it to drop in bursts. Overall, a mean of 96K TPS was achieved. A steady rate of around 100-150K TPS is achievable with flow control in place, as opposed to the current setup, which has no flow control.
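One generic way to add flow control on the publishing side is a token bucket, which caps the average send rate while still allowing short bursts. A minimal Python sketch follows; `TokenBucket` and its parameters are hypothetical illustrations, not part of the DAS data publisher API, and any real target rate would have to be tuned against the cluster.

```python
import time


class TokenBucket:
    """Hypothetical publisher-side throttle (not a DAS API).

    Tokens refill continuously at `rate` per second, up to `capacity`;
    sending a batch of n events consumes n tokens.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate)          # average events per second allowed
        self.capacity = float(capacity)  # largest burst allowed at once
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable so tests can fake time
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n=1):
        """Consume n tokens and return True if enough are available, else False."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def acquire(self, n=1):
        """Block (using real sleeps) until n tokens have been consumed."""
        if n > self.capacity:
            raise ValueError("batch is larger than the bucket capacity")
        while not self.try_acquire(n):
            # Sleep roughly as long as it takes to refill the missing tokens.
            time.sleep((n - self.tokens) / self.rate)
```

For example, creating `TokenBucket(rate=120_000, capacity=12_000)` and calling `acquire(len(batch))` before each send would cap a publisher at roughly 120K TPS on average; these numbers are illustrative only.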
...
Metric | Value |
---|---|
Events | 1000000000 |
Time (s) | 10391.768 |
Mean TPS | 96230.01591 |
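The Mean TPS figure is the event count divided by the elapsed time. A short Python check of the HBase run above (the helper name `mean_tps` is illustrative):

```python
def mean_tps(events, seconds):
    """Mean throughput in events per second over the whole run."""
    return events / seconds


events = 1_000_000_000
seconds = 10391.768

print(f"Mean TPS: {mean_tps(events, seconds):.2f}")  # Mean TPS: 96230.02
print(f"Elapsed:  {seconds / 3600:.2f} hours")       # Elapsed:  2.89 hours
```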
...
Scenario: Persisting the entire Wikipedia corpus
This test involved publishing the entire Wikipedia dataset, where a single event comprises one Wikipedia article (16.8M articles in total). Events vary greatly in size, with a mean of ~3.5KB. A mean throughput of around 9K TPS was observed.
...
Metric | Value |
---|---|
Events | 16753779 |
Time (s) | 1862.901 |
Mean TPS | 8993.381291 |
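Since event sizes vary, TPS alone does not show the volume of data written. A rough Python estimate of the byte rate, assuming every event is exactly the ~3.5KB mean size and taking 1KB as 1024 bytes (both approximations):

```python
EVENTS = 16_753_779
SECONDS = 1862.901
MEAN_EVENT_KIB = 3.5  # approximate mean event size from the text (assumed 1 KB = 1024 bytes)

tps = EVENTS / SECONDS
mib_per_s = tps * MEAN_EVENT_KIB / 1024          # KiB/s -> MiB/s
total_gib = EVENTS * MEAN_EVENT_KIB / 1024 ** 2  # KiB -> GiB

print(f"{tps:.0f} events/s ~ {mib_per_s:.1f} MiB/s, ~{total_gib:.1f} GiB in total")
# 8993 events/s ~ 30.7 MiB/s, ~55.9 GiB in total
```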
MSSQL Event Store
Scenario: Persisting 30 million Process Monitoring events on MSSQL
Infrastructure used
- c4.2xlarge Amazon EC2 instances as the DAS nodes
- One DAS node as the publisher
- A db.m4.2xlarge MSSQL RDS instance as the database node
Receiver Node Data Persistence Performance
MySQL Event Store
Scenario: Persisting 12 million Process Monitoring events on MySQL
Batch Analytics
Scenario: Running Spark queries on the 1 billion published events
Spark queries from the Smart Home DAS sample were executed against the published data, with the analyzer node count set to 2 and 3 respectively in two separate tests. The Spark JVMs were provided with the following during the test:
...
Query | 2 Analyzers: Time (s) | 2 Analyzers: Mean TPS | 3 Analyzers: Time (s) | 3 Analyzers: Mean TPS |
---|---|---|---|---|
INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage,min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area | 958.80 | 1042968.20 | 741.15 | 1349250.90 |
INSERT OVERWRITE TABLE ct SELECT count(*) FROM smartHomeData | 953.46 | 1048806.20 | 734.99 | 1360570.13 |
INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id, (max(power_reading) - min(power_reading)) AS usage_range FROM smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY house_id | 975.06 | 1025581.77 | 751.27 | 1331073.47 |
INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData GROUP BY state | 991.08 | 1009003.34 | 783.54 | 1276265.545 |
Scenario: Running Spark queries on the Wikipedia corpus
Query | 2 Analyzers: Time (s) | 2 Analyzers: Mean TPS | 3 Analyzers: Time (s) | 3 Analyzers: Mean TPS |
---|---|---|---|---|
INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 222.70 | 75234.03 | 167.27 | 100164.18 |
INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 221.74 | 75554.76 | 166.92 | 100373.80 |
INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 221.80 | 75536.05 | 166.14 | 100842.18 |
INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 236.11 | 70958.52 | 181.42 | 92350.26 |
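Moving from 2 to 3 analyzers is a 1.5x increase in analyzer nodes, so both Spark tables can be summarized by the speedup (2-analyzer time divided by 3-analyzer time) and the scaling efficiency (speedup divided by 1.5). A minimal Python sketch using the query times reported above; the short query labels are shorthand introduced here, not names from DAS:

```python
# (2-analyzer time, 3-analyzer time) in seconds, copied from the tables above
TIMES = {
    "smartHome cityUsage": (958.80, 741.15),
    "smartHome ct": (953.46, 734.99),
    "smartHome peakDeviceUsageRange": (975.06, 751.27),
    "smartHome stateAvgUsage": (991.08, 783.54),
    "wiki wikiAvgArticleLength": (222.70, 167.27),
    "wiki wikiTotalArticleLength": (221.74, 166.92),
    "wiki wikiTotalArticlePages": (221.80, 166.14),
    "wiki wikiContributorSummary": (236.11, 181.42),
}
NODE_RATIO = 3 / 2  # ideal speedup if throughput scaled linearly with analyzers


def speedup(t_before, t_after):
    """How many times faster the query ran after adding analyzers."""
    return t_before / t_after


for query, (t2, t3) in TIMES.items():
    s = speedup(t2, t3)
    print(f"{query:35s} speedup {s:.2f}x  efficiency {s / NODE_RATIO:.0%}")
```

With these figures, every query speeds up by roughly 1.26x to 1.34x, i.e. a scaling efficiency of about 84% to 89% relative to linear scaling.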
DAS Performance Test Round
...
3: RDBMS (MySQL)
Infrastructure used
- c4.2xlarge Amazon EC2 instances as the DAS nodes
- One DAS node as the publisher
- A c3.2xlarge Amazon EC2 instance as the database node
...
Sample | Number of Events | Mean Event Rate |
---|---|---|
Smart Home sample | 100000000 | 5741 events per second |
Wikipedia sample | 15901127 | 4438 events per second |
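Because this table reports rates rather than durations, the approximate wall-clock time of each MySQL run follows from dividing the event count by the mean event rate. A quick Python estimate (the helper name `duration_hours` is illustrative):

```python
def duration_hours(events, events_per_second):
    """Approximate run length implied by an event count and a mean event rate."""
    return events / events_per_second / 3600


print(f"Smart Home: {duration_hours(100_000_000, 5741):.1f} h")  # Smart Home: 4.8 h
print(f"Wikipedia:  {duration_hours(15_901_127, 4438):.1f} h")   # Wikipedia:  1.0 h
```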
...
Processing Performance
The following topics describe the analyzer performance of WSO2 DAS.
...
Mode | Dataset | shardIndexRecordBatchSize | Replication Factor | Event Count | Time Taken (seconds) | Average TPS |
---|---|---|---|---|---|---|
Standalone | Wikipedia | 10MB | NA | 15901127 | 7975 | 1993.871724 |
Standalone | Wikipedia | 20MB | NA | 15901127 | 6765 | 2350.499187 |
Standalone | Smart Home | 20MB | NA | 20000000 | 1385 | 14440.43321 |
Minimum Fully Distributed | Wikipedia | 20MB | 1 | 15901127 | 6870 | 2314.574527 |
Minimum Fully Distributed | Wikipedia | 20MB | 0 | 15901127 | 7280 | 2184.220742 |
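The effect of each configuration change can be read as the relative change in average TPS between two rows that differ in only one setting. A small Python sketch using the values from the table (the helper name `pct_change` is illustrative):

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Standalone, Wikipedia: shardIndexRecordBatchSize 10MB -> 20MB
batch_gain = pct_change(1993.871724, 2350.499187)
# Minimum Fully Distributed, Wikipedia: replication factor 0 -> 1
replication_gain = pct_change(2184.220742, 2314.574527)

print(f"10MB -> 20MB batch size: {batch_gain:+.1f}% TPS")
print(f"replication factor 0 -> 1: {replication_gain:+.1f}% TPS")
```

With these values, the 20MB batch size raised average TPS by about 18%, and the replication factor 1 run was about 6% faster than the replication factor 0 run.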
...
Retrieving Results
Scenario: Retrieving Process Monitoring Data via REST API
This test was conducted on the test setup shown in the following figure.
Infrastructure used
- JMeter, DAS 3.1.0, MySQL: c4.xlarge (4 vCPUs, 7.5 GB, EBS-Only, 750 Mbps network)
- Linux kernel 4.44, Java 1.8.0_131, JVM flags: -Xmx4g -Xms2g, MySQL 5.7
...