This section summarizes the results of performance tests carried out with the minimum fully distributed DAS deployment setup with RDBMS (MySQL) and HBase event stores separately.
Event Ingestion with Persistance
HBase Event Store
This test involved setting up a 10-node HBase cluster with HDFS as the undrelying file system.
Infrastructure used
- 3 DAS nodes (variable roles: publisher, receiver, analyzer and indexer): c4.2xlarge
- 1 HBase master and Hadoop Namenode: c3.2xlarge
- 9 HBase Regionservers and Hadoop Datanodes: c3.2xlarge
Scenario: Persisting 1 billion events from the Smart Home DAS Sample
This test was designed to test the data layer during sustained event publication. During testing, the TPS was around the 150K mark, and the memstore flush of the HBase cluster (which suspends all writes) and minor compaction operations brought it down in bursts. Overall, a mean of 96K TPS was achieved, but a steady rate of around 100-150K TPS as is achievable, as opposed to the current no-flow-control situation.
The published data took around 950GB on the Hadoop filesystem, taking the HDFS-level replication into account.
Events | 1000000000 |
Time (in seconds) | 10391.768 |
Mean TPS | 96230.01591 |
Scenario: Persisting the entire Wikipedia corpus
This test involved publishing the entirety of the Wikipedia dataset, where a single event comprises of one Wikipedia article (16.8M articles in total). Events vary greatly in size, with the mean being ~3.5KB. Here, a mean throughput of around 9K TPS was observed.
Events | 16753779 |
Time (s) | 1862.901 |
Mean TPS | 8993.381291 |
MS SQL Event Store
Infrastructure used
- c4.2xlarge Amazon EC2 instances as the DAS nodes
- db.m4.2xlarge Amazon RDS instance as the database node
- Customized Thrift client as the data publisher (Thrift producer found in samples)
Scenario: Persisting 30 million events of Process Monitoring Events on MS SQL
This test involved persisting process monitoring events each of approximately 180 bytes. The test injected 30 million events into DAS with an input TPS of 40,000 events/second.
MySQL Event Store
Infrastructure used
- c4.2xlarge Amazon EC2 instances as the DAS nodes
- db.m4.2xlarge Amazon RDS instance as the database node
- Customized Thrift client as the data publisher (Thrift producer found in samples)
Scenario: Persisting 12 million events of Process Monitoring Events on MS SQL
This test involved persisting process monitoring events of approximately 180 bytes. The test injected 12 million events into DAS with an input TPS of 10,000 events/second.
MySQL Upper Limit
After around 12 million events are published, a sudden drop can be observed in receiver performance that can be considered as the break point of MySQL event store. Another type of event store such as HBase event store should be used when the receiver performance has to be maintained unchanged.
Batch Analytics
Scenario: Running Spark queries on the 1 billion published events
Spark queries from the Smart Home DAS sample were executed against the published data, and the analyzer node count was kept at 2 and 3 respectively for 2 separate tests. The SPARK JVMs were provided with following during the test.
- 1.36 processor cores
- 12GB of dedicated memory
The following results were observed.
- Over 1 million TPS on Spark for 2 analyzers
- About 1.3 million TPS for 3 analysers.
The DAS GET
operations (on HBase) make use of the HBase data locality aspect. This has the potential to perform the GET
operations fast compared to random access.
Query | 2 Analyzers | 3 Analyzers | ||
---|---|---|---|---|
Time(s) | Mean TPS | Time(s) | Mean TPS | |
INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage,min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area | 958.80 | 1042968.20 | 741.15 | 1349250.90 |
INSERT OVERWRITE TABLE ct SELECT count(*) FROM smartHomeData | 953.46 | 1048806.20 | 734.99 | 1360570.13 |
INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id,
(max(power_reading) - min(power_reading)) AS usage_range FROM
smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY
house_id | 975.06 | 1025581.77 | 751.27 | 1331073.47 |
INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData GROUP BY state | 991.08 | 1009003.34 | 783.54 | 1276265.545 |
Scenario: Running Spark queries on the Wikipedia corpus
Query | 2 Analyzers | 3 Analyzers | ||
---|---|---|---|---|
Time(s) | Mean TPS | Time(s) | Mean TPS | |
INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 222.70 | 75234.03 | 167.27 | 100164.18 |
INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 221.74 | 75554.76 | 166.92 | 100373.80 |
INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 221.80 | 75536.05 | 166.14 | 100842.18 |
INSERT INTO TABLE wikiContributorSummary SELECT contributor_username,
COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 236.11 | 70958.52 | 181.42 | 92350.26 |
DAS Performance Test Round 3: RDBMS (MySQL)
Receiver node Data Persistence Performance
A reduction in the throughput is observed as shown below after 1200000 events in both DAS receiver nodes. This reduction is caused by limitations of MySQL. The receiver performance variation of the second node of the 2 node receiver cluster is as given below. As the initial buffer filling in receiver queues give very high receiver performance at the beginning of the event publishing, event rate after the first 1200000 events was considered for the following graph.
With MySQL RDBMS event store
DAS data persistence was measured by publishing to 2 load balanced receiver nodes with MySQL database.
Sample | Number of Events | Mean Event Rate |
---|---|---|
Smart Home sample | 100000000 | 5741 events per second |
Wikipedia sample | 15901127 | 4438 events per second |
Processing Performance
The following topics describe the analyzer performance of WSO2 DAS.
With MySQL RDBMS event store
Spark analyzing performance (time to complete execution) was measured using a 2 node DAS analyzer cluster with MySQL database.
Time taken for each type of Spark query is given below.
Data set | Event Count | Query Type | Time Taken (seconds) |
---|---|---|---|
Smart Home | 10000000 | INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage, min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area | 26 sec |
Smart Home | 10000000 | INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id, (max(power_reading) - min(power_reading)) AS usage_range FROM smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY house_id | 22 sec |
Smart Home | 10000000 | INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData | 21 sec |
Smart Home | 10000000 | INSERT OVERWRITE TABLE stateUsageDifference SELECT a2.state, (a2.state_avg_usage-a1.overall_avg) AS avg_usage_difference FROM (select avg(state_avg_usage) as overall_avg from stateAvgUsage) as a1 join stateAvgUsage as a2 |
1 sec
|
Wikipedia | 10000000 | INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 48 min |
Wikipedia | 10000000 | INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 1 hour 45 min |
Wikipedia | 10000000 | INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 44 min |
Wikipedia | 10000000 | INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 1 hour 17 min |
Single Node Local Clustered Setup Statistics
A fully distributed setup was tested locally with multiple JVMs, and with the following hardware infrastructure specifications.
Machine type: Laptop
RAM: 8GB
Processor: Intel(R) Core(TM) i7-3520M
Storage: Samsung SSD 850
The setup is a 2 node analyzer cluster with a MySQL database as the event store. Analyzer statistics (i.e., the time duration for each execution of the query) are given below.
Dataset | Event Count | Query Type | Time Taken |
---|---|---|---|
Wikipedia | 15901127 | INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 25 min |
Wikipedia | 15901127 | INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 25 min |
Wikipedia | 15901127 | INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 25 min |
Wikipedia | 15901127 | INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 25 min |
It was observed that the performance here is comparatively higher (taking into account that the setup consists of a single machine). This is mainly due to the DAS server and MySQL existing locally, and having no physical network I/O delays as a result. This allows the queries to be executed in an optimal manner.
Indexing Performance
In the following table, the shardIndexRecordBatchSize
indicates the amount of index data (in bytes) to be processed at a time by a shard index worker.
Mode | Dataset | shardIndexRecordBatchSize | Replication Factor | Event Count | Time Taken (seconds) | Average TPS |
---|---|---|---|---|---|---|
Standalone | Wikipedia | 10MB | NA | 15901127 | 7975 | 1993.871724 |
Standalone | Wikipedia | 20MB | NA | 15901127 | 6765 | 2350.499187 |
Standalone | Smart Home | 20MB | NA | 20000000 | 1385 | 14440.43321 |
Minimum Fully Distributed | Wikipedia | 20MB | 1 | 15901127 | 6870 | 2314.574527 |
Minimum Fully Distributed | Wikipedia | 20MB | 0 | 15901127 | 7280 | 2184.220742 |
Retrieving Results
Scenario: Retrieving Process Monitoring Data via REST API
This test was conducted on a test setup as shown in the following figure,
Infrastructure used
- JMeter, DAS 3.1.0, MySQL: c4.xlarge (4 vCPUs, 7.5 GB, EBS-Only, 750 Mbps network)
- Linux kernel 4.44, java version "1.8.0_131", JVM flags : -Xmx4g -Xms2g, MySQL version 5.7
Using JMeter the DAS’s search REST API was invoked. Eighty JMeter users were used and they sent requests in a tight loop. The request was sent to query a single record from an event table via the DAS search API. In this experiment the MySQL server’s event table had data which had been loaded in previous experiments. The experiment was run for 45 minutes. Average throughput value of 2,695 events/second and an average latency of 29 ms was measured at the JMeter.