This section summarizes the results of performance tests carried out on the minimum fully distributed DAS deployment, run separately with an RDBMS (MySQL) event store and with an HBase event store.
DAS Performance Test Round 1: RDBMS
Infrastructure used
- c4.2xlarge Amazon EC2 instances as the DAS nodes
- One DAS node as the publisher
- A c3.2xlarge Amazon instance as the database node
...
Data set | Event Count | Query | Time Taken |
---|---|---|---|
Smart Home | 10000000 | INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage, min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area | 26 sec |
Smart Home | 10000000 | INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id, (max(power_reading) - min(power_reading)) AS usage_range FROM smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY house_id | 22 sec |
Smart Home | 10000000 | INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData | 21 sec |
Smart Home | 10000000 | INSERT OVERWRITE TABLE stateUsageDifference SELECT a2.state, (a2.state_avg_usage-a1.overall_avg) AS avg_usage_difference FROM (select avg(state_avg_usage) as overall_avg from stateAvgUsage) as a1 join stateAvgUsage as a2 | 1 sec |
Wikipedia | 10000000 | INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 48 min |
Wikipedia | 10000000 | INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 1 hour 45 min |
Wikipedia | 10000000 | INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 44 min |
Wikipedia | 10000000 | INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 1 hour 17 min |
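
Because the table mixes seconds, minutes, and hours, the rows are easier to compare once every duration is expressed in seconds. The following is a minimal sketch of that conversion, together with a rough events-per-second figure derived from the Event Count column. The helper names (`to_seconds`, `events_per_second`) are illustrative and not part of DAS. The stateUsageDifference row is left out because that query reads the already aggregated stateAvgUsage table, so dividing by the raw event count would be misleading.

```python
import re

# Multipliers for the duration units that appear in the table above.
UNIT_SECONDS = {"hour": 3600, "min": 60, "sec": 1}


def to_seconds(text: str) -> int:
    """Convert a duration such as '1 hour 45 min' or '26 sec' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total


def events_per_second(event_count: int, duration: str) -> float:
    """Rough throughput: events scanned divided by elapsed seconds."""
    return event_count / to_seconds(duration)


EVENT_COUNT = 10_000_000  # taken from the Event Count column

# (target table, time taken) pairs copied from the table above.
round1 = [
    ("cityUsage", "26 sec"),
    ("peakDeviceUsageRange", "22 sec"),
    ("stateAvgUsage", "21 sec"),
    ("wikiAvgArticleLength", "48 min"),
    ("wikiContributorSummary", "1 hour 45 min"),
    ("wikiTotalArticleLength", "44 min"),
    ("wikiTotalArticlePages", "1 hour 17 min"),
]

for table, taken in round1:
    rate = events_per_second(EVENT_COUNT, taken)
    print(f"{table}: {to_seconds(taken)} s, about {rate:,.0f} events/s")
```

These figures are only indicative: the two data sets have different record sizes, so the per-second rates are not directly comparable between the Smart Home and Wikipedia rows.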
DAS Performance Test Round 2: HBase Cluster
This test involved setting up a 10-node HBase cluster with HDFS as the underlying file system.
...
Query | 2 Analyzers: Time (s) | 2 Analyzers: Mean TPS | 3 Analyzers: Time (s) | 3 Analyzers: Mean TPS |
---|---|---|---|---|
INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 222.70 | 75234.03 | 167.27 | 100164.18 |
INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 221.74 | 75554.76 | 166.92 | 100373.80 |
INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 221.80 | 75536.05 | 166.14 | 100842.18 |
INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 236.11 | 70958.52 | 181.42 | 92350.26 |
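
To read the table in terms of scaling, the sketch below computes the speedup obtained by going from 2 to 3 analyzers and compares it with the ideal speedup of 1.5. It also multiplies time by Mean TPS to estimate the number of records processed; this assumes Mean TPS is simply records divided by elapsed time, which the source does not state. The function names are illustrative only.

```python
# (time in seconds, mean TPS) with 2 and with 3 analyzers, copied from the table.
results = {
    "wikiAvgArticleLength": ((222.70, 75234.03), (167.27, 100164.18)),
    "wikiTotalArticleLength": ((221.74, 75554.76), (166.92, 100373.80)),
    "wikiTotalArticlePages": ((221.80, 75536.05), (166.14, 100842.18)),
    "wikiContributorSummary": ((236.11, 70958.52), (181.42, 92350.26)),
}


def speedup(time_before: float, time_after: float) -> float:
    """How many times faster the second run finished."""
    return time_before / time_after


def scaling_efficiency(time_2: float, time_3: float) -> float:
    """Observed speedup divided by the ideal speedup (3 / 2) for linear scaling."""
    return speedup(time_2, time_3) / (3 / 2)


for table, ((t2, tps2), (t3, tps3)) in results.items():
    # Assumption: Mean TPS = records / elapsed time, so time * TPS ~ records.
    implied_records = t2 * tps2
    print(
        f"{table}: speedup {speedup(t2, t3):.2f}x, "
        f"efficiency {scaling_efficiency(t2, t3):.0%}, "
        f"implied records ~{implied_records:,.0f}"
    )
```

A scaling efficiency below 100% means that adding the third analyzer reduced the run time by less than linear scaling would predict.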
Single Node Local Clustered Setup Statistics
A fully distributed setup was also tested locally on a single machine, using multiple JVMs, with the following hardware specifications.
...
It was observed that performance in this setup is comparatively high, considering that everything runs on a single machine. This is mainly because the DAS server and MySQL run on the same host, so no physical network I/O delays are incurred and queries execute with less overhead.
Indexing Performance
In the following table, shardIndexRecordBatchSize indicates the amount of index data (in bytes) to be processed at a time by a shard index worker.
...