This section summarizes the results of performance tests carried out on the minimum fully distributed DAS deployment. The tests were run in separate rounds, one with an RDBMS (MySQL) event store and one with an HBase event store.

DAS Performance Test Round 1: RDBMS

Infrastructure used

  • c4.2xlarge Amazon EC2 instances as the DAS nodes
  • One DAS node as the publisher
  • A c3.2xlarge Amazon instance as the database node


Data set | Event Count | Query | Time Taken
---------|-------------|-------|-----------
Smart Home | 10000000 | INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage, min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area | 26 sec
Smart Home | 10000000 | INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id, (max(power_reading) - min(power_reading)) AS usage_range FROM smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY house_id | 22 sec
Smart Home | 10000000 | INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData | 21 sec
Smart Home | 10000000 | INSERT OVERWRITE TABLE stateUsageDifference SELECT a2.state, (a2.state_avg_usage-a1.overall_avg) AS avg_usage_difference FROM (select avg(state_avg_usage) as overall_avg from stateAvgUsage) as a1 join stateAvgUsage as a2 | 1 sec
Wikipedia | 10000000 | INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 48 min
Wikipedia | 10000000 | INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 1 hour 45 min
Wikipedia | 10000000 | INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 44 min
Wikipedia | 10000000 | INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 1 hour 17 min
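
Because the Round 1 timings mix seconds, minutes, and hours, it can help to normalize them before comparing queries. The following is a minimal sketch, not part of the original test harness. The helper name to_seconds is hypothetical, and the events-per-second figure assumes that each query reads all 10,000,000 events, which the test description does not state explicitly.

```python
import re

# Seconds per unit, for the duration formats used in the Round 1 results.
UNIT_SECONDS = {"hour": 3600, "min": 60, "sec": 1}


def to_seconds(text):
    """Convert a duration such as '1 hour 45 min' or '26 sec' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total


# Event count reported for every Round 1 query.
EVENTS = 10_000_000

# A few timings copied from the Round 1 results (target table -> time taken).
timings = {
    "cityUsage": "26 sec",
    "wikiAvgArticleLength": "48 min",
    "wikiContributorSummary": "1 hour 45 min",
}

for table, taken in timings.items():
    seconds = to_seconds(taken)
    # Assumption: the query scans all events, so this is an approximate rate.
    print(f"{table}: {seconds} s, roughly {EVENTS / seconds:,.0f} events/s")
```

Expressing every row in seconds makes the gap between the Smart Home and Wikipedia queries directly comparable.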


DAS Performance Test Round 2: HBase Cluster

This test involved setting up a 10-node HBase cluster with HDFS as the underlying file system.


Query | 2 Analyzers: Time (s) | 2 Analyzers: Mean TPS | 3 Analyzers: Time (s) | 3 Analyzers: Mean TPS
------|-----------------------|-----------------------|-----------------------|----------------------
INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 222.70 | 75234.03 | 167.27 | 100164.18
INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 221.74 | 75554.76 | 166.92 | 100373.80
INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 221.80 | 75536.05 | 166.14 | 100842.18
INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 236.11 | 70958.52 | 181.42 | 92350.26
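
The Round 2 figures can be cross-checked with simple arithmetic. The sketch below assumes that Mean TPS is the number of events processed divided by the elapsed time in seconds. The page does not define the metric, so the implied event count is an inference rather than a reported value. The function names are hypothetical, and the efficiency figure compares the measured speedup against an ideal 1.5x for going from two analyzers to three.

```python
# (time in seconds, mean TPS) per query and analyzer count, copied from the
# Round 2 results. Keys are the target tables of the queries.
results = {
    "wikiAvgArticleLength": {2: (222.70, 75234.03), 3: (167.27, 100164.18)},
    "wikiTotalArticleLength": {2: (221.74, 75554.76), 3: (166.92, 100373.80)},
    "wikiTotalArticlePages": {2: (221.80, 75536.05), 3: (166.14, 100842.18)},
    "wikiContributorSummary": {2: (236.11, 70958.52), 3: (181.42, 92350.26)},
}


def implied_events(seconds, tps):
    """Events processed, assuming TPS = events / elapsed seconds."""
    return seconds * tps


def speedup(time_two, time_three):
    """Ratio of the 2-analyzer time to the 3-analyzer time."""
    return time_two / time_three


IDEAL_SPEEDUP = 3 / 2  # perfect linear scaling from 2 to 3 analyzers

for table, runs in results.items():
    (t2, tps2), (t3, _tps3) = runs[2], runs[3]
    gain = speedup(t2, t3)
    print(
        f"{table}: about {implied_events(t2, tps2) / 1e6:.2f}M events, "
        f"speedup {gain:.2f}x, scaling efficiency {gain / IDEAL_SPEEDUP:.0%}"
    )
```

Under that assumption, both analyzer configurations imply the same event count for a given query, which suggests the two runs processed the same data set.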


Single Node Local Clustered Setup Statistics

A fully distributed setup was tested locally with multiple JVMs, and with the following hardware infrastructure specifications.


Performance in this setup was observed to be comparatively high, considering that everything runs on a single machine. This is mainly because the DAS server and MySQL run on the same host, so queries incur no physical network I/O delays.

Indexing Performance

 In the following table, the shardIndexRecordBatchSize indicates the amount of index data (in bytes) to be processed at a time by a shard index worker. 
