
Introduction

This sample demonstrates an analysis of data collected on the usage of Wikipedia.

Prerequisites

Follow the steps below to set up the prerequisites before you start.

  1. Set up the general prerequisites required for WSO2 DAS.
  2. Download a Wikipedia data dump, and extract the compressed XML articles dump file to a preferred location on your machine (one way to do this is shown in the sketch below).
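
If you downloaded the dump as a .bz2 archive, the following is a minimal Python sketch of one way to stream-decompress it. The file names are examples only; substitute the dump file you actually downloaded.

    # Minimal sketch: stream-decompress a .bz2 Wikipedia dump to plain XML.
    # The file names are examples only; point them at your own download location.
    import bz2
    import shutil

    with bz2.open("enwiki-20150805-pages-articles.xml.bz2", "rb") as src, \
            open("enwiki-20150805-pages-articles.xml", "wb") as dst:
        shutil.copyfileobj(src, dst)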

Building the sample

Follow the steps below to build the sample.

Uploading the Carbon Application

Follow the steps below to upload the Carbon Application (c-App) file of this sample. For more information, see Carbon Application Deployment for DAS.

  1. Log in to the DAS management console using the following URL: https://<DAS_HOST>:<DAS_PORT>/carbon/
  2. Click Main, and then click Add in the Carbon Applications menu.
  3. Click Choose File, and upload the <DAS_HOME>/samples/capps/Wikipedia.car file.
  4. Click Main, then click Carbon Applications, and then click List view to see the uploaded Carbon application.
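
If you prefer not to use the management console, Carbon products generally also pick up .car files copied into the carbonapps hot-deployment directory. The following is a minimal sketch under that assumption; the directory and file names are taken from the steps above and may vary with your installation.

    # Minimal sketch, assuming the standard Carbon hot-deployment directory;
    # verify the paths against your own <DAS_HOME> before relying on this.
    import os
    import shutil

    das_home = os.environ["DAS_HOME"]  # set this to your DAS installation directory
    car_file = os.path.join(das_home, "samples", "capps", "Wikipedia.car")
    deploy_dir = os.path.join(das_home, "repository", "deployment", "server", "carbonapps")
    shutil.copy(car_file, deploy_dir)
    print("Copied %s to %s; watch the server log for the deployment status" % (car_file, deploy_dir))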

Executing the sample

Follow the steps below to execute the sample.

Tuning the server configurations

The Wikipedia dataset is transferred with each article as a single event, so an event is relatively large (~300 KB). Therefore, you need to tune the server configurations as follows: in particular, the queue sizes for data receiving in the <DAS_HOME>/repository/conf/data-bridge/data-bridge-config.xml file, the Thrift publisher related configurations in the <DAS_HOME>/repository/conf/data-bridge/data-agent-config.xml file, and the queue sizes for data persistence in the <DAS_HOME>/repository/conf/analytics/analytics-eventsink-config.xml file.

  1. Edit the values of the following properties in the <DAS_HOME>/repository/conf/data-bridge/data-bridge-config.xml file as shown below.

    <dataBridgeConfiguration>
        <maxEventBufferCapacity>5000</maxEventBufferCapacity>
        <eventBufferSize>2000</eventBufferSize>
    </dataBridgeConfiguration>
  2. Edit the values of the following properties in the <DAS_HOME>/repository/conf/data-bridge/data-agent-config.xml file as shown below.

    <DataAgentsConfiguration>
        <Agent>
            <Name>Thrift</Name>
            <QueueSize>65536</QueueSize>
            <BatchSize>500</BatchSize>
        </Agent>
    </DataAgentsConfiguration>
  3. Edit the values of the following properties in the <DAS_HOME>/repository/conf/analytics/analytics-eventsink-config.xml file as shown below.

    <AnalyticsEventSinkConfiguration>
        <QueueSize>65536</QueueSize>
        <maxQueueCapacity>1000</maxQueueCapacity>
        <maxBatchSize>1000</maxBatchSize>
    </AnalyticsEventSinkConfiguration>
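
As a rough, illustrative sanity check on these queue sizes, the sketch below estimates the worst-case memory a completely full queue could hold if every buffered entry carried the full ~300 KB article payload. This is an assumption for illustration only; in practice the queues rarely fill and the actual per-entry overhead depends on the server internals.

    # Back-of-the-envelope estimate only: assumes every buffered entry holds the
    # full ~300 KB article payload, which is an upper bound, not a measurement.
    EVENT_SIZE_BYTES = 300 * 1024

    queue_sizes = {
        "data-bridge maxEventBufferCapacity": 5000,
        "data-agent QueueSize": 65536,
        "eventsink QueueSize": 65536,
    }

    for name, capacity in queue_sizes.items():
        gib = capacity * EVENT_SIZE_BYTES / 1024 ** 3
        print("%s: ~%.1f GiB if the queue ever fills completely" % (name, gib))
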
Running the data publisher

Navigate to the <DAS_HOME>/samples/wikipedia/ directory in a new CLI tab, and execute the following command to run the data publisher:

    ant -Dpath=/home/laf/Downloads/enwiki-20150805-pages-articles.xml -Dcount=1000

Set the values of the -Dpath and -Dcount Java system properties in the above command to point to the location where you stored the Wikipedia article XML dump file that you downloaded in Analysing Wikipedia Data, and to the number of articles you need to publish as events out of the total dataset, respectively (e.g., -Dcount=-1 to publish all articles).

This sends each article as an event to the event stream that is deployed through the above C-App.
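
To see what the publisher iterates over, the following is a minimal, illustrative sketch (not the sample's actual ant publisher) that streams articles out of the dump and stops after a given count, which is roughly what -Dpath and -Dcount control.

    # Illustrative sketch only; the sample's real publisher is the ant task above.
    # Streams <page> elements from a MediaWiki XML dump without loading it all.
    import sys
    import xml.etree.ElementTree as ET

    def iter_articles(dump_path, count):
        """Yield (title, text) for up to `count` pages; count = -1 means all."""
        published = 0
        for _, elem in ET.iterparse(dump_path, events=("end",)):
            if elem.tag.endswith("}page"):
                ns = elem.tag[:elem.tag.rindex("}") + 1]  # namespace prefix, e.g. {http://www.mediawiki.org/xml/export-0.10/}
                title = elem.findtext(ns + "title", default="")
                text = elem.findtext(ns + "revision/" + ns + "text", default="")
                yield title, text
                elem.clear()  # keep memory flat on a multi-gigabyte dump
                published += 1
                if count != -1 and published >= count:
                    break

    if __name__ == "__main__":
        path, count = sys.argv[1], int(sys.argv[2])
        for title, text in iter_articles(path, count):
            print("%s: %d characters" % (title, len(text)))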

