Introduction

This sample demonstrates how to analyze data collected on the usage of Wikipedia.

Prerequisites

Follow the steps below to set up the prerequisites before you start.

  1. Set up the general prerequisites required for WSO2 DAS.
  2. Download a Wikipedia data dump, and extract the compressed XML articles dump file to a preferred location on your machine.
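
The extraction in step 2 can also be scripted. The following is a minimal sketch, assuming the dump was downloaded as a bzip2 archive; the function name `extract_dump` and the file names in the usage note are hypothetical and not part of the sample.

```python
import bz2
import shutil
from pathlib import Path


def extract_dump(archive: Path, target: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Stream-decompress a .bz2 archive to `target`.

    The data is copied in chunks, so the (potentially very large)
    dump is never held in memory as a whole.
    """
    with bz2.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return target
```

For example, `extract_dump(Path("enwiki-20150805-pages-articles.xml.bz2"), Path("enwiki-20150805-pages-articles.xml"))` would write the XML file next to the archive, assuming an archive with that name exists in the working directory.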

Building the sample

Follow the steps below to build the sample.

Uploading the Carbon Application

Follow the steps below to upload the Carbon Application (C-App) file of this sample. For more information, see Carbon Application Deployment for DAS.

  1. Log in to the DAS management console using the following URL: https://<DAS_HOST>:<DAS_PORT>/carbon/
  2. Click Main, and then click Add in the Carbon Applications menu.
  3. Click Choose File, and upload the <DAS_HOME>/samples/capps/Wikipedia.car file as shown below.
  4. Click Main, then click Carbon Applications, and then click List view, to see the uploaded Carbon application as shown below. 

Executing the sample

Follow the steps below to execute the sample.

Tuning the server configurations

Each article in the Wikipedia dataset is sent as a single event, so an event is relatively large (~300 KB). To handle events of this size, tune the following server configurations: the queue sizes in the <DAS_HOME>/repository/conf/data-bridge/data-bridge-config.xml file for data receiving, the queue sizes in the <DAS_HOME>/repository/conf/analytics/analytics-eventsink-config.xml file for data persistence, and the Thrift publisher configurations in the <DAS_HOME>/repository/conf/data-bridge/data-agent-config.xml file.
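
To see why event size matters here, the following back-of-the-envelope sketch estimates the raw payload volume, assuming roughly 300 KB per event as stated above and using the Thrift agent values from this section (a batch size of 500 and a queue size of 65536). The helper name `payload_bytes` is hypothetical, and the figures cover payload only; they are not a measurement of actual server memory use.

```python
KIB = 1024  # bytes per kibibyte


def payload_bytes(event_count: int, event_size_kib: int = 300) -> int:
    """Approximate raw payload size of `event_count` events, in bytes."""
    return event_count * event_size_kib * KIB


# One Thrift batch of 500 events.
batch_mib = payload_bytes(500) / KIB**2

# A completely full agent queue of 65536 events (worst case).
queue_gib = payload_bytes(65536) / KIB**3

print(f"one batch:  {batch_mib:.1f} MiB")
print(f"full queue: {queue_gib:.2f} GiB")
```

A single batch already carries well over a hundred megabytes of article text, which is why the queue and batch settings are worth reviewing before publishing a large part of the dump.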

  1. Edit the values of the following properties in the <DAS_HOME>/repository/conf/data-bridge/data-bridge-config.xml file as shown below.

    <dataBridgeConfiguration>
    	<maxEventBufferCapacity>5000</maxEventBufferCapacity>
    	<eventBufferSize>2000</eventBufferSize>
    </dataBridgeConfiguration>
  2. Edit the values of the following properties in the <DAS_HOME>/repository/conf/data-bridge/data-agent-config.xml file as shown below.

    <DataAgentsConfiguration>
    	<Agent>
    		<Name>Thrift</Name>
    		<QueueSize>65536</QueueSize>
    		<BatchSize>500</BatchSize>
    	</Agent>
    </DataAgentsConfiguration>
  3. Edit the values of the following properties in the <DAS_HOME>/repository/conf/analytics/analytics-eventsink-config.xml file as shown below.

    <AnalyticsEventSinkConfiguration>
    	<QueueSize>65536</QueueSize>
    	<maxQueueCapacity>1000</maxQueueCapacity>
    	<maxBatchSize>1000</maxBatchSize>
    </AnalyticsEventSinkConfiguration>

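
The edits in the steps above can be applied by hand, but they can also be scripted. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `set_values` is hypothetical, and the starting values in `SAMPLE` are placeholders rather than the DAS defaults. It assumes the elements are not namespaced, and note that ElementTree drops XML comments when re-serializing, so back up the real file first.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def set_values(xml_text: str, values: Mapping[str, str]) -> str:
    """Return `xml_text` with the text of each named element replaced.

    Only the first element matching each tag name is changed.
    Raises KeyError if a tag is not found, so typos do not pass silently.
    """
    root = ET.fromstring(xml_text)
    for tag, value in values.items():
        element = root.find(f".//{tag}")
        if element is None:
            raise KeyError(f"element <{tag}> not found")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder input shaped like the data-bridge-config.xml snippet in step 1.
SAMPLE = """<dataBridgeConfiguration>
    <maxEventBufferCapacity>10</maxEventBufferCapacity>
    <eventBufferSize>100</eventBufferSize>
</dataBridgeConfiguration>"""

updated = set_values(
    SAMPLE,
    {"maxEventBufferCapacity": "5000", "eventBufferSize": "2000"},
)
print(updated)
```

The same function could be pointed at the content of the other two files by passing the tag names and values from steps 2 and 3.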
Running the data publisher

Navigate to the <DAS_HOME>/samples/wikipedia/ directory in a new CLI tab, and execute the following command to run the data publisher:

    ant -Dpath=/home/laf/Downloads/enwiki-20150805-pages-articles.xml -Dcount=1000

 

In the above command, set the -Dpath Java system property to the location of the Wikipedia article XML dump file that you downloaded in Analysing Wikipedia Data, and set the -Dcount property to the number of articles to publish as events out of the total dataset (e.g., -Dcount=-1 publishes all articles). The command sends the events to the event stream that is deployed through the above C-App.
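
To illustrate the "one article per event" model and the meaning of the count value, the following sketch streams articles out of a dump and stops after a given number, treating -1 as "all". It is not the sample publisher's implementation: the function names and the `title`/`text` field names are hypothetical, and the inline `SAMPLE_DUMP` is a tiny stand-in for a real dump.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(source: BinaryIO, count: int = -1) -> Iterator[Dict[str, str]]:
    """Yield one dict per <page> element; a negative count means all pages."""
    if count == 0:
        return
    emitted = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = text = ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield {"title": title, "text": text}
        elem.clear()  # release the parsed page to keep memory use flat
        emitted += 1
        if count >= 0 and emitted >= count:
            break


# Tiny stand-in for a real pages-articles dump.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>A</title><revision><text>alpha</text></revision></page>
  <page><title>B</title><revision><text>beta</text></revision></page>
  <page><title>C</title><revision><text>gamma</text></revision></page>
</mediawiki>"""

for article in iter_articles(io.BytesIO(SAMPLE_DUMP), count=2):
    print(article["title"], len(article["text"]))
```

Parsing incrementally with `iterparse` matters for a file of this size, because loading the whole dump into a tree would not fit in memory on most machines.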

Viewing the output

You may use the Data Explorer or the Analytics Dashboard of the WSO2 DAS Management Console to browse published sample events.

Using the Data Explorer 

Follow the steps below to use the Data Explorer to view the output. 

  1. Log in to the DAS management console if you are not already logged in.
  2. Click Main, and then click Data Explorer in the Interactive Analytics menu.
  3. Select ORG_WSO2_DAS_SAMPLE_WIKIPEDIA_DATA for the Table Name as shown below.

    You can also select the other streams which are deployed by the sample C-App as shown below.


  4. Click Search. The published data is displayed as shown below.
