
Introduction

This sample demonstrates how to analyze data collected on the usage of Wikipedia.

Prerequisites

Follow the steps below to set up the prerequisites before you start.

  1. Set up the general prerequisites required for WSO2 DAS.
  2. Download a Wikipedia data dump, and extract the compressed XML articles dump file to a preferred location on your machine.
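Wikipedia article dumps are normally distributed as bzip2 archives, and the extracted XML is very large. If a command-line extraction tool is not at hand, the step above can be sketched with the Python standard library as follows. This is only an illustrative helper and not part of the sample; the function name `extract_bz2` and the chunk size are assumptions.

```python
import bz2
import shutil


def extract_bz2(archive_path, output_path, chunk_size=1024 * 1024):
    """Decompress a .bz2 archive to output_path in fixed-size chunks.

    Streaming the copy avoids loading the whole dump into memory.
    """
    with bz2.open(archive_path, "rb") as src, open(output_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return output_path
```

For example, calling `extract_bz2` with the path of the downloaded `.xml.bz2` file and a target `.xml` path writes the extracted dump that is later passed to the data publisher.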

Building the sample

Follow the steps below to build the sample.

Uploading the Carbon Application

Follow the steps below to upload the Carbon Application (c-App) file of this sample. For more information, see Carbon Application Deployment for DAS.

  1. Log in to the DAS management console using the following URL: https://<DAS_HOST>:<DAS_PORT>/carbon/
  2. Click Main, and then click Add in the Carbon Applications menu.
  3. Click Choose File, and upload the <DAS_HOME>/samples/capps/Wikipedia.car file as shown below.
  4. Click Main, then click Carbon Applications, and then click List view to see the uploaded Carbon application as shown below.

Executing the sample

Follow the steps below to execute the sample.

Tuning the server configurations

The Wikipedia dataset is transferred as one article per event, so each event is relatively large (~300KB). Therefore, you need to tune the following server configurations: the queue sizes in the <DAS_HOME>/repository/conf/data-bridge/data-bridge-config.xml file for data receiving, the Thrift publisher related configurations in the <DAS_HOME>/repository/conf/data-bridge/data-agent-config.xml file, and the queue sizes in the <DAS_HOME>/repository/conf/analytics/analytics-eventsink-config.xml file for data persistence.

  1. Edit the values of the following properties in the <DAS_HOME>/repository/conf/data-bridge/data-bridge-config.xml file as shown below, to tune the queue sizes available for data receiving.

    Code Block
    languagexml
    <dataBridgeConfiguration>
    	<maxEventBufferCapacity>5000</maxEventBufferCapacity>
    	<eventBufferSize>2000</eventBufferSize>
    </dataBridgeConfiguration>
  2. Edit the values of the following properties in the <DAS_HOME>/repository/conf/data-bridge/data-agent-config.xml file as shown below, to change the Thrift publisher related configurations.

    Code Block
    languagexml
    <DataAgentsConfiguration>
    	<Agent>
    		<Name>Thrift</Name>
    		<QueueSize>65536</QueueSize>
    		<BatchSize>500</BatchSize>
    	</Agent>
    </DataAgentsConfiguration>
  3. Edit the values of the following properties in the <DAS_HOME>/repository/conf/analytics/analytics-eventsink-config.xml file as shown below, to tune the queue sizes available for data persistence.

    Code Block
    languagexml
    <AnalyticsEventSinkConfiguration>
    	<QueueSize>65536</QueueSize>
    	<maxQueueCapacity>1000</maxQueueCapacity>
    	<maxBatchSize>1000</maxBatchSize>
    </AnalyticsEventSinkConfiguration>

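The edits above can also be scripted. The following is a minimal sketch, assuming the properties appear as plain child elements exactly as in the fragments shown; `set_values` is a hypothetical helper, and the starting value of 50 in the sample string is made up for illustration. If the real configuration file declares a default XML namespace, the tag names must be given in `{namespace-uri}tag` form. ElementTree also discards XML comments when parsing, so keep a backup of the original file before writing any output back.

```python
import xml.etree.ElementTree as ET


def set_values(xml_text, values):
    """Return xml_text with the text of each named element replaced.

    values maps an element tag to its new value. A KeyError is raised for
    a tag that is not found, so a typo does not pass silently.
    """
    root = ET.fromstring(xml_text)
    for tag, value in values.items():
        matches = list(root.iter(tag))
        if not matches:
            raise KeyError(tag)
        for element in matches:
            element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical starting content; only the element names come from this guide.
sample = """<dataBridgeConfiguration>
    <maxEventBufferCapacity>50</maxEventBufferCapacity>
    <eventBufferSize>2000</eventBufferSize>
</dataBridgeConfiguration>"""

updated = set_values(sample, {"maxEventBufferCapacity": 5000})
print(updated)
```

Raising an error for an unknown tag is a deliberate choice here: a misspelled property name would otherwise leave the server running with its old queue sizes.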
Running the data publisher

Navigate to the <DAS_HOME>/samples/wikipedia/ directory in a new CLI tab, and execute the following command to run the data publisher:

    ant -Dpath=/home/laf/Downloads/enwiki-20150805-pages-articles.xml -Dcount=1000

 

Info

Set the values of the -Dpath and -Dcount Java system properties in the above command to the location of the Wikipedia article XML dump file that you downloaded in Prerequisites, and to the number of articles to publish as events out of the total dataset, respectively. For example, use -Dcount=-1 to publish all articles. The command sends events to the event stream that is deployed through the above C-App.
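To make the meaning of the count value concrete, the following sketch walks a dump file one article at a time and stops after a given number of articles, treating -1 as "all". It is not the sample's data publisher, only an illustration of the one-article-per-event idea; the names `iter_articles` and `local_name` are assumptions. It relies on the MediaWiki export layout, in which each article is a `<page>` element that contains a `<title>` and a `<text>` element.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(source, count=-1):
    """Yield (title, text) for up to count <page> elements; -1 means all."""
    if count == 0:
        return
    yielded = 0
    # iterparse streams the file, so the whole dump is never held in memory.
    for _, element in ET.iterparse(source, events=("end",)):
        if local_name(element.tag) != "page":
            continue
        title, text = None, ""
        for child in element.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        yielded += 1
        element.clear()  # drop the page's children to limit memory use
        if count > 0 and yielded >= count:
            return
```

Here `source` can be a file path or a binary file object. Summing `len(text.encode("utf-8"))` over the yielded articles gives a rough idea of the event sizes that the tuned queues have to hold.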

Executing the scripts

Follow the steps below to execute the Spark scripts which are deployed by the sample C-App.

  1. Log in to the DAS management console using the following URL: https://<DAS_HOST>:<DAS_PORT>/carbon/
  2. Click Main, and then click Scripts in the Batch Analytics menu.
  3. Click the corresponding Execute option of each of the following scripts to execute them.
    scripts to be executed

Viewing the output

You may use the Data Explorer or the Analytics Dashboard of the WSO2 DAS Management Console to browse published sample events.

Using the Data Explorer

Follow the steps below to use the Data Explorer to view the output.

  1. Log in to the DAS management console if you are not already logged in.
  2. Click Main, and then click Data Explorer in the Interactive Analytics menu.
  3. Select ORG_WSO2_DAS_SAMPLE_WIKIPEDIA_DATA for the Table Name as shown below.

    select the event stream

    Tip

    You can also select the other streams which are deployed by the sample C-App.

  4. Click Search. The published data is displayed.

Using the Analytics Dashboard

Follow the steps below to use the Analytics Dashboard to view the output. 

  1. Log in to the DAS management console if you are not already logged in.
  2. Click Main, and then click Analytics Dashboard in the Dashboard menu.
  3. Log in to the Analytics Dashboard using admin/admin credentials.
  4. Click the following CREATE DASHBOARD button in the top navigational bar to create a new dashboard.

  5. Enter a Title and a Description for the new dashboard, and click Next as shown below.
    create a new Dashboard
  6. Select a layout to place its components as shown below.

    select a layout

  7. Click the Select button of the Single Column layout. You view a layout editor with the chosen layout blocks marked using dashed lines.
  8. Click the following CREATE GADGET button in the top menu bar. 
  9. Select the input data source as shown below, and click Next.

    select the data source
  10. Select the Chart Type, enter the preferred x-axis, y-axis, and additional parameters based on the selected chart type as shown below, and click Preview.
    create a new gadget
  11. Click Add to Gadget Store.
  12. Click the corresponding Design button of the Wikepedia_Samples_Dashboard to add the Contributor Summary gadget as shown below.
    designing the Dashboard
  13. Click the following gadget browser icon in the side menu bar.

    You view the new gadget listed in the gadget browser as shown below.

    new gadget in the list of all available gadgets
  14. Click the new gadget, drag it out, and place it in the preferred grid of the selected layout in the dashboard editor as shown below.

    add gadget to layout

  15. Click the following PREVIEW button in the top menu bar.

    You view the preview of the Wikepedia_Samples_Dashboard with the Contributor Summary gadget added to it as shown below.
    previewing the Dashboard