Enabling worker logs

To enable worker logs, open the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file and update the relevant parameters as explained in the table below.

Parameter | Value
spark.eventLog.enabled | This should be set to true.
spark.eventLog.dir | Set <DAS_HOME>/work as the location to save the logs.
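
For example, with a DAS installation located at /opt/wso2das (a hypothetical path; substitute your own <DAS_HOME>), the relevant entries in spark-defaults.conf would look similar to the following:

Code Block
# Record Spark events so that worker activity can be reviewed later
spark.eventLog.enabled  true
# Directory to which the event logs are written (<DAS_HOME>/work)
spark.eventLog.dir      /opt/wso2das/work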

Monitoring via Spark UIs

Note

Before you enable the Spark UIs, note that they are only supported over HTTP, and that the Environment tab of the Application UI may display security-sensitive information, such as the keystore password, depending on your Spark configuration.

If you do not want to expose such information, the following options are available:

  • Exclude the relevant Spark properties from being displayed in the Environment tab by editing your Spark configuration properties.
  • Disable the Spark UIs for your DAS deployment by setting the spark.ui.enabled property to false in the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file as shown below.

    Code Block
    spark.ui.enabled false

Apache Spark provides a set of user interfaces (UIs) that allow you to monitor a Spark cluster and troubleshoot issues in it. This section explains the information that can be accessed via these UIs.

...

Info

These ports are only available when WSO2 DAS is deployed as a Spark cluster. For detailed instructions to set up a cluster, see WSO2 Clustering Guide - Minimum High Availability Deployment for WSO2 DAS.

UI | Default port
Master UI | 8081
Worker UI | 11500
Application UI | 4040
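
For example, assuming the Spark master runs on a host named das-master (a hypothetical hostname) and no port offset is applied, the Master UI would be reachable over HTTP at:

Code Block
http://das-master:8081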

Master UI

This is the master web UI for Spark.

Image Added

This provides the following information in the header.

Item | Description
URL | The URL of the Spark Master.
REST URL | The URL of the Spark REST endpoint.
Alive Workers | The number of active worker nodes in the Spark cluster.
Cores in use | The number of CPU cores allocated to the Spark cluster, and the number of cores currently in use.
Memory in use | The amount of memory allocated to the Spark cluster, and the amount of memory currently in use.
Applications | The number of Spark applications currently running in the cluster, and the number of applications that have already completed execution.
Drivers | The number of Spark drivers currently running in the cluster, and the number of drivers that have already completed execution.
Status | The current status of the Spark cluster. Possible values are as follows:
  • ALIVE: The status of the currently active Spark node in a high availability cluster.
  • STANDBY: The status of the standby node in a high availability Spark cluster.

In addition, the following tables are displayed.

  • Workers
    This table displays the following information for each worker.

    Column | Description
    Worker ID | The ID of the worker. You can click on a worker ID to access the web UI of that worker.
    Address | The host and the port on which the worker runs.
    State | The current status of the worker.
    Cores | The number of CPU cores allocated to the worker, and the number of cores used by the worker.
    Memory | The amount of memory allocated to the worker, and the amount of memory used by the worker.
  • Running Applications
    This table displays the following information for each running application.

    Column | Description
    Application ID | The ID of the application. If you click on an application ID, you can view the basic information displayed in this table as well as the executor summary for the application.
    Name | The name of the application. You can click on an application name to open its web UI.
    Cores | The number of CPU cores used by the application.
    Memory per Node | The amount of memory used by the application on each node.
    Submitted Time | The time at which the application was submitted for execution.
    User | The ID of the user who submitted the application for execution.
    State | The current status of the application.
    Duration | The time for which the application has been running since it was submitted for execution.

  • Completed Applications
    This table displays the following information for each completed application. 

    Column | Description
    Application ID | The ID of the application. You can click on an application ID to open its web UI.
    Name | The name of the application.
    Cores | The number of CPU cores used by the application.
    Memory per Node | The amount of memory used by the application on each node.
    Submitted Time | The time at which the application was submitted for execution.
    User | The ID of the user who submitted the application for execution.
    State | The current status of the application.
    Duration | The time spent executing the application.

Worker UI

This is the web UI for an individual Spark worker.

Image Added

The following information is displayed for each Spark worker in its web UI under Running Executors.

Info

It is recommended to run only one executor per DAS worker. If you observe memory issues or long Spark execution times for that executor, you can increase the amount of memory and the number of CPU cores allocated to it, as shown in the example configuration after the table below. For more information about configuring Spark executors, see Spark Configurations - Executor configurations.

Column | Description
ExecutorID | The ID of the executor to which the information applies.
Cores | The number of cores used by the executor.
State | The current status of the executor.
Memory | The amount of memory used by the executor.
Job Details | This displays the following:
  • The ID of the job performed by the executor.
  • The name of the job.
  • The ID of the user who submitted the job.
Logs | The IDs of the logs generated for the Spark worker. To view a specific log, click on the relevant ID.
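
As noted above, if the executor runs short of memory or CPU, its allocation can be increased in the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file. The following is a minimal sketch using the standard Spark executor properties; the values are illustrative only and should be sized to the resources available on your DAS worker nodes.

Code Block
# Memory to allocate to each executor (illustrative value)
spark.executor.memory  2g
# Number of CPU cores to allocate to each executor (illustrative value)
spark.executor.cores   2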


Application UI

The following web UIs are available for the applications.

  • Application UI: This can be accessed by clicking on an application ID in the Master UI (default port 8081).
    Image Added 
    The information displayed is as shown below.
    Image Added 
  • Application Detail UI: This can be accessed by clicking on an application name in the Master UI (default port 8081).
    Image Added
    This UI contains the following tabs.

Jobs

This tab provides an overview of the Spark jobs carried out in the current DAS session.

Image Added

Stages

This tab shows the current state of all the stages of all the jobs in a Spark application. 

Image Added


Environment

This tab displays detailed information about the environment of the selected Spark application.

Image Added

Executors

This tab displays detailed information about the executors of the selected Spark application.

Image Added

SQL

This tab displays detailed information about the SQL queries of the selected Spark application.

Image Added


Spark issues in a production environment

The following are three issues that may occur when you work with Spark in a multi-node DAS cluster:

Info

The following issues only occur when the DAS cluster is running in RedHat Linux environments.

  • The DAS nodes consuming too much CPU processing power.
  • DAS nodes running out of memory.
  • Too many log directories being created in the <DAS_HOME>/work directory.

All of the above issues can occur as a result of the symbolic link not being resolved correctly by the operating system. To address this, update the <DAS_HOME>/bin/wso2server.sh file with the following entry so that <DAS_HOME> is exported.

Code Block
export CARBON_HOME=<symbolic link>
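
For example, assuming the DAS distribution is extracted to /opt/wso2das-3.1.0 and accessed through a symbolic link at /opt/das (both paths are hypothetical), the link and the corresponding entry in wso2server.sh would look similar to the following:

Code Block
# Create the symbolic link that points to the physical DAS installation directory
ln -s /opt/wso2das-3.1.0 /opt/das

# Entry to add in <DAS_HOME>/bin/wso2server.sh so that CARBON_HOME resolves
# to the symbolic link rather than the physical path
export CARBON_HOME=/opt/das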