Enabling worker logs
To enable worker logs, open the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf
file and update the relevant parameters as explained in the table below.
Parameter | Value |
---|---|
spark.eventLog.enabled | This should be set to true. |
spark.eventLog.dir | Set <DAS_HOME>/work as the location to save the logs. |
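With both parameters set as described above, the relevant section of the spark-defaults.conf file might look like the following sketch (replace <DAS_HOME> with the actual path to your DAS installation):

```
spark.eventLog.enabled  true
spark.eventLog.dir      <DAS_HOME>/work
```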
Monitoring via Spark UIs
Note |
---|
Before you enable Spark UIs, note that they are only supported with HTTP, and that the Environment tab of the Application UI may display security-sensitive information such as the keystore password, depending on your Spark configuration. If you do not want to expose such information, the following options are available: |
Apache Spark provides a set of user interfaces (UI) that allow you to monitor and troubleshoot the issues in a Spark cluster. This section helps you to understand the information accessed from these UIs.
...
Info |
---|
These ports are only available when WSO2 DAS is deployed as a Spark cluster. For detailed instructions to set up a cluster, see WSO2 Clustering Guide - Minimum High Availability Deployment for WSO2 DAS. |
UI | Default port |
---|---|
Master UI | 8081 |
Worker UI | 11500 |
Application UI | 4040 |
Master UI
This is the master web UI for Spark.
This provides the following information in the header.
Item | Description |
---|---|
URL | The URL of the Spark Master. |
REST URL | The URL of the Spark REST endpoint. |
Alive Workers | The number of active worker nodes in the Spark cluster. |
Cores in use | This displays the number of CPU cores allocated to the Spark cluster, and the number of cores that are used. |
Memory in use | This displays the amount of memory allocated to the Spark cluster, and the amount of memory in use. |
Applications | The number of Spark applications currently running in the Spark cluster, and the number of applications that have completed execution. |
Drivers | The number of Spark drivers currently running in the Spark cluster, and the number of drivers that have completed execution. |
Status | The current status of the Spark Cluster. Possible values are as follows:
|
In addition, the following tables are displayed.
Workers
This table displays the following information for each worker.
Column | Description |
---|---|
Worker ID | The ID of the worker. You can click on the relevant worker ID to access the web UI for that worker. |
Address | The host and the port on which the worker is run. |
State | The current status of the worker. |
Cores | The number of CPU cores allocated to the worker, and the number of cores used by the worker. |
Memory | The amount of memory allocated to the worker, and the amount of memory used by the worker. |
Running Applications
This table displays the following information for each running application.
Column | Description |
---|---|
Application ID | The ID of the application. If you click on an application ID, you can view the basic information displayed in this table as well as the executor summary for the application. |
Name | The name of the application. You can click on an application name to open its web UI. |
Cores | The number of CPU cores used by the application. |
Memory per Node | The amount of memory used by the application on each node. |
Submitted Time | The time at which the application was submitted for execution. |
User | The ID of the user who submitted the application to be executed. |
State | The current status of the application. |
Duration | The time for which the application has been running since it was submitted for execution. |
Completed Applications
This table displays the following information for each completed application.
Column | Description |
---|---|
Application ID | The ID of the application. You can click on an application ID to open its web UI. |
Name | The name of the application. |
Cores | The number of CPU cores used by the application. |
Memory per Node | The amount of memory used by the application on each node. |
Submitted Time | The time at which the application was submitted for execution. |
User | The ID of the user who submitted the application to be executed. |
State | The current status of the application. |
Duration | The time spent executing the application. |
Worker UI
This is the web UI for an individual Spark worker.
The following information is displayed for each Spark worker in its web UI under Running Executors.
Info |
---|
It is recommended to run only one executor per DAS worker. If you observe any memory or Spark execution time issues for that executor, you can increase the amount of memory and the number of CPU cores allocated for that executor. For more information about configuring Spark executors, see Spark Configurations - Executor configurations. |
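For example, the executor's resources can be increased by setting the standard Spark executor properties in the spark-defaults.conf file. The values below are purely illustrative; choose values that suit your hardware as described in Spark Configurations - Executor configurations:

```
spark.executor.memory   2g
spark.executor.cores    2
```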
Column | Description |
---|---|
ExecutorID | The ID of the executor to which the information applies. |
Cores | The number of cores used by the executor. |
State | The current status of the executor. |
Memory | The memory used by the executor. |
Job Details | This displays the following:
|
Logs | This lists the IDs of logs generated for the Spark worker. To view a specific log, click on the relevant ID. |
Application UI
The following web UIs are available for the applications.
- Application UI: This can be accessed by clicking on an application ID in the Master UI (default port 8081).
The information displayed is as shown below.
- Application Detail UI: This can be accessed by clicking on an application name in the Master UI (default port 8081).
This UI contains the following tabs.
Jobs
This tab provides an overview of the Spark jobs carried out in the current DAS session.
Stages
This tab shows the current state of all the stages of all the jobs in a Spark application.
Environment
This tab displays detailed information about the environment of the selected Spark application.
Executors
This tab displays detailed information about the executors of the selected Spark application.
SQL
This tab displays detailed information about the SQL queries of the selected Spark application.
Spark issues in a production environment
The following are three issues that may occur when you work with Spark in a multi-node DAS cluster:
Info |
---|
The following issues only occur when the DAS cluster is running in RedHat Linux environments. |
- The DAS nodes consuming too much CPU processing power.
- DAS nodes running out of memory.
- Too many log directories being created in the <DAS_HOME>/work directory.
All of the above issues can occur as a result of the symbolic link not being correctly resolved in the operating system. To address this, update the <DAS_HOME>/bin/wso2server.sh
file with the following entry so that <DAS_HOME>
is exported: export CARBON_HOME=<symbolic link>
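The sketch below illustrates what resolving such a symbolic link looks like: `readlink -f` returns the physical path behind a link, which is the value the CARBON_HOME export makes visible to the server's child processes. All paths and the version directory name here are made up for demonstration only.

```shell
# Demo only: create a fake DAS home and a symlink pointing at it.
mkdir -p /tmp/das-demo/wso2das-3.1.0            # stand-in for the real <DAS_HOME>
ln -sfn /tmp/das-demo/wso2das-3.1.0 /tmp/das-demo/das   # symlink to it

# Resolve the symlink to its physical path before exporting, so that
# scripts started from wso2server.sh see the real directory.
CARBON_HOME=$(readlink -f /tmp/das-demo/das)
export CARBON_HOME
echo "$CARBON_HOME"    # /tmp/das-demo/wso2das-3.1.0
```

Exporting the resolved path rather than the symlink itself avoids the mis-resolution that causes the CPU, memory, and log-directory issues listed above.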