Spark Troubleshooting

Enabling worker logs

To enable worker logs, open the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file and update the relevant parameters as explained in the table below.

Parameter	Value
`spark.eventLog.enabled`	This should be set to `true`.
`spark.eventLog.dir`	Set `<DAS_HOME>/work` as the location to save the logs.
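
As a rough illustration, the two entries might look as follows in the spark-defaults.conf file. This is a sketch rather than a verified configuration: <DAS_HOME> is a placeholder and would typically need to be replaced with the actual path of your DAS installation.

```
# Enable Spark event logging
spark.eventLog.enabled true
# Placeholder path; substitute the real location of your DAS installation
spark.eventLog.dir <DAS_HOME>/work
```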

Monitoring via Spark UIs

Before you enable the Spark UIs, note that they are supported only over HTTP, and that the Environment tab of the Application UI may display security-sensitive information, such as the keystore password, depending on your Spark configuration.

If you do not want to expose such information, the following options are available:

Exclude the relevant Spark properties from being displayed in the Environment tab by editing your Spark properties.
Disable the Spark UIs for your DAS deployment by setting the spark.ui.enabled property to false in the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file as shown below.
```
spark.ui.enabled false
```

Apache Spark provides a set of user interfaces (UIs) that allow you to monitor and troubleshoot issues in a Spark cluster. This section helps you understand the information available in these UIs.

The following are the default ports of the main UIs available for Spark. These ports can be configured in the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file.

These ports are only available when WSO2 DAS is deployed as a Spark cluster. For detailed instructions to set up a cluster, see WSO2 Clustering Guide - Minimum High Availability Deployment for WSO2 DAS.

UI	Default port
Master UI	8081
Worker UI	11500
Application UI	4040
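
The port settings could be expressed in the spark-defaults.conf file along the following lines. `spark.ui.port` is the standard Spark property for the Application UI port. The master and worker property names shown here are assumptions and should be checked against the entries already present in your file before you change anything.

```
# Master UI port (property name assumed; verify against your spark-defaults.conf)
spark.master.webui.port 8081
# Worker UI port (property name assumed; verify against your spark-defaults.conf)
spark.worker.webui.port 11500
# Application UI port (standard Spark property)
spark.ui.port 4040
```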

Master UI

This is the master web UI for Spark.

This provides the following information in the header.

Item	Description
URL	The URL of the Spark Master.
REST URL	The URL of the Spark REST endpoint.
Alive Workers	The number of active worker nodes in the Spark cluster.
Cores in use	This displays the number of CPU cores allocated to the Spark cluster, and the number of cores that are used.
Memory in use	This displays the amount of memory allocated to the Spark cluster, and the amount of memory in use.
Applications	The number of Spark applications that are currently running within the Spark cluster, and the number of applications that have completed execution.
Drivers	The number of Spark drivers that are currently running within the Spark cluster, and the number of drivers that have completed execution.
Status	The current status of the Spark cluster. Possible values are ALIVE (the status of the currently active Spark node in a high availability cluster) and STANDBY (the status of the standby node of a Spark cluster).

In addition, the following tables are displayed.

Workers
This table displays the following information for each worker.

Column	Description
Worker ID	The ID of the worker. You can click on the relevant worker ID to access the web UI for a worker.
Address	The host and the port on which the worker is run.
State	The current status of the worker.
Cores	The number of CPU cores allocated for the worker, and the number of cores used by the worker.
Memory	The amount of memory allocated to the worker, and the amount of memory used by the worker.

Running Applications
This table displays the following information for each running application.

Column	Description
Application ID	The ID of the application. If you click on an application ID, you can view the basic information displayed in this table as well as the executor summary for the application.
Name	The name of the application. You can click on an application name to open its web UI.
Cores	The number of CPU cores used by the application.
Memory per Node	The amount of memory used by the application on each node.
Submitted Time	The time at which the application was submitted for execution.
User	The ID of the user who submitted the application to be executed.
State	The current status of the application.
Duration	The time duration for which the application has been running since it was submitted for execution.

Completed Applications
This table displays the following information for each completed application.

Column	Description
Application ID	The ID of the application. You can click on an application ID to open its web UI.
Name	The name of the application.
Cores	The number of CPU cores used by the application.
Memory per Node	The amount of memory used by the application on each node.
Submitted Time	The time at which the application was submitted for execution.
User	The ID of the user who submitted the application to be executed.
State	The current status of the application.
Duration	The time spent executing the application.

Worker UI

This is the web UI for an individual Spark worker.

The following information is displayed for each Spark worker in its web UI under Running Executors.

It is recommended to run only one executor per DAS worker. If you observe any memory or Spark execution time issues for that executor, you can increase the amount of memory and the number of CPU cores allocated for that executor. For more information about configuring Spark executors, see Spark Configurations - Executor configurations.

Column	Description
ExecutorID	The ID of the executor to which the information applies.
Cores	The number of cores used by the executor.
State	The current status of the executor.
Memory	The memory used by the executor.
Job Details	The ID of the job performed by the executor, the name of the job, and the ID of the user who submitted the job.
Logs	This lists the IDs of logs generated for the Spark worker. To view a specific log, click on the relevant ID.

Application UI

The following web UIs are available for the applications.

Application UI: This can be accessed by clicking on an application ID in the Master UI (default port 8081).

The information displayed is as shown below.
Application Detail UI: This can be accessed by clicking on an application name in the Master UI (default port 8081).

This UI contains the following tabs.

Jobs

This tab provides an overview of the Spark jobs carried out in the current DAS session.

Stages

This tab shows the current state of all the stages of all the jobs in a Spark application.

Environment

This tab displays detailed information about the environment of the selected Spark application.

Executors

This tab displays detailed information about the executors of the selected Spark application.

SQL

This tab displays detailed information about the SQL queries of the selected Spark application.

Spark issues in a production environment

The following three issues may occur when you work with Spark in a multi-node DAS cluster.

These issues occur only when the DAS cluster is running in RedHat Linux environments.

The DAS nodes consuming too much CPU processing power.
DAS nodes running out of memory.
Too many log directories being created in the <DAS_HOME>/work directory.

All of the above issues can occur when the symbolic link is not correctly resolved by the operating system. To address this, update the <DAS_HOME>/bin/wso2server.sh file with the following entry so that <DAS_HOME> is exported.

export CARBON_HOME=<symbolic link