This section covers the configurations required to use Apache Spark with WSO2 DAS.
...
If you want to add additional jars, you can add them to the SPARK_CLASSPATH in the <DAS_HOME>/bin/external-spark-classpath.conf file in a UNIX environment.
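For example, each additional jar can be listed in this file on its own line (the paths below are illustrative, not shipped defaults):

```
/opt/das/custom-libs/mysql-connector-java-5.1.39.jar
/opt/das/custom-libs/my-custom-udf.jar
```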
...
The following are the Carbon-related configurations used for Apache Spark. These configurations are shipped with the product by default in the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file.
Property | Default Value | Description
---|---|---
carbon.spark.master | local | The mode in which the Spark master runs. The Spark master can be in one of three possible states.
carbon.spark.master.count | 1 | The maximum number of masters allowed at a given time when DAS creates its own Spark cluster.
carbon.das.symbolic.link | This links to your DAS home by default. | The symbolic link for the jar files in the Spark classpath. In a clustered DAS deployment, the directory path for the Spark classpath is different for each node depending on the location of the DAS home, so the symbolic link provides a common path for all nodes.
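In spark-defaults.conf, properties are declared as space-separated key/value pairs. A minimal sketch of the Carbon section, using the default values above:

```
carbon.spark.master local
carbon.spark.master.count 1
```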
Spark related configurations
...
Application configurations
Property | Default Value
---|---
spark.app.name | CarbonAnalytics
spark.driver.cores | 1
spark.driver.memory | 512m
spark.executor.memory | 512m
Spark UI configurations
Property | Default Value |
---|---
spark.history.ui.port | 18080
Compression and serialization configurations
Property | Default Value |
---|---
spark.serializer | org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer | 256k |
spark.kryoserializer.buffer.max | 256m |
Networking configurations
Property | Default Value |
---|---
spark.blockManager.port | 12000
spark.broadcast.port | 12500 |
spark.driver.port | 13000 |
spark.executor.port | 13500 |
spark.fileserver.port | 14000 |
spark.replClassServer.port | 14500 |
Scheduling configurations
Property | Default Value |
---|---|
> **Info:** In addition to having ...
Standalone cluster configurations
Property | Default Value |
---|---|
spark.deploy.recoveryMode | CUSTOM
spark.deploy.recoveryMode.factory | org.wso2.carbon.analytics.spark.core.deploy.AnalyticsRecoveryModeFactory |
Master configurations
Property | Default Value |
---|---
spark.master.port | 7077
spark.master.rest.port | 6066 |
spark.master.webui.port | 8081 |
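If DAS should connect to an externally managed Spark master instead of starting its own, carbon.spark.master can be pointed at that master's Spark URL. With the default master port above, the entry would look like the following (the host name is a placeholder, not a shipped default):

```
carbon.spark.master spark://das-master.example.com:7077
```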
Worker configurations
Property | Default Value |
---|---
spark.worker.cores | 1
spark.worker.memory | 1g |
spark.worker.dir | work |
spark.worker.port | 11000 |
spark.worker.webui.port | 11500 |
Executor configurations
It is recommended to run only one executor per DAS worker. If you observe any memory or Spark executor time issues for this executor, you can increase the amount of memory and the number of CPU cores allocated to it.
Property | Default Value | Description
---|---|---
spark.executor.cores | 1 | The number of CPU cores allocated to the Spark executors running in the DAS node. If this property is not set, all the available CPU cores of the worker are allocated to the executor(s).
spark.executor.memory | 1g | The amount of memory allocated to the Spark executor(s).
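Following the recommendation above, if the single executor runs into memory or executor-time issues, its allocation can be raised in spark-defaults.conf. The values below are illustrative, not shipped defaults:

```
spark.executor.cores 2
spark.executor.memory 2g
```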
Optional Spark related configurations
...
```
spark.executor.logs.rolling.strategy size
spark.executor.logs.rolling.maxSize 10000000
spark.executor.logs.rolling.maxRetainedFiles 10
```
Property | Description |
---|---|
spark.executor.logs.rolling.strategy | The strategy used to control the amount of logs saved in the <DAS_HOME>/work directory. In the above configuration, the property value size indicates that the amount of logs kept is restricted based on size. |
spark.executor.logs.rolling.maxSize | The maximum size (in bytes) allowed for logs saved in the <DAS_HOME>/work directory at any given time. Older log files are deleted when new logs are generated so that the specified maximum size is not exceeded. |
spark.executor.logs.rolling.maxRetainedFiles | The maximum number of log files allowed to be kept in the <DAS_HOME>/work directory at any given time. Older log files are deleted when new logs are generated so that the specified maximum number of files is not exceeded. In the above configuration, this property is overruled by the spark.executor.logs.rolling.maxSize property because the value specified for spark.executor.logs.rolling.strategy is size. If the maximum size specified for logs is reached, older logs are deleted even if the maximum number of files specified is not yet reached. |
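As a rough sanity check of the rolling-log settings above, an upper bound on executor log disk usage can be estimated as the per-file size limit times the number of retained files (assuming, as in upstream Spark, that maxSize applies to each rolled file):

```python
# Worst-case executor log footprint in <DAS_HOME>/work, assuming
# (as in upstream Spark) that maxSize applies to each rolled file.
max_size_bytes = 10_000_000        # spark.executor.logs.rolling.maxSize
max_retained_files = 10            # spark.executor.logs.rolling.maxRetainedFiles

worst_case = max_size_bytes * max_retained_files
print(f"{worst_case / 1_000_000:.0f} MB")  # -> 100 MB
```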