
matlab.compiler.mlspark.SparkConf class

Package: matlab.compiler.mlspark
Superclasses:

Interface class to configure an application with Spark parameters as key-value pairs

Description

A SparkConf object stores the configuration parameters of the application being deployed to Spark™. Every application must be configured before it is deployed to a Spark cluster. The configuration parameters are passed to the Spark cluster through a SparkContext.

Construction

conf = matlab.compiler.mlspark.SparkConf('AppName',name,'Master',url,'SparkProperties',prop) creates a SparkConf object with the specified configuration parameters.

conf = matlab.compiler.mlspark.SparkConf(___,Name,Value) creates a SparkConf object with additional configuration parameters specified by one or more Name,Value pair arguments. Name is a property name of the class and Value is the corresponding value. Name must appear inside single quotes (''). You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Input Arguments


name: Name of the application, specified as a character vector or string.

Example: 'AppName', 'myApp'

Data Types: char | string

url: URL of the Spark master, specified as a character vector or string. Use one of these values:

local
Run Spark locally with one worker thread. This option provides no parallelism.

local[K]
Run Spark locally with K worker threads. Set K to the number of cores on your machine.

local[*]
Run Spark locally with as many worker threads as there are logical cores on your machine.

yarn-client
Connect to a Hadoop® YARN cluster in client mode. The cluster location is determined from the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable.

Example: 'Master', 'yarn-client'

Data Types: char | string
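The master URL forms above can be summarized as a pattern. This is a minimal sketch that assumes only the four forms listed in the table. isValidMaster is a hypothetical helper, not part of the MATLAB API for Spark, and SparkConf may validate the value differently.

```matlab
% Hypothetical helper (not part of matlab.compiler.mlspark): true if url
% matches local, local[K] with K a positive integer, local[*], or yarn-client.
isValidMaster = @(url) ~isempty(regexp(char(url), ...
    '^(local(\[([1-9]\d*|\*)\])?|yarn-client)$', 'once'));

% Build a local[K] master URL from an assumed thread count K.
K = 4;                                % assumed number of cores
master = sprintf('local[%d]', K);     % 'local[4]'

ok1 = isValidMaster(master);          % true
ok2 = isValidMaster('yarn-client');   % true
ok3 = isValidMaster('local[0]');      % false: K must be positive here
```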

prop: Spark configuration properties, specified as a containers.Map object whose keys are property names and whose values are the corresponding property values.

Note

When deploying to a local cluster using the MATLAB API for Spark, you can omit the 'SparkProperties' name-value pair when constructing a SparkConf object, in which case no value for prop is needed. Alternatively, set prop to a placeholder containers.Map object:

'SparkProperties',containers.Map({''},{''})
The key and value of this containers.Map object are empty character vectors.
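Both options from the note can be sketched as follows. This assumes the MATLAB API for Spark is installed. The application name and the local[*] master URL are example values.

```matlab
% Option 1: omit 'SparkProperties' entirely for a local deployment.
confA = matlab.compiler.mlspark.SparkConf('AppName', 'myApp', ...
    'Master', 'local[*]');

% Option 2: pass a placeholder map whose key and value are empty char vectors.
emptyProp = containers.Map({''}, {''});
confB = matlab.compiler.mlspark.SparkConf('AppName', 'myApp', ...
    'Master', 'local[*]', 'SparkProperties', emptyProp);
```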

When deploying to a Hadoop YARN cluster, set prop to the Spark configuration properties your deployment requires, as key-value pairs. The exact set of properties varies from one deployment scenario to another, depending on the cluster environment, so verify the Spark setup with your system administrator. The following tables list commonly used Spark properties. For the full set of properties, see the latest Spark documentation.

Running Spark on YARN

spark.executor.cores
Default: 1
The number of cores to use on each executor.
For YARN and Spark standalone mode only. In Spark standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that the worker has enough cores. Otherwise, only one executor per application runs on each worker.

spark.executor.instances
Default: 2
The number of executors.
Note: This property is incompatible with spark.dynamicAllocation.enabled. If both spark.dynamicAllocation.enabled and spark.executor.instances are specified, dynamic allocation is turned off and the specified number of executors is used.

spark.driver.memory
Default: 1g (recommended: 2048m)
Amount of memory to use for the driver process.
If you get out-of-memory errors while working with tall arrays or gather, consider increasing this value.

spark.executor.memory
Default: 1g (recommended: 2048m)
Amount of memory to use per executor process.
If you get out-of-memory errors while working with tall arrays or gather, consider increasing this value.

spark.yarn.executor.memoryOverhead
Default: executorMemory * 0.10, with a minimum of 384 MB (recommended: 4096m)
The amount of off-heap memory, in MB, to allocate per executor.
If you get out-of-memory errors while working with tall arrays or gather, consider increasing this value.

spark.dynamicAllocation.enabled
Default: false
Integrates Spark with YARN resource management. Spark starts as many executors as possible, given the executor memory requirement and the number of cores. This property requires the cluster to be set up for dynamic allocation.
Set this property to true to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload.
This property requires spark.shuffle.service.enabled to be set to true. Related properties are spark.dynamicAllocation.minExecutors, spark.dynamicAllocation.maxExecutors, and spark.dynamicAllocation.initialExecutors.

spark.shuffle.service.enabled
Default: false
Enables the external shuffle service. This service preserves the shuffle files written by executors so that the executors can be safely removed. It must be enabled if spark.dynamicAllocation.enabled is set to true. The external shuffle service must be set up on the cluster before you can enable it.
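As a minimal sketch, the YARN properties above can be collected into the containers.Map passed as prop. The executor core and instance counts are the table defaults and the memory values are the recommended ones. Treat all of them as starting points to confirm with your system administrator. The second map shows the dynamic-allocation alternative. The check on it is a hypothetical helper that encodes the two rules from the table; it is not a feature of the MATLAB API for Spark.

```matlab
% Fixed-size YARN deployment: keys and values are character vectors.
sparkProp = containers.Map( ...
    {'spark.executor.cores', ...
     'spark.executor.instances', ...
     'spark.driver.memory', ...
     'spark.executor.memory', ...
     'spark.yarn.executor.memoryOverhead'}, ...
    {'1', '2', '2048m', '2048m', '4096m'});

% Dynamic-allocation variant: the shuffle service is enabled, and
% spark.executor.instances is left out because it turns dynamic allocation off.
dynProp = containers.Map( ...
    {'spark.dynamicAllocation.enabled', ...
     'spark.shuffle.service.enabled', ...
     'spark.executor.memory', ...
     'spark.yarn.executor.memoryOverhead'}, ...
    {'true', 'true', '2048m', '4096m'});

% Hypothetical sanity check based on the rules in the table above.
usesDynamic = isKey(dynProp, 'spark.dynamicAllocation.enabled') && ...
    strcmp(dynProp('spark.dynamicAllocation.enabled'), 'true');
if usesDynamic
    assert(isKey(dynProp, 'spark.shuffle.service.enabled') && ...
        strcmp(dynProp('spark.shuffle.service.enabled'), 'true'), ...
        'Set spark.shuffle.service.enabled to true with dynamic allocation.');
    assert(~isKey(dynProp, 'spark.executor.instances'), ...
        'spark.executor.instances turns dynamic allocation off.');
end
```

Either map can then be passed as the 'SparkProperties' value, together with 'Master','yarn-client'.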

MATLAB-Specific Properties

spark.matlab.worker.debug
Default: false
For use in standalone or interactive mode only. If set to true, a Spark deployable MATLAB application executed within the MATLAB desktop environment starts another MATLAB session as a worker and enters the debugger. Logging information is directed to log_<nbr>.txt.

spark.matlab.worker.reuse
Default: true
When set to true, a Spark executor pools workers and reuses them from one stage to the next. Workers terminate when the executor under which they are running terminates.

spark.matlab.worker.profile
Default: false
Valid only when using a MATLAB session as a worker. When set to true, turns on the MATLAB Profiler and saves a profile report to the file profworker_<split_index>_<socket>_<worker pass>.mat.

spark.matlab.worker.numberOfKeys
Default: 10000
Number of unique keys that can be held in a containers.Map object during *ByKey operations before the map data is spilled to a file.

spark.matlab.executor.timeout
Default: 600000
Spark executor timeout, in milliseconds. Not applicable when deploying tall arrays.
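MATLAB-specific properties go into the same containers.Map as the other Spark properties. The sketch below builds them separately and merges them by vertical concatenation, which containers.Map supports for maps with the same key type. The numberOfKeys value of 20000 is an illustrative assumption, not a recommendation.

```matlab
% Base Spark property (recommended value from the YARN table).
baseProp = containers.Map({'spark.executor.memory'}, {'2048m'});

% MATLAB-specific properties. Logical and numeric settings are still
% given as character vectors.
matlabProp = containers.Map( ...
    {'spark.matlab.worker.reuse', 'spark.matlab.worker.numberOfKeys'}, ...
    {'true', '20000'});

% Merge both maps into the single map passed as 'SparkProperties'.
sparkProp = [baseProp; matlabProp];
```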

Monitoring and Logging

spark.history.fs.logDirectory
Default: file:/tmp/spark-events
Directory that contains application event logs to be loaded by the history server.

spark.eventLog.dir
Default: file:///tmp/spark-events
Base directory in which Spark events are logged, if spark.eventLog.enabled is true. Within this base directory, Spark creates a subdirectory for each application and logs the events specific to that application there. You can set this to a unified location, such as an HDFS™ directory, so that the history server can read the history files.

spark.eventLog.enabled
Default: false
Whether to log Spark events. This is useful for reconstructing the web UI after the application has finished.

Data Types: char
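For example, to keep event logs for the history server, enable event logging and point spark.eventLog.dir at a shared location. The HDFS path below is a hypothetical placeholder. spark.history.fs.logDirectory is read by the history server rather than by the application, so this sketch assumes it is set in the history server's own configuration to point at the same directory.

```matlab
% Hypothetical shared log location; replace with a directory on your cluster.
logDir = 'hdfs:///shared/spark-events';

% Enable event logging and write the logs to the shared directory.
logProp = containers.Map( ...
    {'spark.eventLog.enabled', 'spark.eventLog.dir'}, ...
    {'true', logDir});
```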

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

ExecutorEnv: Environment variables for the Spark executors, specified as a containers.Map object of key-value pairs.

Example: 'ExecutorEnv', containers.Map({'SPARK_JAVA_OPTS'}, {'-Djava.library.path=/my/custom/path'})

MCRRoot: Path to the MATLAB Runtime installation, specified as a character vector or string.

Example: 'MCRRoot', '/share/MATLAB/MATLAB_Runtime/v91'

Data Types: char | string
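The optional name-value pairs follow the required arguments. This sketch reuses the ExecutorEnv and MCRRoot example values from this page and assumes the MATLAB API for Spark is installed. Replace the MATLAB Runtime path with the actual installation location on your cluster.

```matlab
% Environment variables for the executors (ExecutorEnv example value).
execEnv = containers.Map({'SPARK_JAVA_OPTS'}, ...
    {'-Djava.library.path=/my/custom/path'});

% Spark properties (placeholder value; adjust for your cluster).
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});

% Required arguments first, then the optional name-value pairs.
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'myApp', ...
    'Master', 'local[1]', ...
    'SparkProperties', sparkProp, ...
    'ExecutorEnv', execEnv, ...
    'MCRRoot', '/share/MATLAB/MATLAB_Runtime/v91');
```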

Properties

The properties of this class are hidden.

Methods

There are no user-executable methods for this class.

Examples


The SparkConf class allows you to configure an application with Spark parameters as key-value pairs.

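% Spark properties as key-value pairs (keys and values are character vectors)
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
% Configure an application named myApp to run locally with one worker thread
conf = matlab.compiler.mlspark.SparkConf('AppName','myApp', ...
                        'Master','local[1]','SparkProperties',sparkProp);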


References

See the latest Spark documentation for more information.

Introduced in R2016b