Deploy Applications Using the MATLAB API for Spark

Create and execute MATLAB® applications against Spark™ using the MATLAB API for Spark

Supported Platform: Linux® only.

Deploying an application using the MATLAB API for Spark consists of two parts:

  • Creating your application using the MATLAB API for Spark and packaging it as a standalone application in the MATLAB desktop environment.

  • Executing the standalone application against a Spark-enabled cluster from a Linux shell.

When you create your application using the MATLAB API for Spark, you can use Spark functions such as flatMap, mapPartitions, aggregate, and others in your MATLAB code. The API exposes the Spark programming model to MATLAB and provides MATLAB implementations of numerous Spark functions. Many of these MATLAB implementations accept function handles or anonymous functions as inputs to perform various types of analyses.
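As a minimal sketch of how function handles are passed to these functions, the following applies flatMap and aggregate to small in-memory data. The application name, the property value, and the 'local[1]' master setting are illustrative assumptions, and running the code assumes a MATLAB environment that is already configured for interactive use of the API.

```matlab
% Minimal local setup; the application name and property value are illustrative.
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'functionHandleSketch', ...
    'Master', 'local[1]', ...
    'SparkProperties', sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

% flatMap: the anonymous function returns a cell array for each element,
% and the per-element results are flattened into a single RDD.
letters = sc.parallelize({'A', 'B'});
pairs = letters.flatMap(@(x) {x, 1});
flatResult = pairs.collect();

% aggregate: fold the elements starting from a zero value. The first handle
% combines elements within a partition; the second merges partition results.
numbers = sc.parallelize({1, 2, 3});
total = numbers.aggregate(0, @(acc, x) acc + x, @(a, b) a + b);

delete(sc);  % shut down the connection to Spark
```

Both handles passed to aggregate are plain addition here, so the result is the sum of the elements. In a real application the two handles often differ, for example when the accumulator has a different type from the elements.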

The API lets you run your application interactively from within the MATLAB desktop environment in a nondistributed mode on a single machine. A second MATLAB session on the same machine serves as a worker. This functionality can help you debug your application before deploying it on a Spark-enabled cluster. Before you can debug interactively, you must configure your MATLAB environment for the MATLAB API for Spark. For more information, see Configure Environment for Interactive Debugging.

The general workflow for using the MATLAB API for Spark is as follows:

  1. Specify Spark properties.

  2. Create a SparkConf object.

  3. Create a SparkContext object.

  4. Create an RDD object from the data.

  5. Perform operations on the RDD object.
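The five steps above can be sketched as follows. This is a minimal illustration, not a complete application: the property value, the application name, and the 'local[1]' master setting are assumptions chosen so that the code can run on a single machine in an environment configured for interactive debugging.

```matlab
% Step 1: specify Spark properties as key-value pairs (value is illustrative).
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});

% Step 2: create a SparkConf object. 'local[1]' runs Spark locally.
conf = matlab.compiler.mlspark.SparkConf( ...
    'AppName', 'workflowSketch', ...
    'Master', 'local[1]', ...
    'SparkProperties', sparkProp);

% Step 3: create a SparkContext object from the configuration.
sc = matlab.compiler.mlspark.SparkContext(conf);

% Step 4: create an RDD object from in-memory data.
rdd = sc.parallelize({1, 2, 3, 4});

% Step 5: perform operations on the RDD object.
squared = rdd.map(@(x) x^2);         % transformation: square each element
result = squared.collect();          % action: bring the elements back as a cell array
total = rdd.reduce(@(a, b) a + b);   % action: sum the original elements

delete(sc);  % shut down the connection to Spark
```

When you deploy to a cluster, the 'Master' value and the Spark properties would instead reflect your cluster configuration.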

You can package an application created with this API as a standalone application using the mcc command or deploytool. You can then run the application on a Spark-enabled cluster from a Linux shell.
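As a hedged illustration of the packaging step, the mcc command with the -m flag builds a standalone application. The file name sparkApp.m is hypothetical; substitute the file that contains your own application.

```matlab
% Package the application as a standalone executable.
% sparkApp.m is a hypothetical file name used only for illustration.
mcc -m sparkApp.m

% On Linux, mcc also generates a wrapper script (here, run_sparkApp.sh)
% that takes the MATLAB Runtime location as its first argument. You run
% that script from a Linux shell to execute the application.
```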

Note

MATLAB applications developed using the MATLAB API for Spark cannot be deployed if they contain tall arrays.

For a complete example, see Example on Deploying Applications to Spark Using the MATLAB API for Spark. You can follow the same instructions to deploy applications created using the MATLAB API for Spark to Cloudera® CDH.

Classes

matlab.compiler.mlspark.SparkConf       Interface class to configure an application with Spark parameters as key-value pairs
matlab.compiler.mlspark.SparkContext    Interface class to initialize a connection to a Spark-enabled cluster
matlab.compiler.mlspark.RDD             Interface class to represent a Spark Resilient Distributed Dataset (RDD)

Topics

Configure Environment for Interactive Debugging

Configure your MATLAB environment to interactively make calls and debug your application using the MATLAB API for Spark.

Apache Spark Basics

Learn basic Apache Spark™ concepts and see how these concepts relate to deploying MATLAB applications to Spark.

Examples

Example on Deploying Applications to Spark Using the MATLAB API for Spark

Complete example showing how to deploy an application to Spark using the MATLAB API for Spark.