
Workflow to Incorporate MATLAB Map and Reduce Functions into a Hadoop Job

  1. Write mapper and reducer functions in MATLAB® (a minimal mapper/reducer pair is sketched after this workflow).

  2. Create a MAT-file that contains a datastore describing the structure of the data and the names of the variables to analyze. You can create this datastore from a test data set that is representative of the actual data set (see the datastore sketch after this workflow).

  3. Create a text file that contains Hadoop® settings, such as the names of the mapper and reducer functions and the type of data being analyzed. This file is created automatically if you use the Hadoop Compiler app (an illustrative settings file appears after this workflow).

  4. Use the Hadoop Compiler app or the mcc command to package the components. Both options generate a deployable archive (.ctf file) that can be incorporated into a Hadoop mapreduce job (a representative mcc invocation is shown after this workflow).

  5. Incorporate the deployable archive into a Hadoop mapreduce job using the hadoop command, following the execution signature below.

    Execution Signature

    Key

    Letter   Description
    A        Hadoop command
    B        JAR option
    C        The standard name of the JAR file. All applications have the same JAR: mwmapreduce.jar. The path to the JAR is also fixed relative to the MATLAB Runtime location.
    D        The standard name of the driver. All applications have the same driver name: MWMapReduceDriver.
    E        A generic option specifying the MATLAB Runtime location as a key-value pair.
    F        Deployable archive (.ctf file) generated by the Hadoop Compiler app or mcc, passed as a payload argument to the job.
    G        Location of input files on HDFS™.
    H        Location on HDFS where output can be written.
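
The sketches that follow illustrate individual steps of this workflow. For step 1, here is a minimal mapper/reducer pair written against the standard MATLAB mapreduce signatures (add, hasnext, getnext). The function names and the ArrDelay variable are assumptions used only for illustration; place each function in its own .m file.

    function maxArrivalDelayMapper(data, info, intermKVStore)
        % Mapper: each call receives one block of data from the datastore.
        % 'ArrDelay' is an assumed variable name in a hypothetical data set.
        partialMax = max(data.ArrDelay, [], 'omitnan');
        add(intermKVStore, 'MaxArrivalDelay', partialMax);
    end

    function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
        % Reducer: combine the partial maxima emitted by all mapper calls.
        maxVal = -Inf;
        while hasnext(intermValIter)
            maxVal = max(maxVal, getnext(intermValIter));
        end
        add(outKVStore, intermKey, maxVal);
    end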
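
For step 2, the datastore can be built from a representative sample of the data and saved to a MAT-file. The file names, the 'NA' missing-value marker, and the selected variable are assumptions that match the mapper sketch above.

    % Build a datastore from a representative sample of the data.
    ds = tabularTextDatastore('airlinesmall.csv', ...
        'TreatAsMissing', 'NA', ...
        'SelectedVariableNames', {'ArrDelay'});

    % Save the datastore to a MAT-file so it can be packaged with the archive.
    save('infoAboutDataset.mat', 'ds');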
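
For step 3, the Hadoop settings file is normally written for you by the Hadoop Compiler app. The property names below are hypothetical placeholders that only show the kind of information the file records (mapper and reducer names, the datastore MAT-file, and the output type); the actual key syntax is release-specific.

    # Hypothetical settings file (illustrative placeholders, not actual syntax)
    mapper    = maxArrivalDelayMapper
    reducer   = maxArrivalDelayReducer
    datastore = infoAboutDataset.mat
    output    = keyvalue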
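
For step 4, packaging from the MATLAB prompt looks roughly like the following. The archive name, the file names, and the exact form of the -H and -W hadoop options are assumptions; verify them against the mcc reference page for your release.

    mcc -H -W 'hadoop:maxArrivalDelay,CONFIG:hadoopSettings.txt' ...
        maxArrivalDelayMapper.m maxArrivalDelayReducer.m ...
        -a infoAboutDataset.mat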
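
For step 5, the execution signature annotates a single hadoop invocation with the letters A through H from the key. The schematic below is a reconstruction based on that key; the JAR path, the fully qualified driver class name, and the mw.mcrroot property name are assumptions to check against your installation.

    # A: hadoop command   B: jar option         C: mwmapreduce.jar (under the MATLAB Runtime)
    # D: driver class     E: Runtime location   F: deployable archive
    # G: input on HDFS    H: output location on HDFS
    hadoop jar <MATLAB_RUNTIME>/path/to/mwmapreduce.jar \
        com.mathworks.hadoop.MWMapReduceDriver \
        -D mw.mcrroot=<MATLAB_RUNTIME> \
        maxArrivalDelay.ctf \
        hdfs:///path/to/input \
        hdfs:///path/to/output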

To simplify the inclusion of the deployable archive (.ctf file) into a Hadoop mapreduce job, both the Hadoop Compiler app and the mcc command generate a shell script alongside the deployable archive. The shell script has the following naming convention: run_<deployableArchiveName>.sh

To run the deployable archive using the shell script, pass the MATLAB Runtime location and the job arguments on the command line.
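
The generated script wraps the hadoop invocation shown above, so it needs the MATLAB Runtime location plus the same HDFS input and output arguments. The argument order below is an assumption based on that signature; the usage notes in the generated script itself are the authoritative reference.

    ./run_maxArrivalDelay.sh \
        <MATLAB_RUNTIME> \
        hdfs:///path/to/input \
        hdfs:///path/to/output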
