Set up MATLAB Job Scheduler Cluster for Auto-Resizing

You can customize your MATLAB® Job Scheduler (MJS) cluster to resize automatically. By default, an MJS cluster does not have the resizing functionality enabled. This means that MJS immediately rejects any work you submit to the cluster that requires more than the current number of workers in the cluster. Auto-resizing, also called auto-scaling, allows you to submit such work to the cluster and makes the number of workers in the cluster change automatically with the amount of work submitted. The cluster grows (scales up) when there is more work to do and shrinks (scales down) when there is less work to do. This allows you to use your compute resources more efficiently and can result in cost savings.

To configure your MJS cluster to resize automatically, you need to:

  1. Set the maximum number of workers in the mjs_def file.

  2. Start an MJS cluster.

  3. Set up an auto-resizing process.

Set Maximum Number of Workers

To make an MJS cluster resizable, you need to define the maximum number of workers of your cluster by editing the mjs_def file as follows:

  1. Open the file mjs_def.sh (on Linux®) or mjs_def.bat (on Windows®) located at matlabroot/toolbox/parallel/bin, where matlabroot is the directory of your MATLAB installation.

  2. Uncomment one or both of the lines #MAX_LINUX_WORKERS= and #MAX_WINDOWS_WORKERS= and set them to the desired values. These variables define the maximum number of Linux and Windows workers to which you can resize the cluster, respectively.

A resizable MJS cluster accepts queued jobs that require more workers than the cluster currently has, up to the limits set by MAX_LINUX_WORKERS and MAX_WINDOWS_WORKERS. Jobs that require more workers than these limits are cancelled immediately.

Tip

In the mjs_def file, you can also specify a scheduling algorithm that works well with a resizable MJS cluster, such as the standard scheduling algorithm. For more details, see the definition for the SCHEDULING_ALGORITHM parameter in Define MATLAB Job Scheduler Startup Parameters.

Start MJS Cluster

To create a cluster with the options defined in the mjs_def file, start an MJS cluster after editing and saving this file. For more information about how to install, configure and start an MJS cluster, see Install for MATLAB Job Scheduler with Network License Manager.

Note

To change the maximum number of Linux and Windows workers after you start the cluster, use the resize script located at matlabroot/toolbox/parallel/bin to run the resize update command. For example:

% cd matlabroot/toolbox/parallel/bin
% ./resize update -jobmanager myJobManager -maxlinuxworkers 4 -maxwindowsworkers 8

Set up Auto-Resizing Process

To make a resizable MJS cluster change size automatically, you must set up a background process to periodically adjust the size of the cluster. The specific implementation of this background process depends on many factors, but you can follow these general recommended steps:

  1. Identify the desired size of the cluster. The desired size of a resizable MJS cluster is reported as the total number of workers for each operating system and hence includes all busy workers and some idle workers that are already in the cluster. The desired size changes based on running jobs and jobs in the queue. Use the resize script located at matlabroot/toolbox/parallel/bin to run the resize status command:

    % cd matlabroot/toolbox/parallel/bin
    % ./resize status
    The resize status command above returns information about the resizable cluster in JSON format:
    {
      "jobManagers": [
        {
          "name": "myJobManager",
          "host": "myhostname",
          "desiredWorkers": {
            "linux": 1,
            "windows": 0
          },
          "maxWorkers": {
            "linux": 4,
            "windows": 8
          },
          "workers": [
            {
              "name": "worker_1",
              "host": "myhostname",
              "operatingSystem": "linux",
              "state": "busy",
              "secondsIdle": 0
            },
            {
              "name": "worker_2",
              "host": "myhostname",
              "operatingSystem": "linux",
              "state": "idle",
              "secondsIdle": 60
            }
          ]
        }
      ]
    }
    Parse the JSON output to extract the desiredWorkers values that represent the desired number of Linux and Windows workers for the MJS cluster.

  2. Compare the desired number of workers with the workers in the cluster to decide whether you need to start or stop workers. Use the workers array in the output of the resize status command to examine the workers in the cluster. To ensure that jobs in the queue eventually run, you must start enough workers to match or exceed the desired number of workers. You can optionally stop idle workers that exceed the desired number of workers.

    Note

    If workers take a long time to start in your environment, you might want to wait for excess workers to be idle for some time before stopping them. This approach can be more efficient than immediately stopping excess idle workers if they are needed again soon after they become idle. To check how long a worker has been idle, examine the secondsIdle value for the worker.

  3. Start or stop workers as necessary using the startworker and stopworker utility scripts. To avoid interrupting work in progress, use the -onidle flag with the stopworker command when stopping workers.
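
The decision logic in steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not part of the product: the function name plan_resize, the min_idle_seconds grace period, and the shortened SAMPLE_STATUS string are all hypothetical, and the sketch assumes the JSON layout shown in the resize status example above. It only computes how many workers to start and which idle workers are candidates to stop; it does not call startworker or stopworker itself.

```python
import json

# Hypothetical sample, shortened from the resize status output shown above.
SAMPLE_STATUS = """
{"jobManagers": [{
  "name": "myJobManager",
  "desiredWorkers": {"linux": 1, "windows": 0},
  "maxWorkers": {"linux": 4, "windows": 8},
  "workers": [
    {"name": "worker_1", "operatingSystem": "linux", "state": "busy", "secondsIdle": 0},
    {"name": "worker_2", "operatingSystem": "linux", "state": "idle", "secondsIdle": 60}
  ]}]}
"""


def plan_resize(status_json, os_name, min_idle_seconds=300):
    """Return {job manager name: {"start": int, "stop": [worker names]}}.

    "start" is the number of additional workers needed to reach the desired
    size for os_name. "stop" lists excess idle workers that have been idle
    for at least min_idle_seconds (an assumed grace period).
    """
    status = json.loads(status_json)
    plans = {}
    for jm in status["jobManagers"]:
        desired = jm["desiredWorkers"][os_name]
        current = [w for w in jm["workers"] if w["operatingSystem"] == os_name]

        # Start enough workers to match the desired number.
        to_start = max(0, desired - len(current))

        # Only workers beyond the desired number are candidates for stopping.
        excess = max(0, len(current) - desired)
        idle_long_enough = sorted(
            (w for w in current
             if w["state"] == "idle" and w["secondsIdle"] >= min_idle_seconds),
            key=lambda w: w["secondsIdle"],
            reverse=True,  # prefer the workers that have been idle longest
        )
        to_stop = [w["name"] for w in idle_long_enough[:excess]]

        plans[jm["name"]] = {"start": to_start, "stop": to_stop}
    return plans


if __name__ == "__main__":
    print(plan_resize(SAMPLE_STATUS, "linux", min_idle_seconds=30))
```

A background process could run a function like this periodically on the output of the resize status command and then act on the result with the startworker and stopworker scripts. Choosing a longer grace period trades some idle compute time for fewer worker restarts, which matters most when workers are slow to start.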
