Working with Codistributed Arrays

How MATLAB Software Distributes Arrays

When you distribute an array to a number of workers, MATLAB^® software partitions the array into segments and assigns one segment of the array to each worker. You can partition a two-dimensional array horizontally, assigning columns of the original array to the different workers, or vertically, by assigning rows. An array with N dimensions can be partitioned along any of its N dimensions. You choose which dimension of the array is to be partitioned by specifying it in the array constructor command.

For example, to distribute an 80-by-1000 array to four workers, you can partition it either by columns, giving each worker an 80-by-250 segment, or by rows, with each worker getting a 20-by-1000 segment. If the array dimension does not divide evenly over the number of workers, MATLAB partitions it as evenly as possible.

The following example creates an 80-by-1000 replicated array and assigns it to variable A. In doing so, each worker creates an identical array in its own workspace and assigns it to variable A, where A is local to that worker. The second command distributes A, creating a single 80-by-1000 array D that spans all four workers. Worker 1 stores columns 1 through 250, worker 2 stores columns 251 through 500, and so on. The default distribution is by the last nonsingleton dimension, thus, columns in this case of a 2-dimensional array.

spmd
  A = zeros(80, 1000);
  D = codistributed(A)
end

    Worker 1: This worker stores D(:,1:250).
    Worker 2: This worker stores D(:,251:500).
    Worker 3: This worker stores D(:,501:750).
    Worker 4: This worker stores D(:,751:1000).

Each worker has access to all segments of the array. Access to the local segment is faster than to a remote segment, because the latter requires sending and receiving data between workers and thus takes more time.

How MATLAB Displays a Codistributed Array

For each worker, the MATLAB Parallel Command Window displays information about the codistributed array, the local portion, and the codistributor. For example, an 8-by-8 identity matrix codistributed among four workers, with two columns on each worker, displays like this:

>> spmd
II = eye(8,"codistributed")
end

Worker 1: 
  This worker stores II(:,1:2).
          LocalPart: [8x2 double]
      Codistributor: [1x1 codistributor1d]
Worker 2: 
  This worker stores II(:,3:4).
          LocalPart: [8x2 double]
      Codistributor: [1x1 codistributor1d]
Worker 3: 
  This worker stores II(:,5:6).
          LocalPart: [8x2 double]
      Codistributor: [1x1 codistributor1d]
Worker 4: 
  This worker stores II(:,7:8).
          LocalPart: [8x2 double]
      Codistributor: [1x1 codistributor1d]

To see the actual data in the local segment of the array, use the getLocalPart function.

How Much Is Distributed to Each Worker

In distributing an array of N rows, if N is evenly divisible by the number of workers, MATLAB stores the same number of rows (N/spmdSize) on each worker. When this number is not evenly divisible by the number of workers, MATLAB partitions the array as evenly as possible.

MATLAB provides codistributor object properties called Dimension and Partition that you can use to determine the exact distribution of an array. See Indexing into a Codistributed Array for more information on indexing with codistributed arrays.

Distribution of Other Data Types

You can distribute arrays of any MATLAB built-in data type, and also numeric arrays that are complex or sparse, but not arrays of function handles or object types.

Creating a Codistributed Array

You can create a codistributed array in any of the following ways:

Partitioning a Larger Array — Start with a large array that is replicated on all workers, and partition it so that the pieces are distributed across the workers. This is most useful when you have sufficient memory to store the initial replicated array.
Building from Smaller Arrays — Start with smaller variant or replicated arrays stored on each worker, and combine them so that each array becomes a segment of a larger codistributed array. This method reduces memory requirements as it lets you build a codistributed array from smaller pieces.
Using MATLAB Constructor Functions — Use any of the MATLAB constructor functions like rand or zeros with a codistributor object argument. These functions offer a quick means of constructing a codistributed array of any size in just one step.

Partitioning a Larger Array

If you have a large array already in memory that you want MATLAB to process more quickly, you can partition it into smaller segments and distribute these segments to all of the workers using the codistributed function. Each worker then has an array that is a fraction the size of the original, thus reducing the time required to access the data that is local to each worker.

As a simple example, the following line of code creates a 4-by-8 replicated matrix on each worker assigned to the variable A:

spmd, A = [11:18; 21:28; 31:38; 41:48], end

A =
    11    12    13    14    15    16    17    18
    21    22    23    24    25    26    27    28
    31    32    33    34    35    36    37    38
    41    42    43    44    45    46    47    48

The next line uses the codistributed function to construct a single 4-by-8 matrix D that is distributed along the second dimension of the array:

spmd
    D = codistributed(A);
    getLocalPart(D)
end

1: Local Part  | 2: Local Part  | 3: Local Part  | 4: Local Part
    11    12   |     13    14   |     15    16   |     17    18
    21    22   |     23    24   |     25    26   |     27    28
    31    32   |     33    34   |     35    36   |     37    38
    41    42   |     43    44   |     45    46   |     47    48

Arrays A and D are the same size (4-by-8). Array A exists in its full size on each worker, while only a segment of array D exists on each worker.

spmd, size(A), size(D), end

Examining the variables in the client workspace, an array that is codistributed among the workers inside an spmd statement, is a distributed array from the perspective of the client outside the spmd statement. Variables that are not codistributed inside the spmd are Composites in the client outside the spmd.

whos  
  Name        Size               Bytes  Class                   Attributes

  A           1x4                  489  Composite                         
  D           4x8                  256  distributed

See the codistributed function reference page for syntax and usage information.

Building from Smaller Arrays

The codistributed function is less useful for reducing the amount of memory required to store data when you first construct the full array in one workspace and then partition it into distributed segments. To save on memory, you can construct the smaller pieces (local part) on each worker first, and then use codistributed.build to combine them into a single array that is distributed across the workers.

This example creates a 4-by-250 variant array A on each of four workers and then uses codistributor to distribute these segments across four workers, creating a 16-by-250 codistributed array. Here is the variant array, A:

spmd
  A = [1:250; 251:500; 501:750; 751:1000] + 250 * (spmdIndex - 1);
end

    WORKER 1             WORKER 2             WORKER 3
  1    2 ... 250 |  251   252 ... 500 |  501   502 ... 750 | etc.
251  252 ... 500 |  501   502 ... 750 |  751   752 ...1000 | etc.
501  502 ... 750 |  751   752 ...1000 | 1001  1002 ...1250 | etc.
751  752 ...1000 | 1001  1002 ...1250 | 1251  1252 ...1500 | etc.
                 |                    |                    |

Now combine these segments into an array that is distributed by the first dimension (rows). The array is now 16-by-250, with a 4-by-250 segment residing on each worker:

spmd
  D = codistributed.build(A, codistributor1d(1,[4 4 4 4],[16 250]))
end

Worker 1: 
    This worker stores D(1:4,:).
           LocalPart: [4x250 double]
      Codistributor: [1x1 codistributor1d]

whos   
Name       Size             Bytes  Class          Attributes

A          1x4                489  Composite                
D         16x250            32000  distributed

You could also use replicated arrays in the same fashion, if you wanted to create a codistributed array whose segments were all identical to start with. See the codistributed function reference page for syntax and usage information.

Using MATLAB Constructor Functions

MATLAB provides several array constructor functions that you can use to build codistributed arrays of specific values, sizes, and classes. These functions operate in the same way as their nondistributed counterparts in the MATLAB language, except that they distribute the resultant array across the workers using the specified codistributor object, codist.

Constructor Functions. The codistributed constructor functions are listed here. Use the codist argument (created by the codistributor function: codist=codistributor()) to specify over which dimension to distribute the array. See the individual reference pages for these functions for further syntax and usage information.

eye(___,codist)
false(___,codist)
Inf(___,codist)
NaN(___,codist)
ones(___,codist)
rand(___,codist)
randi(___,codist)
randn(___,codist)
true(___,codist)
zeros(___,codist)

codistributed.cell(m,n,...,codist)
codistributed.colon(a,d,b)
codistributed.linspace(m,n,...,codist)
codistributed.logspace(m,n,...,codist)
sparse(m,n,codist)
codistributed.speye(m,...,codist)
codistributed.sprand(m,n,density,codist)
codistributed.sprandn(m,n,density,codist)

Local Arrays

That part of a codistributed array that resides on each worker is a piece of a larger array. Each worker can work on its own segment of the common array, or it can make a copy of that segment in a variant or private array of its own. This local copy of a codistributed array segment is called a local array.

Creating Local Arrays from a Codistributed Array

The getLocalPart function copies the segments of a codistributed array to a separate variant array. This example makes a local copy L of each segment of codistributed array D. The size of L shows that it contains only the local part of D for each worker. Suppose you distribute an array across four workers:

spmd(4)
    A = [1:80; 81:160; 161:240];
    D = codistributed(A);
    size(D)
       L = getLocalPart(D);
    size(L)
end

returns on each worker:

3    80
3    20

Each worker recognizes that the codistributed array D is 3-by-80. However, notice that the size of the local part, L, is 3-by-20 on each worker, because the 80 columns of D are distributed over four workers.

Creating a Codistributed from Local Arrays

Use the codistributed.build function to perform the reverse operation. This function, described in Building from Smaller Arrays, combines the local variant arrays into a single array distributed along the specified dimension.

Continuing the previous example, take the local variant arrays L and put them together as segments to build a new codistributed array X.

spmd
  codist = codistributor1d(2,[20 20 20 20],[3 80]);
  X = codistributed.build(L,codist);
  size(X)
end

returns on each worker:

3    80

Obtaining information About the Array

MATLAB offers several functions that provide information on any particular array. In addition to these standard functions, there are also two functions that are useful solely with codistributed arrays.

Determining Whether an Array Is Codistributed

The iscodistributed function returns a logical 1 (true) if the input array is codistributed, and logical 0 (false) otherwise. The syntax is

spmd, TF = iscodistributed(D), end

where D is any MATLAB array.

Determining the Dimension of Distribution

The codistributor object determines how an array is partitioned and its dimension of distribution. To access the codistributor of an array, use the getCodistributor function. This returns two properties, Dimension and Partition:

spmd, getCodistributor(X), end

     Dimension: 2
     Partition: [20 20 20 20]

The Dimension value of 2 means the array X is distributed by columns (dimension 2); and the Partition value of [20 20 20 20] means that twenty columns reside on each of the four workers.

To get these properties programmatically, return the output of getCodistributor to a variable, then use dot notation to access each property:

spmd
    C = getCodistributor(X);
    part = C.Partition
    dim  = C.Dimension
end

Other Array Functions

Other functions that provide information about standard arrays also work on codistributed arrays and use the same syntax.

length — Returns the length of a specific dimension.
ndims — Returns the number of dimensions.
numel — Returns the number of elements in the array.
size — Returns the size of each dimension.
is* — Many functions that have names beginning with 'is', such as ischar and issparse.

Changing the Dimension of Distribution

When constructing an array, you distribute the parts of the array along one of the array's dimensions. You can change the direction of this distribution on an existing array using the redistribute function with a different codistributor object.

Construct an 8-by-16 codistributed array D of random values distributed by columns on four workers:

spmd
    D = rand(8,16,codistributor());
    size(getLocalPart(D))
end

returns on each worker:

8     4

Create a new codistributed array distributed by rows from an existing one already distributed by columns:

spmd
    X = redistribute(D, codistributor1d(1));
    size(getLocalPart(X))
end

returns on each worker:

2    16

Restoring the Full Array

You can restore a codistributed array to its undistributed form using the gather function. gather takes the segments of an array that reside on different workers and combines them into a replicated array on all workers, or into a single array on one worker.

Distribute a 4-by-10 array to four workers along the second dimension:

spmd,  A = [11:20; 21:30; 31:40; 41:50],  end

A =
    11    12    13    14    15    16    17    18    19    20
    21    22    23    24    25    26    27    28    29    30
    31    32    33    34    35    36    37    38    39    40
    41    42    43    44    45    46    47    48    49    50

spmd,  D = codistributed(A),  end


      WORKER 1        WORKER 2      WORKER 3     WORKER 4
    11   12   13  | 14   15   16  |  17   18  |  19    20
    21   22   23  | 24   25   26  |  27   28  |  29    30
    31   32   33  | 34   35   36  |  37   38  |  39    40
    41   42   43  | 44   45   46  |  47   48  |  49    50
                  |               |           |

spmd,  size(getLocalPart(D)),  end
Worker 1: 
    4     3
Worker 2: 
    4     3
Worker 3: 
    4     2
Worker 4: 
    4     2

Restore the undistributed segments to the full array form by gathering the segments:

spmd,  X = gather(D),  end

X =
    11    12    13    14    15    16    17    18    19    20
    21    22    23    24    25    26    27    28    29    30
    31    32    33    34    35    36    37    38    39    40
    41    42    43    44    45    46    47    48    49    50

spmd,  size(X),  end
    4    10

Indexing into a Codistributed Array

While indexing into a nondistributed array is fairly straightforward, codistributed arrays require additional considerations. Each dimension of a nondistributed array is indexed within a range of 1 to the final subscript, which is represented in MATLAB by the end keyword. The length of any dimension can be easily determined using either the size or length function.

With codistributed arrays, these values are not so easily obtained. For example, the second segment of an array (that which resides in the workspace of worker 2) has a starting index that depends on the array distribution. For a 200-by-1000 array with a default distribution by columns over four workers, the starting index on worker 2 is 251. For a 1000-by-200 array also distributed by columns, that same index would be 51. As for the ending index, this is not given by using the end keyword, as end in this case refers to the end of the entire array; that is, the last subscript of the final segment. The length of each segment is also not given by using the length or size functions, as they only return the length of the entire array.

The MATLAB colon operator and end keyword are two of the basic tools for indexing into nondistributed arrays. For codistributed arrays, MATLAB provides a version of the colon operator, called codistributed.colon. This actually is a function, not a symbolic operator like colon.

Note

When using arrays to index into codistributed arrays, you can use only replicated or codistributed arrays for indexing. The toolbox does not check to ensure that the index is replicated, as that would require global communications. Therefore, the use of unsupported variants (such as spmdIndex) to index into codistributed arrays might create unexpected results.

Example: Find a Particular Element in a Codistributed Array

Suppose you have a row vector of 1 million elements, distributed among several workers, and you want to locate its element number 225,000. That is, you want to know what worker contains this element, and in what position in the local part of the vector on that worker. The globalIndices function provides a correlation between the local and global indexing of the codistributed array.

D = rand(1,1e6,"distributed"); %Distributed by columns
spmd
    globalInd = globalIndices(D,2);
    pos = find(globalInd == 225e3);
    if ~isempty(pos)
      fprintf(...
      'Element is in position %d on worker %d.\n', pos, spmdIndex);
    end
end

If you run this code on a pool of four workers you get this result:

Worker 1: 
  Element is in position 225000 on worker 1.

If you run this code on a pool of five workers you get this result:

Worker 2: 
  Element is in position 25000 on worker 2.

Notice if you use a pool of a different size, the element ends up in a different location on a different worker, but the same code can be used to locate the element.

2-Dimensional Distribution

As an alternative to distributing by a single dimension of rows or columns, you can distribute a matrix by blocks using '2dbc' or two-dimensional block-cyclic distribution. Instead of segments that comprise a number of complete rows or columns of the matrix, the segments of the codistributed array are 2-dimensional square blocks.

For example, consider a simple 8-by-8 matrix with ascending element values. You can create this array in an spmd statement or communicating job.

spmd
    A = reshape(1:64, 8, 8)
end

The result is the replicated array:

     1     9    17    25    33    41    49    57

     2    10    18    26    34    42    50    58

     3    11    19    27    35    43    51    59

     4    12    20    28    36    44    52    60

     5    13    21    29    37    45    53    61

     6    14    22    30    38    46    54    62

     7    15    23    31    39    47    55    63

     8    16    24    32    40    48    56    64

Suppose you want to distribute this array among four workers, with a 4-by-4 block as the local part on each worker. In this case, the worker grid is a 2-by-2 arrangement of the workers, and the block size is a square of four elements on a side (i.e., each block is a 4-by-4 square). With this information, you can define the codistributor object:

spmd
    DIST = codistributor2dbc([2 2], 4);
end

Now you can use this codistributor object to distribute the original matrix:

spmd
    AA = codistributed(A, DIST)
end

This distributes the array among the workers according to this scheme:

4-by-4 block of array distributed over a 2-by-2 worker grid. Resultant block size is a square of four elements on a side.

If the worker grid does not perfectly overlay the dimensions of the codistributed array, you can still use '2dbc' distribution, which is block cyclic. In this case, you can imagine the worker grid being repeatedly overlaid in both dimensions until all the original matrix elements are included.

Using the same original 8-by-8 matrix and 2-by-2 worker grid, consider a block size of 3 instead of 4, so that 3-by-3 square blocks are distributed among the workers. The code looks like this:

spmd
    DIST = codistributor2dbc([2 2], 3)
    AA = codistributed(A, DIST)
end

The first “row” of the worker grid is distributed to worker 1 and worker 2, but that contains only six of the eight columns of the original matrix. Therefore, the next two columns are distributed to worker 1. This process continues until all columns in the first rows are distributed. Then a similar process applies to the rows as you proceed down the matrix, as shown in the following distribution scheme:

The diagram above shows a scheme that requires four overlays of the worker grid to accommodate the entire original matrix. The following code shows the resulting distribution of data to each of the workers.

spmd
    getLocalPart(AA)
end

Worker 1: 
  
  ans =
  
       1     9    17    49    57
       2    10    18    50    58
       3    11    19    51    59
       7    15    23    55    63
       8    16    24    56    64
  
Worker 2: 
  
  ans =
  
      25    33    41
      26    34    42
      27    35    43
      31    39    47
      32    40    48
  
Worker 3: 
  
  ans =
  
       4    12    20    52    60
       5    13    21    53    61
       6    14    22    54    62
  
Worker 4: 
  
  ans =
  
      28    36    44
      29    37    45
      30    38    46

The following points are worth noting:

'2dbc' distribution might not offer any performance enhancement unless the block size is at least a few dozen. The default block size is 64.
The worker grid should be as close to a square as possible.
Not all functions that are enhanced to work on '1d' codistributed arrays work on '2dbc' codistributed arrays.