ctc

Connectionist temporal classification (CTC) loss for unaligned sequence classification

    Description

    The CTC operation computes the connectionist temporal classification (CTC) loss between unaligned sequences.

    The ctc function computes the CTC loss between predictions and targets represented as dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the 'S', 'T', 'C', and 'B' labels, respectively. For unspecified and other dimensions, use the 'U' label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the 'DataFormat' option.
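
    As a minimal sketch of dimension labeling, the following creates a formatted dlarray and queries its labels and size. The variable names X, dlX, fmt, and sz are illustrative only and do not appear elsewhere on this page.

    ```matlab
    % Random data: 10 channels, 19 time steps, 2 observations.
    X = rand(10,19,2);

    % Label the dimensions as channel, time, batch.
    dlX = dlarray(X,'CTB');

    % Formatted dlarray objects reorder their dimensions, so query the
    % resulting labels and size rather than assuming the input order.
    fmt = dims(dlX)
    sz = size(dlX)
    ```

    Here the labels come back as 'CBT' and the size as 10-by-2-by-19, because the time dimension is moved after the batch dimension.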

    loss = ctc(dlY,targets,YMask,targetsMask) returns the CTC loss between the formatted dlarray object dlY containing the predictions and the target values targets using the prediction and target masks YMask and targetsMask, respectively.

    For unformatted input data, use the 'DataFormat' option.

    loss = ctc(dlY,targets,YMask,targetsMask,'DataFormat',FMT) also specifies the dimension format FMT when dlY is not a formatted dlarray.

    loss = ctc(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'BlankIndex','last' specifies a blank index corresponding to the last element of the vocabulary.
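
    The following is a small sketch of the 'BlankIndex','last' syntax, assuming random predictions and equal-length sequences so that no padding is needed. The sizes and target values are made up for illustration. With 10 channels, 'last' makes index 10 the blank, so the targets use only the values 1 through 9.

    ```matlab
    numClasses = 10;        % channels of the predictions; index 10 is the blank here
    numTimeSteps = 12;
    numObservations = 2;

    % Random predictions and an all-ones mask, both channel-by-time-by-batch.
    dlY = dlarray(rand(numClasses,numTimeSteps,numObservations),'CTB');
    YMask = dlarray(true(numClasses,numTimeSteps,numObservations),'CTB');

    % Two target sequences of length 4, stored time-by-batch.
    targets = dlarray([1 2; 3 4; 5 6; 7 8],'TB');
    targetsMask = dlarray(true(4,2),'TB');

    loss = ctc(dlY,targets,YMask,targetsMask,'BlankIndex','last');
    ```

    Because the predictions are random, the loss value differs from run to run.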

    Examples

    Create an array of 2 target sequences of different lengths over 10 classes. The target sequences must not contain the blank index which is 1 by default.

    numObservations = 2;
    numClasses = 10;
    
    targets = cell(numObservations,1);
    targets{1} = [2 3 5 7 9 2 3 5 3 2 3];
    targets{2} = [2 3 3 3 4 4 4 6 8 8 8 10 3];

    Create random arrays of prediction sequences. The length of each prediction sequence must be greater than or equal to the length of the corresponding target sequence plus the number of consecutive repeated indices in that target, because CTC needs a blank between two identical adjacent labels. In this case, the first target sequence has length 11 with no repeated indices, and the second target sequence has length 13 with 6 repeated indices.

    Y = cell(numObservations,1);
    
    Y{1} = rand(numClasses,11);
    Y{2} = rand(numClasses,13 + 6);
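
    The minimum prediction lengths used above (11 and 13 + 6) can be checked with a short calculation. This is a sketch in base MATLAB; numRepeats and minLength are illustrative names, and the targets are repeated here so that the snippet runs on its own.

    ```matlab
    targets = {[2 3 5 7 9 2 3 5 3 2 3]; [2 3 3 3 4 4 4 6 8 8 8 10 3]};

    % For each target, count the positions whose value equals the previous one.
    numRepeats = cellfun(@(t) nnz(diff(t) == 0), targets);

    % Shortest prediction length that satisfies the length rule.
    minLength = cellfun(@numel, targets) + numRepeats
    ```

    For these targets, minLength is [11; 19], which matches the sequence lengths chosen for Y{1} and Y{2}.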

    View the cell arrays of predictions and targets.

    Y
    Y=2×1 cell array
        {10×11 double}
        {10×19 double}
    
    
    targets
    targets=2×1 cell array
        {[     2 3 5 7 9 2 3 5 3 2 3]}
        {[2 3 3 3 4 4 4 6 8 8 8 10 3]}
    
    

    Pad the prediction sequences in the second dimension using the padsequences function and also return the corresponding mask.

    [Y,YMask] = padsequences(Y,2);

    Pad the targets using the padsequences function. The targets must be positive integers between 1 and the number of classes, and must not contain the blank index, so specify a padding value of 2.

    [targets,targetsMask] = padsequences(targets,2,'PaddingValue',2);

    The ctc function requires the targets and target mask to be specified as 2-D arrays. Remove the singleton channel dimension using the squeeze function.

    targets = squeeze(targets);
    targetsMask = squeeze(targetsMask);
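
    To see what padding and squeezing do to the array shapes, the following sketch repeats the steps on fresh copies of the data and inspects the sizes. The names Ypad, tpad, and tMask are illustrative so that the variables of the main example are left untouched.

    ```matlab
    Y = {rand(10,11); rand(10,19)};
    targets = {[2 3 5 7 9 2 3 5 3 2 3]; [2 3 3 3 4 4 4 6 8 8 8 10 3]};

    [Ypad,YMask] = padsequences(Y,2);
    [tpad,tMask] = padsequences(targets,2,'PaddingValue',2);

    szY = size(Ypad)               % channel-by-time-by-observation
    szT = size(tpad)               % leading singleton dimension before squeeze
    szTs = size(squeeze(tpad))     % time-by-observation after squeeze

    % Padded positions are marked 0 in the mask. The first target is two
    % elements shorter than the second, so two positions are padded.
    numPadded = nnz(~squeeze(tMask))
    ```

    The padded predictions should be 10-by-19-by-2, and the padded targets 1-by-13-by-2 before squeeze and 13-by-2 after it.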

    Convert the padded prediction sequences and mask to dlarray with format 'CTB' (channel, time, batch). Because formatted dlarray objects automatically sort their dimensions, keep the dimensions of the targets and masks consistent by also converting them to formatted dlarray objects with matching formats.

    dlY = dlarray(Y,'CTB');
    YMask = dlarray(YMask,'CTB');

    Similarly, convert the padded target sequences and mask to dlarray with format 'TB' (time, batch).

    targets = dlarray(targets,'TB');
    targetsMask = dlarray(targetsMask,'TB');

    Compute the CTC loss between the predictions and the targets using the ctc function.

    loss = ctc(dlY,targets,YMask,targetsMask)
    loss = 
      1×1 dlarray
    
       12.1568
    
    

    Input Arguments

    dlY – Predictions

    Predictions, specified as a formatted dlarray, an unformatted dlarray, or a numeric array. When dlY is not a formatted dlarray, you must specify the dimension format using the 'DataFormat' option.

    The predictions dlY must have a 'B' (batch), 'C' (channel), and 'T' (time) dimension and can have different sequence lengths from the corresponding targets in targets.

    If dlY is a numeric array, then targets, YMask, or targetsMask must be a dlarray.

    targets – Target sequences

    Target sequences, specified as a formatted or unformatted dlarray or a numeric array.

    Specify the targets as an array with dimensions corresponding to the observations and the time steps of the target sequences. For example, specify the targets as a formatted dlarray object with format 'BT' (batch, time).

    The targets must have the same number of observations as the predictions. The target values corresponding to mask values equal to 1 must be positive integers between 1 and the number of channels of dlY and must not include the blank index.
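
    The constraints on the target values can be checked before calling ctc. The following is a sketch in base MATLAB, not part of the ctc function. The values of numChannels and blankIndex and the small targets array are assumptions made for illustration, so set them to match your own predictions and 'BlankIndex' setting.

    ```matlab
    numChannels = 10;   % number of channels of the predictions (assumed)
    blankIndex = 1;     % blank index (assumed)

    % Observation-by-time targets. The second row is padded with the value 2.
    targets = [2 3 5 7; 4 4 2 2];
    targetsMask = logical([1 1 1 1; 1 1 0 0]);

    % Only the values with mask equal to 1 need to satisfy the constraints.
    t = targets(targetsMask);
    isValid = all(t == round(t) & t >= 1 & t <= numChannels & t ~= blankIndex)
    ```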

    If targets is a formatted dlarray, its dimension format must be the same as the format of dlY, or the same as 'DataFormat' if dlY is unformatted.

    If targets is an unformatted dlarray or a numeric array, then the format of dlY or the value of 'DataFormat' is implicitly applied to targets.

    Tip

    Formatted dlarray objects automatically sort their dimensions. To ensure that the dimensions of dlY and targets are consistent, when dlY is a formatted dlarray, also specify targets as a formatted dlarray.

    YMask – Prediction mask

    Mask indicating which prediction elements to include for loss computation, specified as a dlarray object, a logical array, or a numeric array with the same size as dlY.

    The function includes and excludes elements of the predictions for loss computation when the corresponding value in the mask is 1 and 0, respectively.

    For each time step and observation in the mask, the corresponding elements in the channel dimension must be all ones or all zeros.
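
    One way to verify this requirement is sketched below in base MATLAB. It assumes a plain logical mask whose first dimension is the channel dimension. The mask contents and the names perStep and isConsistent are illustrative.

    ```matlab
    % Channel-by-time-by-observation mask. The last 8 time steps of the
    % first observation are padding.
    YMask = true(10,19,2);
    YMask(:,12:19,1) = false;

    % For every time step and observation, the channel entries must be
    % either all true or all false.
    perStep = all(YMask,1) | ~any(YMask,1);
    isConsistent = all(perStep,'all')
    ```

    A mask produced by padding whole time steps, as in the example on this page, passes this check. A mask that is edited per channel might not.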

    Tip

    Formatted dlarray objects automatically sort their dimensions. To ensure that the dimensions of dlY and the mask are consistent, when dlY is a formatted dlarray, also specify the mask as a formatted dlarray.

    targetsMask – Target mask

    Mask indicating which target elements to include for loss computation, specified as a dlarray object, a logical array, or a numeric array with the same size as targets.

    The function includes and excludes elements of the targets for loss computation when the corresponding value in the mask is 1 and 0, respectively.

    Tip

    Formatted dlarray objects automatically sort their dimensions. To ensure that the dimensions of dlY and the mask are consistent, when dlY is a formatted dlarray, also specify the mask as a formatted dlarray.

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'BlankIndex','last' specifies a blank index corresponding to the last element of the vocabulary.

    BlankIndex – Index of blank character

    Index of blank character, specified as the comma-separated pair consisting of 'BlankIndex' and one of the following:

    • Positive integer – Use the element in the vocabulary with the specified index as the blank character. If 'BlankIndex' is an integer, then it must be between 1 and the number of channels of dlY, inclusive.

    • 'last' – Use the last element of the vocabulary as the blank character.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | char | string

    DataFormat – Dimension order of unformatted data

    Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character vector or string scalar FMT that provides a label for each dimension of the data.

    When specifying the format of a dlarray object, each character provides a label for each dimension of the data and must be one of the following:

    • 'S' — Spatial

    • 'C' — Channel

    • 'B' — Batch (for example, samples and observations)

    • 'T' — Time (for example, time steps of sequences)

    • 'U' — Unspecified

    You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

    You must specify 'DataFormat' when the input data is not a formatted dlarray.

    Example: 'DataFormat','CTB'

    Data Types: char | string

    Output Arguments

    loss – CTC loss

    CTC loss, returned as an unformatted dlarray scalar with the same underlying data type as the input dlY.

    Introduced in R2021a