Main Content

Define Custom Pixel Classification Layer with Tversky Loss

This example shows how to define and create a custom pixel classification layer that uses Tversky loss.

This layer can be used to train semantic segmentation networks. To learn more about creating custom deep learning layers, see Define Custom Deep Learning Layers (Deep Learning Toolbox).

Tversky Loss

The Tversky loss is based on the Tversky index for measuring overlap between two segmented images [1]. The Tversky index TIc between one image Y and the corresponding ground truth T is given by

TIc=m=1MYcmTcmm=1MYcmTcm+αm=1MYcmTcm+βm=1MYcmTcm

  • c corresponds to the class and ccorresponds to not being in class c.

  • M is the number of elements along the first two dimensions of Y.

  • α and β are weighting factors that control the contribution that false positives and false negatives for each class make to the loss.

The loss Lover the number of classes C is given by

L=c=1C1-TIc

Classification Layer Template

Copy the classification layer template into a new file in MATLAB®. This template outlines the structure of a classification layer and includes the functions that define the layer behavior. The rest of the example shows how to complete the tverskyPixelClassificationLayer.

classdef tverskyPixelClassificationLayer < nnet.layer.ClassificationLayer

   properties
      % Optional properties
   end

   methods

        function loss = forwardLoss(layer, Y, T)
            % Layer forward loss function goes here
        end
        
    end
end

Declare Layer Properties

By default, custom output layers have the following properties:

  • Name – Layer name, specified as a character vector or a string scalar. To include this layer in a layer graph, you must specify a nonempty unique layer name. If you train a series network with this layer and Name is set to '', then the software automatically assigns a name at training time.

  • Description – One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays the layer class name.

  • Type – Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays 'Classification layer' or 'Regression layer'.

Custom classification layers also have the following property:

  • Classes – Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or 'auto'. If Classes is 'auto', then the software automatically sets the classes at training time. If you specify a string array or cell array of character vectors str, then the software sets the classes of the output layer to categorical(str,str). The default value is 'auto'.

If the layer has no other properties, then you can omit the properties section.

The Tversky loss requires a small constant value to prevent division by zero. Specify the property, Epsilon, to hold this value. It also requires two variable properties Alpha and Beta that control the weighting of false positives and false negatives, respectively.

classdef tverskyPixelClassificationLayer < nnet.layer.ClassificationLayer

    properties(Constant)
       % Small constant to prevent division by zero. 
       Epsilon = 1e-8;
    end

    properties
       % Default weighting coefficients for false positives and false negatives 
       Alpha = 0.5;
       Beta = 0.5;  
    end

    ...
end

Create Constructor Function

Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.

Specify an optional input argument name to assign to the Name property at creation.

function layer = tverskyPixelClassificationLayer(name, alpha, beta)
    % layer =  tverskyPixelClassificationLayer(name) creates a Tversky
    % pixel classification layer with the specified name.
           
    % Set layer name          
    layer.Name = name;

    % Set layer properties
    layer.Alpha = alpha;
    layer.Beta = beta;

    % Set layer description
    layer.Description = 'Tversky loss';
end

Create Forward Loss Function

Create a function named forwardLoss that returns the weighted cross entropy loss between the predictions made by the network and the training targets. The syntax for forwardLoss is loss = forwardLoss(layer,Y,T), where Y is the output of the previous layer and T represents the training targets.

For semantic segmentation problems, the dimensions of T match the dimension of Y, where Y is a 4-D array of size H-by-W-by-K-by-N, where K is the number of classes, and N is the mini-batch size.

The size of Y depends on the output of the previous layer. To ensure that Y is the same size as T, you must include a layer that outputs the correct size before the output layer. For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can include a fully connected layer of size K or a convolutional layer with K filters followed by a softmax layer before the output layer.

function loss = forwardLoss(layer, Y, T)
    % loss = forwardLoss(layer, Y, T) returns the Tversky loss between
    % the predictions Y and the training targets T.

    Pcnot = 1-Y;
    Gcnot = 1-T;
    TP = sum(sum(Y.*T,1),2);
    FP = sum(sum(Y.*Gcnot,1),2);
    FN = sum(sum(Pcnot.*T,1),2);

    numer = TP + layer.Epsilon;
    denom = TP + layer.Alpha*FP + layer.Beta*FN + layer.Epsilon;
    
    % Compute Tversky index
    lossTIc = 1 - numer./denom;
    lossTI = sum(lossTIc,3);
    
    % Return average Tversky index loss
    N = size(Y,4);
    loss = sum(lossTI)/N;

end

Backward Loss Function

As the forwardLoss function fully supports automatic differentiation, there is no need to create a function for the backward loss.

For a list of functions that support automatic differentiation, see List of Functions with dlarray Support (Deep Learning Toolbox).

Completed Layer

The completed layer is provided in tverskyPixelClassificationLayer.m, which is attached to the example as a supporting file.

classdef tverskyPixelClassificationLayer < nnet.layer.ClassificationLayer
    % This layer implements the Tversky loss function for training
    % semantic segmentation networks.
    
    % References
    % Salehi, Seyed Sadegh Mohseni, Deniz Erdogmus, and Ali Gholipour.
    % "Tversky loss function for image segmentation using 3D fully
    % convolutional deep networks." International Workshop on Machine
    % Learning in Medical Imaging. Springer, Cham, 2017.
    % ----------
    
    
    properties(Constant)
        % Small constant to prevent division by zero.
        Epsilon = 1e-8;
    end
    
    properties
        % Default weighting coefficients for False Positives and False
        % Negatives
        Alpha = 0.5;
        Beta = 0.5;
    end

    
    methods
        
        function layer = tverskyPixelClassificationLayer(name, alpha, beta)
            % layer =  tverskyPixelClassificationLayer(name, alpha, beta) creates a Tversky
            % pixel classification layer with the specified name and properties alpha and beta.
            
            % Set layer name.          
            layer.Name = name;
            
            layer.Alpha = alpha;
            layer.Beta = beta;
            
            % Set layer description.
            layer.Description = 'Tversky loss';
        end
        
        
        function loss = forwardLoss(layer, Y, T)
            % loss = forwardLoss(layer, Y, T) returns the Tversky loss between
            % the predictions Y and the training targets T.   

            Pcnot = 1-Y;
            Gcnot = 1-T;
            TP = sum(sum(Y.*T,1),2);
            FP = sum(sum(Y.*Gcnot,1),2);
            FN = sum(sum(Pcnot.*T,1),2); 
            
            numer = TP + layer.Epsilon;
            denom = TP + layer.Alpha*FP + layer.Beta*FN + layer.Epsilon;
            
            % Compute tversky index
            lossTIc = 1 - numer./denom;
            lossTI = sum(lossTIc,3);
            
            % Return average tversky index loss.
            N = size(Y,4);
            loss = sum(lossTI)/N;
            
        end     
    end
end

GPU Compatibility

The MATLAB functions used in forwardLoss in tverskyPixelClassificationLayer all support gpuArray inputs, so the layer is GPU compatible.

Check Output Layer Validity

Create an instance of the layer.

layer = tverskyPixelClassificationLayer('tversky',0.7,0.3);

Check the validity of the layer by using checkLayer (Deep Learning Toolbox). Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects a H-by-W-by-K-by-N array inputs, where K is the number of classes, and N is the number of observations in the mini-batch.

numClasses = 2;
validInputSize = [4 4 numClasses];
checkLayer(layer,validInputSize, 'ObservationDimension',4)
Skipping GPU tests. No compatible GPU device found.
 
Skipping code generation compatibility tests. To check validity of the layer for code generation, specify the CheckCodegenCompatibility and ObservationDimension options.
 
Running nnet.checklayer.TestOutputLayerWithoutBackward
........
Done nnet.checklayer.TestOutputLayerWithoutBackward
__________

Test Summary:
	 8 Passed, 0 Failed, 0 Incomplete, 2 Skipped.
	 Time elapsed: 1.5764 seconds.

The test summary reports the number of passed, failed, incomplete, and skipped tests.

Use Custom Layer in Semantic Segmentation Network

Create a semantic segmentation network that uses the tverskyPixelClassificationLayer.

layers = [
    imageInputLayer([32 32 1])
    convolution2dLayer(3,64,'Padding',1)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,64,'Padding',1)
    reluLayer
    transposedConv2dLayer(4,64,'Stride',2,'Cropping',1)
    convolution2dLayer(1,2)
    softmaxLayer
    tverskyPixelClassificationLayer('tversky',0.3,0.7)];

Load training data for semantic segmentation using imageDatastore and pixelLabelDatastore.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

imds = imageDatastore(imageDir);

classNames = ["triangle" "background"];
labelIDs = [255 0];
pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);

Associate the image and pixel label data by using datastore combine.

ds = combine(imds,pxds);

Set the training options and train the network.

options = trainingOptions('adam', ...
    'InitialLearnRate',1e-3, ...
    'MaxEpochs',100, ...
    'LearnRateDropFactor',5e-1, ...
    'LearnRateDropPeriod',20, ...
    'LearnRateSchedule','piecewise', ...
    'MiniBatchSize',50);

net = trainNetwork(ds,layers,options);
Training on single CPU.
Initializing input data normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:02 |       50.32% |       1.2933 |          0.0010 |
|      13 |          50 |       00:00:26 |       98.83% |       0.0988 |          0.0010 |
|      25 |         100 |       00:00:45 |       99.33% |       0.0547 |          0.0005 |
|      38 |         150 |       00:01:04 |       99.38% |       0.0472 |          0.0005 |
|      50 |         200 |       00:01:24 |       99.48% |       0.0401 |          0.0003 |
|      63 |         250 |       00:01:43 |       99.47% |       0.0384 |          0.0001 |
|      75 |         300 |       00:02:02 |       99.54% |       0.0349 |          0.0001 |
|      88 |         350 |       00:02:23 |       99.51% |       0.0352 |      6.2500e-05 |
|     100 |         400 |       00:02:41 |       99.56% |       0.0331 |      6.2500e-05 |
|========================================================================================|
Training finished: Max epochs completed.

Evaluate the trained network by segmenting a test image and displaying the segmentation result.

I = imread('triangleTest.jpg');
[C,scores] = semanticseg(I,net);

B = labeloverlay(I,C);
montage({I,B})

Figure contains an axes object. The axes object contains an object of type image.

References

[1] Salehi, Seyed Sadegh Mohseni, Deniz Erdogmus, and Ali Gholipour. "Tversky loss function for image segmentation using 3D fully convolutional deep networks." International Workshop on Machine Learning in Medical Imaging. Springer, Cham, 2017.