How can I make my custom convolutional layer (using dlconv) more memory efficient in order to improve the speed of the backward pass?

Hi.
I have created a custom layer that takes a batch of 3*10 feature maps as input, giving the input size 256x256x64x30 ([Spatial, Spatial, Channel, Batch]). The layer then reshapes the input dlarray to the size 256x256x64x3x10 ([Spatial, Spatial, Channel, Time, Batch]) using the line:
Z = reshape(X{:}, [sz(1), sz(2), sz(3), numTimedims, sz(4)/numTimedims]); % numTimedims = 3, yielding 256x256x64x3x10
This variable is called Z. Then, by separating the three time slices of Z along the 4th dimension, feature addition is performed using the results of two 2D channel-wise separable convolutions (doing this in a single line gave memory errors), yielding the sum Z2 of size 256x256x64x10:
Z2 = dlconv(double(squeeze(Z(:, :, :, 2, :)-Z(:, :, :, 1, :))), KiMinus, layer.bias(1), ...
'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
Z2 = Z2 + squeeze(Z(:, :, :, 2, :));
Z2 = Z2 + dlconv(double(squeeze(Z(:, :, :, 2, :)-Z(:, :, :, 3, :))), KiPlus, layer.bias(2), ...
'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
where KiMinus and KiPlus are 3x3x1x1x64 filters (following the structure [filterHeight, filterWidth, numChannelsPerGroup, numFiltersPerGroup, numGroups], making the convolutions channel-wise separable) and layer.bias is a 2x1 array.
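For reference, the filters are set up roughly like this (the initialization values below are only an illustrative sketch, not my actual ones):
numChannels = 64;
% One 3x3 filter per group with numGroups = numChannels, so each input
% channel is convolved independently (channel-wise separable convolution).
KiMinus = dlarray(0.01 * randn(3, 3, 1, 1, numChannels));
KiPlus  = dlarray(0.01 * randn(3, 3, 1, 1, numChannels));
layer.bias = zeros(2, 1);   % one scalar bias per convolution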
For the forward pass, this convolutional layer seems to work fine, not showing any significant slowness. However, the backward function is very slow. The profiler shows that 68% of the runtime of the dlfeval(@modelGradients, dlnet, dlim, dlmask) call in my custom training loop is spent in dlarray.dlgradient>RecordingArray.backwardPass>ParenReferenceOp>ParenReferenceOp.backward>internal_parenReferenceBackward, where the line dX = accumarray(linSubscripts,dZ(:),shapeX); (line 32) seems to demand the most time.
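For context, the gradient evaluation follows the standard custom-training-loop pattern, roughly like this (the loss below is only a placeholder; my actual loss and network outputs are omitted):
function [gradients, loss] = modelGradients(dlnet, dlim, dlmask)
    % Forward pass through the network containing the custom layer.
    dlout = forward(dlnet, dlim);
    loss = crossentropy(dlout, dlmask);   % placeholder loss
    % dlgradient triggers the backward pass that the profiler flags.
    gradients = dlgradient(loss, dlnet.Learnables);
end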
Is there any obvious way for me to improve my implementation of this convolutional layer in order to get a more memory-efficient backward pass? Is there a more memory-efficient way to perform the reshaping?

Answers (1)

Gautam Pendse on 2 Apr 2021
Hi Julius,
One approach that you can try is to rewrite the code like this:
ZChannel2 = Z(:, :, :, 2, :);
Z2 = dlconv(double(squeeze(ZChannel2-Z(:, :, :, 1, :))), KiMinus, layer.bias(1), ...
'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
Z2 = Z2 + squeeze(ZChannel2);
Z2 = Z2 + dlconv(double(squeeze(ZChannel2-Z(:, :, :, 3, :))), KiPlus, layer.bias(2), ...
'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
This introduces an intermediate variable ZChannel2 to avoid repeatedly indexing into Z.
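If indexing is still a bottleneck after that, the same idea extends to the other two slices, so that each slice of Z is extracted and squeezed exactly once (a sketch; moving squeeze before the subtraction is safe because squeeze only removes singleton dimensions):
ZChannel1 = squeeze(Z(:, :, :, 1, :));
ZChannel2 = squeeze(Z(:, :, :, 2, :));
ZChannel3 = squeeze(Z(:, :, :, 3, :));
Z2 = dlconv(double(ZChannel2 - ZChannel1), KiMinus, layer.bias(1), ...
    'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
Z2 = Z2 + ZChannel2;
Z2 = Z2 + dlconv(double(ZChannel2 - ZChannel3), KiPlus, layer.bias(2), ...
    'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');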
Does that help?
Gautam
  1 Comment
Julius Å on 5 Apr 2021
Hello Gautam. Thank you for your answer.
This seems to slightly improve the speed! Thanks.
However, the reshape() call also seems fairly computationally heavy, at least in the context of the backward pass of this custom layer. As a beginner at writing GPU-efficient code, I don't really know how to handle this. Do you have any suggestions on how this function could be avoided or re-implemented in this context?
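One idea I have been wondering about (untested): since the column-major reshape lays out the 30 input maps time-major, each time slice could perhaps be read directly from the original 256x256x64x30 array with strided indexing, skipping reshape and squeeze entirely:
% Untested sketch: time index t of batch element b sits at batch
% position t + 3*(b-1) in the original array.
X1 = X(:, :, :, 1:3:end);   % time step 1, size 256x256x64x10
X2 = X(:, :, :, 2:3:end);   % time step 2, equals squeeze(Z(:, :, :, 2, :))
X3 = X(:, :, :, 3:3:end);   % time step 3
Would that avoid the reshape cost, or does strided indexing go through the same accumarray code path in the backward pass?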
