How can I make my custom convolutional layer (using dlconv) more memory efficient in order to improve the speed of the backward pass?

Question

Julius Å on 30 Mar 2021


https://au.mathworks.com/matlabcentral/answers/787969-how-can-i-make-my-custom-convolutional-layer-using-dlconv-more-memory-efficient-in-order-to-improv

Commented: Julius Å on 5 Apr 2021

Hi.

I have created a custom layer that takes a batch of 3*10 feature maps as input, giving the input size 256x256x64x30 ([Spatial, Spatial, Channel, Batch]). The layer then reshapes the input dlarray to the size 256x256x64x3x10 ([Spatial, Spatial, Channel, Time, Batch]) using the line:

Z = reshape(X{:}, [sz(1), sz(2), sz(3), numTimedims, sz(4)/numTimedims]);

This variable is called Z. I then separate the three slices of Z along the 4th (time) dimension and add features using the results of two 2D channel-wise separable convolutions. This is done with the following lines (doing it in a single line gave memory errors) and yields the sum Z2 of size 256x256x64x10:

Z2 = dlconv(double(squeeze(Z(:, :, :, 2, :)-Z(:, :, :, 1, :))), KiMinus, layer.bias(1), ...
    'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
Z2 = Z2 + squeeze(Z(:, :, :, 2, :));
Z2 = Z2 + dlconv(double(squeeze(Z(:, :, :, 2, :)-Z(:, :, :, 3, :))), KiPlus, layer.bias(2), ...
    'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');

where KiMinus and KiPlus are 3x3x1x1x64 filters (following the structure [filterHeight,filterWidth,numChannelsPerGroup,numFiltersPerGroup,numGroups], which makes the convolutions channel-wise separable) and layer.bias is a 2x1 array.
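
To illustrate the filter layout described above, here is a minimal sketch with small stand-in sizes (8x8 feature maps, 4 channels, 2 observations instead of 256x256x64x10). The sizes and the variable names Xnum, Xshift, W and changed are assumptions made for this example only. It checks that with 3x3x1x1xC weights, changing one input channel only changes the matching output channel.

```matlab
% Stand-in sizes; the post uses 256x256x64 feature maps.
C = 4;
Xnum = rand(8, 8, C, 2);                  % [Spatial, Spatial, Channel, Batch]
W = dlarray(rand(3, 3, 1, 1, C));         % one 3x3 filter per channel (C groups)
opts = {'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB'};

Y = dlconv(dlarray(Xnum), W, 0, opts{:});

% Shift only input channel 1 and run the same convolution again.
Xshift = Xnum;
Xshift(:, :, 1, :) = Xshift(:, :, 1, :) + 1;
Yshift = dlconv(dlarray(Xshift), W, 0, opts{:});

% Largest change per output channel: only channel 1 should be affected.
changed = squeeze(max(abs(extractdata(Yshift - Y)), [], [1 2 4]));
disp(size(Y))
disp(changed')
```

Only the first entry of changed should be nonzero, which is what makes the convolution channel-wise separable.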

For the forward pass, this layer works fine and shows no significant slowdown. The backward pass, however, is very slow. The profiler shows that 68% of the runtime of the dlfeval(@modelGradients, dlnet, dlim, dlmask) call in my custom training loop is spent in dlarray.dlgradient>RecordingArray.backwardPass>ParenReferenceOp>ParenReferenceOp.backward>internal_parenReferenceBackward, where the line dX = accumarray(linSubscripts,dZ(:),shapeX); (line 32) takes the most time.
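
A small sketch of why indexing shows up in the backward pass: the gradient of a single slice with respect to Z is a full-size array that is zero everywhere outside that slice, so every indexing expression Z(:, :, :, k, :) contributes its own full-size gradient array. The sizes below are stand-ins and sliceGrad and g are names chosen for this example; the sketch only shows the shape of the result, not the internal implementation.

```matlab
% Stand-in for the 256x256x64x3x10 array in the post.
Z = dlarray(rand(4, 4, 2, 3, 5));

% Gradient of the sum over time slice 2 with respect to all of Z.
sliceGrad = dlfeval(@(z) dlgradient(sum(z(:, :, :, 2, :), 'all'), z), Z);
g = extractdata(sliceGrad);

disp(size(g))                      % same size as Z, not the size of the slice
disp(nnz(g(:, :, :, [1 3], :)))    % the slices that were not indexed stay zero
```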

Is there any obvious way for me to improve my implementation of this convolutional layer in order to get a more memory efficient backward pass? Is there a more memory efficient way to perform the reshaping?


Answer 1

Gautam Pendse on 2 Apr 2021


Hi Julius,

One approach that you can try is to rewrite the code like this:

ZChannel2 = Z(:, :, :, 2, :);
Z2 = dlconv(double(squeeze(ZChannel2-Z(:, :, :, 1, :))), KiMinus, layer.bias(1), ...
    'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
Z2 = Z2 + squeeze(ZChannel2);
Z2 = Z2 + dlconv(double(squeeze(ZChannel2-Z(:, :, :, 3, :))), KiPlus, layer.bias(2), ...
    'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');

This introduces an intermediate variable ZChannel2 to avoid repeatedly indexing into Z.

Does that help?

Gautam


Julius Å on 5 Apr 2021

Hello Gautam. Thank you for your answer.

This seems to slightly improve the speed! Thanks.

However, the reshape()-function also seems fairly computationally heavy, at least in the context of the backward pass in this custom layer. As a beginner with coding things efficiently for the GPU, I don't really know how to handle this. Do you have any suggestions on how this function could be avoided or re-implemented in this context?
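
One possible direction, offered as an untested-for-speed sketch rather than a confirmed fix: because the whole expression is linear in the three time slices, it can be written as a single grouped convolution with three input channels per group, which removes the indexing into Z altogether. The fused weights Wfused, the identity kernel delta, the variable Zg and the small stand-in sizes are all assumptions introduced for this example, and the double() casts from the original code are left out. The block compares the fused result with the slicing version from the answer above.

```matlab
% Stand-in sizes; the post uses 256x256x64 with T = 3 and 10 sequences.
H = 8; Wd = 8; C = 4; T = 3; B = 2;
X = dlarray(rand(H, Wd, C, T*B));
KiMinus = dlarray(rand(3, 3, 1, 1, C));
KiPlus  = dlarray(rand(3, 3, 1, 1, C));
bias = [0.1; -0.2];
opts = {'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB'};

% Reference: the slicing version.
Z = reshape(X, [H, Wd, C, T, B]);
ZChannel2 = Z(:, :, :, 2, :);
Zref = dlconv(squeeze(ZChannel2 - Z(:, :, :, 1, :)), KiMinus, bias(1), opts{:});
Zref = Zref + squeeze(ZChannel2);
Zref = Zref + dlconv(squeeze(ZChannel2 - Z(:, :, :, 3, :)), KiPlus, bias(2), opts{:});

% Fused: put the T time steps of each channel next to each other, so that
% group c of the convolution sees [Z1_c, Z2_c, Z3_c] as its 3 input channels.
Zg = reshape(permute(Z, [1 2 4 3 5]), [H, Wd, T*C, B]);

% conv(Z2-Z1, Km) + Z2 + conv(Z2-Z3, Kp)
%   = conv(Z1, -Km) + conv(Z2, Km + Kp + delta) + conv(Z3, -Kp)
delta = zeros(3, 3);
delta(2, 2) = 1;                                   % identity kernel for "+ Z2"
Wfused = cat(3, -KiMinus, KiMinus + KiPlus + delta, -KiPlus);  % 3x3x3x1xC
Zfused = dlconv(Zg, Wfused, bias(1) + bias(2), opts{:});

maxDiff = max(abs(extractdata(Zfused - Zref)), [], 'all');
disp(maxDiff)
```

If the two results agree, the fused version only needs reshape, permute and cat in the traced code, so no ParenReferenceOp is recorded. Whether this is actually faster on a given GPU and release would need to be checked with the profiler.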
