Problems with multi-gpus
I am using this function to train a CNN:
function [trainedNet,trainingSet,testSet] = OurNetCBIR
outputFolder = fullfile('Database');
rootFolder=fullfile(outputFolder, 'Oliva');
imds = imageDatastore(fullfile(rootFolder),'IncludeSubfolders', true, 'LabelSource', 'foldernames');
imds.ReadFcn = @(filename)readAndPreprocessImage(filename);
function Iout = readAndPreprocessImage(filename)
I = imread(filename);
if ismatrix(I)
I = cat(3,I,I,I);
end
Iout = imresize(I, [227 227]);
end
[trainingSet,testSet] = splitEachLabel(imds, 0.7, 'randomize');
layers = [
imageInputLayer([227 227 3],'DataAugmentation','none') % (1)
convolution2dLayer(7,50,'Stride', 2, 'Padding', 0,'Name','Conv1') % (2)111x111x50
reluLayer('Name','ReLu1') % (3)111x111x50
maxPooling2dLayer(3,'Stride', 2,'Padding', 0,'Name','maxPooling1') % (4)55x55x50
crossChannelNormalizationLayer(5,'Alpha', 0.00002,'Beta', 0.75,'K',1,'Name','Norm1') % (5)55x55x50
convolution2dLayer(5,100,'Stride', 1, 'Padding', 2,'Name','Conv2') % (6)55x55x100
reluLayer('Name','ReLu2') % (7)55x55x100
maxPooling2dLayer(3,'Stride', 2,'Padding', 0,'Name','maxPooling2') % (8)27x27x100
crossChannelNormalizationLayer(5,'Alpha', 0.00002,'Beta', 0.75,'K',1,'Name','Norm2') % (9)27x27x100
convolution2dLayer(3,256,'Stride', 1,'Padding', 2,'Name','Conv3') % (10)29x29x256 (3x3 filter with padding 2 grows 27 to 29)
reluLayer('Name','ReLu3') % (11)29x29x256
maxPooling2dLayer(3,'Stride', 2,'Padding', 0,'Name','maxPooling3') % (12)14x14x256
crossChannelNormalizationLayer(5,'Alpha', 0.00002,'Beta', 0.75,'K',1,'Name','Norm3') % (13)14x14x256
convolution2dLayer(3,400,'Stride', 1,'Padding', 1,'Name','Conv4') % (14)14x14x400
reluLayer('Name','ReLu4') % (15)14x14x400
convolution2dLayer(3,400,'Stride', 1,'Padding', 1,'Name','Conv5') % (16)14x14x400
reluLayer('Name','ReLu5') % (17)14x14x400
convolution2dLayer(3,256,'Stride', 1,'Padding', 1,'Name','Conv6') % (18)14x14x256
reluLayer('Name','ReLu6') % (19)14x14x256
maxPooling2dLayer(3,'Stride', 2,'Padding', 0,'Name','maxPooling4') % (20)6x6x256
fullyConnectedLayer(4800,'Name','fc1') % (21)1x1x4800
reluLayer('Name','ReLu7') % (22)1x1x4800
dropoutLayer(0.5,'Name','dropout1') % (23)1x1x4800
fullyConnectedLayer(2400,'Name','fc2') % (24)1x1x2400
reluLayer('Name','ReLu8') % (25)1x1x2400
dropoutLayer(0.5,'Name','dropout2') % (26)1x1x2400
fullyConnectedLayer(8,'Name','fc3') % (27)
softmaxLayer()
classificationLayer()];
options = trainingOptions('sgdm',...
'InitialLearnRate',0.001,...
'LearnRateSchedule','piecewise',...
'LearnRateDropFactor',0.1,...
'LearnRateDropPeriod',30,...
'MaxEpochs',10,...
'Momentum',0.9,...
'L2Regularization',0.0005,...
'MiniBatchSize',25,...
'ExecutionEnvironment','gpu');
trainedNet = trainNetwork(trainingSet,layers,options);
end
I have no problem training with a single GPU, but when I try to train with multiple GPUs, MATLAB generates the following error:
Starting parallel pool (parpool) using the 'local' profile ...
connected to 4 workers.
Error using trainNetwork (line 140)
An invalid indexing request was made.
Error in OurNetCBIR (line 110)
trainedNet = trainNetwork(trainingSet,layers,options);
Caused by:
Error using Composite/subsasgn (line 103)
An invalid indexing request was made.
Struct contents reference from a non-struct array object.
The client lost connection to worker 1. This might be due to network problems, or the interactive communicating job might have
errored.
Can someone help me please?
Answers (3)
Joss Knight
on 5 Nov 2017
I can reproduce your issue. It seems the issue is your use of an anonymous function to call a nested function for your datastore ReadFcn. Something about that is causing a crash when the datastore is deserialised on your pool worker (i.e. copied to it). This is a bug which we will investigate - thanks very much for bringing it to our attention.
Still, your issue is easily fixed. Reference your nested function directly rather than via an anonymous function:
imds.ReadFcn = @readAndPreprocessImage;
However, in R2017b you should be using augmentedImageSource to resize your images, since use of a ReadFcn cripples performance. This doesn't give you a way to convert grayscale images to RGB, but the best solution is to do that offline and save new files.
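A minimal sketch of the offline conversion suggested above, not a definitive implementation. It assumes the folder layout from the question ('Database/Oliva' with one subfolder per class); the output folder 'Database/OlivaRGB' is a hypothetical name, and re-saving JPEG files re-encodes them, so a lossless format may be preferable. Resizing is then left to augmentedImageSource (superseded by augmentedImageDatastore in later releases).

```matlab
% Sketch: convert every image to 3-channel RGB once, offline, so that
% no custom ReadFcn is needed during training.
% Assumptions: 'Database/Oliva' is the folder from the question;
% 'Database/OlivaRGB' is a hypothetical output folder.
srcRoot = fullfile('Database', 'Oliva');
dstRoot = fullfile('Database', 'OlivaRGB');

imds = imageDatastore(srcRoot, 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

for k = 1:numel(imds.Files)
    srcFile = imds.Files{k};
    I = imread(srcFile);
    if ismatrix(I)
        I = cat(3, I, I, I);   % replicate the grayscale channel
    end
    % Keep the class subfolder so labels can still come from folder names.
    [~, name, ext] = fileparts(srcFile);
    dstDir = fullfile(dstRoot, char(imds.Labels(k)));
    if ~exist(dstDir, 'dir')
        mkdir(dstDir);
    end
    imwrite(I, fullfile(dstDir, [name ext]));
end

% Build the training data from the converted files, with the default ReadFcn.
rgbImds = imageDatastore(dstRoot, 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
[trainingSet, testSet] = splitEachLabel(rgbImds, 0.7, 'randomize');
source = augmentedImageSource([227 227], trainingSet);  % resizes to 227x227
% trainedNet = trainNetwork(source, layers, options);
```

The conversion only needs to run once; later training runs can start from the rgbImds line.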
Andres Ramirez
on 19 Nov 2017
Edited: Andres Ramirez
on 19 Nov 2017
Joss Knight
on 19 Nov 2017
Edited: Joss Knight
on 19 Nov 2017
Well, strictly speaking this is a different question, but okay. Timeouts are a consequence of using graphics cards in WDDM mode on Windows. A quick search would give you your answer, for instance:
You can turn off timeouts, or reduce the amount of work your GPUs are doing so they don't occur.
I don't know why you're getting timeouts in multi-gpu mode but not on a single GPU. Are your other GPUs much lower powered than your main one, or are they all the same?
Andres Ramirez
on 20 Nov 2017
Edited: Andres Ramirez
on 20 Nov 2017
Joss Knight
on 20 Nov 2017
Edited: Joss Knight
on 20 Nov 2017
Unfortunately, on Windows the delay for communication between GPUs is significant. You can only manage this by making the MiniBatchSize as large as your available memory allows, which improves the compute/communication ratio. Depending on the hardware, it's not always possible on Windows to get multi-GPU training to run faster than single-GPU training. The general advice is to keep the MiniBatchSize per GPU the same as in the single-GPU case, so the total MiniBatchSize grows with the number of GPUs. You can also scale up the learning rate commensurately, because a larger batch size lets you train faster (although sometimes you need to 'boot' your network with a smaller learn rate at first). Also, if running Linux is an option for you, that will ameliorate this issue.
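As a rough sketch of that advice, assuming the base values from the question (MiniBatchSize 25, InitialLearnRate 0.001) and assuming a simple linear scaling of the learning rate, which may need tuning for a given network:

```matlab
% Sketch: scale the total mini-batch size and the learning rate with the
% number of GPUs. The base values (25 and 0.001) come from the question;
% linear scaling of the learning rate is an assumption, not a rule.
numGPUs = max(gpuDeviceCount, 1);    % GPUs visible to this MATLAB session
baseMiniBatchSize = 25;              % per-GPU batch size used on one GPU
baseLearnRate = 0.001;

miniBatchSize = baseMiniBatchSize * numGPUs;  % total, split across the GPUs
learnRate = baseLearnRate * numGPUs;

options = trainingOptions('sgdm', ...
    'InitialLearnRate', learnRate, ...
    'MaxEpochs', 10, ...
    'Momentum', 0.9, ...
    'L2Regularization', 0.0005, ...
    'MiniBatchSize', miniBatchSize, ...
    'ExecutionEnvironment', 'multi-gpu');
```

If memory allows, baseMiniBatchSize itself can be raised further to improve the compute/communication ratio.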
The behaviour of TDR (Windows Timeout Detection and Recovery) is often confusing, with timeouts not necessarily being related (it seems) to the execution time of a single kernel. I don't know why the timeouts still seem to be occurring even after you've disabled them - I've only seen this before when the user has not rebooted after changing the registry keys. Did you reboot?
The fact that one of your cards is running graphics will definitely be interfering. You could try removing it from the pool. One way to do that is to set CUDA_VISIBLE_DEVICES on MATLAB startup to ensure only the non-display cards are used:
setenv CUDA_VISIBLE_DEVICES 0,2,3
...or whatever the indices of those cards are (noting that the indices for this environment variable are zero-based, i.e. 1 less than the indices shown by gpuDevice).
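To work out which indices to use, a small sketch like the following lists each card under both numbering conventions. It is best run in a separate MATLAB session, since querying the devices initialises the GPU libraries and CUDA_VISIBLE_DEVICES has to be set before that happens. Treating KernelExecutionTimeout as a hint for spotting the display card is an assumption, not a guarantee.

```matlab
% Sketch: list the GPUs with both index conventions.
% gpuDevice uses 1-based indices; CUDA_VISIBLE_DEVICES uses 0-based ones.
numGPUs = gpuDeviceCount;
cudaIndices = (1:numGPUs) - 1;
for idx = 1:numGPUs
    d = gpuDevice(idx);   % note: this also selects device idx
    fprintf('gpuDevice %d -> CUDA_VISIBLE_DEVICES %d: %s (KernelExecutionTimeout = %d)\n', ...
        idx, cudaIndices(idx), d.Name, d.KernelExecutionTimeout);
end
```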