# Two Issues about MATLAB's Official Example of GAN

Theron FARRELL on 22 Nov 2019
Answered: wenyi shao on 13 Oct 2020
I am trying MATLAB's official GAN example (https://www.mathworks.com/help/deeplearning/examples/train-generative-adversarial-network.html). There are a couple of issues I want to ask about.
1. After setting a large number of epochs, say 5000, the code crashed after 3833 iterations (I think that is simply an arbitrary, large iteration count) with the following errors:
Error using nnet.internal.cnn.dlnetwork/predict (line 198)
Layer 'bn1': Invalid input data. The value of 'Variance' is invalid. Expected input to be positive.
Error in dlnetwork/predict (line 205)
[varargout{1:nargout}] = predict(net.PrivateNetwork, x, layerIndices, layerOutputIndices);
Error in GAN_Test (line 143)
dlXGeneratedValidation = predict(dlnetGenerator,dlZValidation);
Also note that it did not happen only once, but multiple times, each time after an arbitrary, large number of epochs. As per the error message, I think 'bn1' refers to
batchNormalizationLayer('Name','bn1')
in the generator, which takes the output from
imageInputLayer([1 1 numLatentInputs],'Normalization','none','Name','in')
transposedConv2dLayer(filterSize,8*numFilters,'Name','tconv1')
So I suspect mode collapse after training the generator over many epochs: one of the main failure modes for a GAN is for the generator to collapse to a parameter setting where it always emits the same point (quoted from https://arxiv.org/pdf/1606.03498.pdf).
I am therefore wondering whether MATLAB could issue a warning, or force the variance in the predict() function to be always positive, say by adding an eps.
2. As I mentioned in another post, the ganLoss(...) function in fact appends a Sigmoid layer at the end of the discriminator, and the loss is calculated AFTER it.
function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)
% Calculate losses for the discriminator network.
lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));
lossReal = -mean(log(sigmoid(dlYPred)));
% Combine the losses for the discriminator network.
lossDiscriminator = lossReal + lossGenerated;
% Calculate the loss for the generator network.
lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));
end
And yet when dlgradient(...) is adopted later, it seems to get started from the last layers of the discriminator and the generator, respectively, as it is shown in the example code
% There is NO Sigmoid layer in either of the dlnet
% There is NO Sigmoid layer in either of the dlnet
I am therefore wondering whether, as per the chain rule, the loss should first be multiplied by the derivative of the sigmoid before it is sent back to the discriminator and the generator, respectively. Specifically,
% Pseudo code
Final_Loss = -mean(log(sigmoid(dlYPred)));
% For one input, the derivative with respect to dlYPred is
= -(1/sigmoid(dlYPred)) * sigmoid(dlYPred) * (1-sigmoid(dlYPred))
= -(1-sigmoid(dlYPred))
% So I reckon that the following should be calculated and the last two backpropagated
Loss_G2D = -mean(-sigmoid(dlYPredGen));
Loss_D2D = --mean(1-sigmoid(dlYPredReal));
Loss_D = Loss_D2D + Loss_G2D;
Loss_G = -mean(1-sigmoid(dlYPredGen));
Please do correct me if I am wrong, thanks.

Gautam Pendse on 7 Jan 2020
Hi Theron,
Re: 2. As I mentioned in another post, the ganLoss(...) function in fact appends a Sigmoid layer at the end of the discriminator, and the loss is calculated AFTER it.
function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)
% Calculate losses for the discriminator network.
lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));
lossReal = -mean(log(sigmoid(dlYPred)));
% Combine the losses for the discriminator network.
lossDiscriminator = lossReal + lossGenerated;
% Calculate the loss for the generator network.
lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));
end
*** Yes, the sigmoid layer is not part of the Discriminator and hence the sigmoid function is applied before loss computation. The loss for Discriminator is based on Eq. 1 in the GAN paper: https://arxiv.org/pdf/1406.2661.pdf. For the Generator, the loss is based on log(D(G(z))) rather than log(1-D(G(z))) as suggested in the paper (paragraph before Figure 1).
Re: And yet when dlgradient(...) is adopted later, it seems to get started from the last layers of the discriminator and the generator, respectively, as it is shown in the example code
% There is NO Sigmoid layer in either of the dlnet
% There is NO Sigmoid layer in either of the dlnet
I am therefore wondering if, as per the chain rule, the loss, shall be firstly subject to the derivative of Sigmoid before it is sent back to the discriminator and the generator, respectively.
*** dlgradient calculates the gradient of an output variable w.r.t. a set of input variables. It backpropagates through all operations needed to produce the output. So the dlgradient call for the discriminator computes the gradient of lossDiscriminator w.r.t. the Discriminator weights (the Learnables) by backpropagating through all operations that were used to calculate lossDiscriminator. This includes the sigmoid as well as other operations such as mean and log:
% Calculate losses for the discriminator network.
lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));
lossReal = -mean(log(sigmoid(dlYPred)));
% Combine the losses for the discriminator network.
lossDiscriminator = lossReal + lossGenerated;
Hope this helps,
Gautam
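To see numerically that the sigmoid derivative is already part of the gradient, here is a minimal sketch. It uses plain numeric arrays and finite differences in place of dlarray and dlgradient, an assumption made only so that it runs without Deep Learning Toolbox; sigmoidFcn, lossFcn and the sample values are hypothetical. The chain-rule gradient of -mean(log(sigmoid(y))) with respect to y is -(1-sigmoid(y))/N, and it should agree with a finite-difference gradient of the composed loss:

```matlab
% Plain-MATLAB stand-ins (hypothetical names, not part of the official example)
sigmoidFcn = @(y) 1 ./ (1 + exp(-y));
lossFcn = @(y) -mean(log(sigmoidFcn(y)));   % same form as lossReal

y = [-2 0.5 3];   % made-up discriminator outputs before the sigmoid
N = numel(y);

% Chain rule: d/dy [-log(s(y))] = -(1/s) * s*(1-s) = -(1 - s)
gradAnalytic = -(1 - sigmoidFcn(y)) / N;

% Central finite differences of the full loss, one element at a time
h = 1e-6;
gradNumeric = zeros(size(y));
for k = 1:N
    e = zeros(size(y));
    e(k) = h;
    gradNumeric(k) = (lossFcn(y + e) - lossFcn(y - e)) / (2*h);
end

maxDifference = max(abs(gradAnalytic - gradNumeric));
disp(maxDifference)
```

The two gradients agree up to finite-difference error, which is the bookkeeping that automatic differentiation does for every operation recorded inside the loss, so no extra sigmoid derivative has to be applied by hand.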

Theron FARRELL on 6 Jan 2020
Hi Steven,
I am neither reporting a bug nor reprimanding MATLAB for its flaws. I posted the question purely for technical discussion, which I believe is what this community is for, isn't it? I did so because I love MATLAB, because I have been a fervent MATLAB user since 1997, and because I once worked for MathWorks. I do not want to see MATLAB decline in the era of AI and open-source software. I can use Python and TensorFlow to achieve whatever I want to achieve in ML, yet I am still trying to code my models in MATLAB.
Technical support? I was an AE before. So please let the buck stop here and take responsibility, for the good of this excellent product!

Niccolò Dal Santo on 8 Jan 2020
Hi Theron,
1. After setting a long epoch, say 5000, the code crashed after 3833 iterations--actually I think it is simply an arbitrary number of long iterations--with the following errors
You are right that 3833 is an arbitrary, large iteration count. This happens because training a GAN is by nature a hard optimization problem that suffers from instability: you are training two networks at the same time in an adversarial game, meaning that when one network improves, the performance of the other one worsens. From Improved Techniques for Training GANs, 2016:
Training GANs consists in finding a Nash equilibrium to a two-player non-cooperative game. […] Unfortunately, finding Nash equilibria is a very difficult problem. Algorithms exist for specialized cases, but we are not aware of any that are feasible to apply to the GAN game, where the cost functions are non-convex, the parameters are continuous, and the parameter space is extremely high-dimensional
There are a few reasons why training may go badly. Often these involve
• the discriminator starts performing too well after a few iterations and the generator is no longer able to fool it
• mode collapse: the generator generates only one sample
If either of these happens, one of the loss functions may reach an Inf value, which then propagates through a series of operations and ends up producing a NaN in the variance of the BN layer, which in turn triggers the error you obtained.
Re: I am therefor wondering if MATLAB may issue a warning or setting variance in the code of predict() function to be always positive, say, add an eps.
If you are referring to the variance computed in the batch normalization (BN) layer, the layer has an Epsilon property, which you can set with the 'Epsilon' name-value pair when creating the layer.
During training I would recommend plotting lossGenerator and lossDiscriminator as functions of the iteration count. Although it is a simple metric, it may help you spot early that training is heading in a bad direction, or at least give you a reason why you obtained the error in the BN layer.
Notice also that a larger number of training iterations is not necessarily better: as explained above, the two optimization processes work against each other (rather than having just one network trying to improve its own loss, as in more usual deep learning cases). For this reason I would also recommend saving your weights every, say, 500 iterations and, at the end of training, visually comparing the results obtained with the different models.
On top of this, there is a series of strategies (some of them purely heuristic) that you can use to improve GAN training:
• Increase the 'Epsilon' of the BN layer: empirically this helps to smooth the training process. You can try increasing it from the default 1e-5 to, e.g., 5e-5 or 1e-4
• Add Gaussian noise to the discriminator input (both real and generated images)
• Label smoothing: as already pointed out, smooth the labels by multiplying by 0.9 in the computation of the discriminator loss for real images
• Two Time-Scale Update Rule (TTUR), which simply consists of using different learning rates for the generator and the discriminator (the generator learning rate being the smaller one)
• Use spectral normalization (https://arxiv.org/abs/1802.05957) instead of BN, as it appears to perform very well as a normalization layer. Currently this must be implemented as a custom layer in MATLAB.
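As an illustration of the Gaussian-noise item in the list above, here is a minimal sketch. The noise level, the linear decay schedule and all variable names (noiseStd0, decayIterations, XReal, XGenerated) are assumptions for illustration and do not come from the official example. Random arrays stand in for the image batches so that the snippet runs on its own:

```matlab
% Hypothetical settings (not from the thread or the official example)
noiseStd0 = 0.1;          % initial noise standard deviation
decayIterations = 1000;   % noise reaches zero after this many iterations
iteration = 250;          % current training iteration

% Linearly decaying noise level, clamped at zero
noiseStd = noiseStd0 * max(0, 1 - iteration/decayIterations);

% Stand-in batches with values in [-1, 1], laid out as H-by-W-by-C-by-batch
XReal = 2*rand(64,64,3,8,'single') - 1;
XGenerated = 2*rand(64,64,3,8,'single') - 1;

% Add independent noise to both real and generated discriminator inputs
XRealNoisy = XReal + noiseStd*randn(size(XReal),'single');
XGeneratedNoisy = XGenerated + noiseStd*randn(size(XGenerated),'single');
```

In a custom training loop the noisy arrays would be the ones passed to the discriminator, while the generator output itself stays untouched.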
I hope this helps to shed some light !
Cheers,
Niccolo'
Theron FARRELL on 9 Jan 2020
Hi Niccolo',
In fact, I think that the emergence of Inf or NaN during training is a rather general issue, not limited to GANs. Hence, I suppose that in custom training, as you suggested, it is good practice to monitor the loss as well as the gradients, and to discard the gradient update for a training step once Inf or NaN appears. This will also be useful when one tries mixed-precision training and quantisation (casting weights to lower precision, scaling losses and un-scaling gradients) for deployment.
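A minimal sketch of such a guard follows. It assumes the gradients and weights are held in cell arrays of plain numeric arrays; in the actual example they are tables of dlarray values, so the check would need adapting. The names allFinite and skippedSteps and the plain gradient-descent update are hypothetical:

```matlab
% Helper: true only if every array in the cell contains finite values
allFinite = @(c) all(cellfun(@(g) all(isfinite(g(:))), c));

% Made-up loss, gradients and weights for one training step
loss = 0.7;
gradients = {[0.1 -0.2; 0.3 0.0], [NaN 0.5]};   % second entry contains a NaN
weights = {zeros(2,2), zeros(1,2)};
learnRate = 0.01;
skippedSteps = 0;

if isfinite(loss) && allFinite(gradients)
    % Plain gradient step as a stand-in for the real optimizer update
    weights = cellfun(@(w,g) w - learnRate*g, weights, gradients, ...
        'UniformOutput', false);
else
    % Discard this update and leave the weights unchanged
    skippedSteps = skippedSteps + 1;
end
```

Counting the skipped steps is a cheap way to notice when a run has started to diverge rather than hit an isolated bad batch.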

Delprat Sebastien on 31 Dec 2019
I also think that the iteration number should be increased only if the iteration actually takes place. This may matter for the Adam optimizer, which uses the iteration number.
In the original code:
iteration = iteration + 1;
...
if size(data,1) < miniBatchSize
continue
end
...
Modified code:
if size(data,1) < miniBatchSize
continue
end
iteration = iteration + 1;
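To show why the counter matters, here is a small sketch of the bias-correction factor from the standard Adam formulation, where the first-moment average is divided by 1 - beta1^t at iteration t. It assumes the optimizer in use follows that formulation; beta1 = 0.9 is an assumed decay factor, and the variable names are hypothetical:

```matlab
beta1 = 0.9;                     % assumed first-moment decay factor
iterationCount = [1 2 10 100];   % iteration numbers passed to the optimizer

% Standard Adam bias correction applied to the first-moment average
biasCorrection = 1 ./ (1 - beta1.^iterationCount);

disp([iterationCount; biasCorrection])
```

The factor is large for the first few iterations and approaches 1 later on, so a counter that advances on skipped mini-batches mainly affects the early updates.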

Delprat Sebastien on 31 Dec 2019
A few other tips for improving performance.
1) One-sided label smoothing
To improve stability, it is also advised to use one-sided label smoothing (https://arxiv.org/pdf/1606.03498.pdf)
So the function becomes
function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)
% Calculate losses for the discriminator network.
lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));
% Calculate losses for the discriminator network.
% ==================================================
% Mind the *0.9 here
% ==================================================
lossReal = -mean(log(0.9*sigmoid(dlYPred)));
% Combine the losses for the discriminator network.
lossDiscriminator = lossReal + lossGenerated;
% Calculate the loss for the generator network.
lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));
%lossGenerator = mean(max(dlYPredGenerated,0)-dlYPred+log(1+exp(-abs(dlYPredGenerated))));
end
2) Dropout layers
I noticed that some implementations use dropout layers in the discriminator
3) Batch normalisation
To prevent numerical issues, I changed the Epsilon value to 5e-5 in each batch normalization layer.
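A note on tip 1, as a hedged sketch with plain numeric arrays and hypothetical names (sigmoidFcn, lossScaled, lossSmooth): since log(0.9*s) = log(0.9) + log(s), multiplying inside the logarithm only shifts the loss by the constant -log(0.9) and leaves its gradient unchanged. One-sided label smoothing is more commonly written as a cross-entropy against a target of 0.9 for real images, which does change the gradient. The snippet compares the two forms on made-up discriminator outputs:

```matlab
sigmoidFcn = @(y) 1 ./ (1 + exp(-y));
y = [-1.5 0.2 2.0];   % made-up discriminator outputs for real images
N = numel(y);

lossPlain  = @(y) -mean(log(sigmoidFcn(y)));
lossScaled = @(y) -mean(log(0.9 * sigmoidFcn(y)));   % form used in the function above
lossSmooth = @(y) -mean(0.9*log(sigmoidFcn(y)) + 0.1*log(1 - sigmoidFcn(y)));

% The scaled form differs from the plain loss by the constant -log(0.9)
offset = lossScaled(y) - lossPlain(y);

% Analytic gradients with respect to y
gradPlain  = (sigmoidFcn(y) - 1)   / N;   % also the gradient of lossScaled
gradSmooth = (sigmoidFcn(y) - 0.9) / N;   % pulls sigmoid(y) toward 0.9, not 1

disp(offset)
disp([gradPlain; gradSmooth])
```

With the smoothed target, the gradient changes sign once sigmoid(y) exceeds 0.9, which is what discourages the discriminator from becoming overconfident on real images.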
As a result, here is my code
clear all
close all
clc
% Slightly modified GAN code, provided as is
% Find a unique name for backup
NoFile=1;
while isfolder(sprintf('GAN%03i',NoFile))
NoFile=NoFile+1;
end
FolderName=sprintf('GAN%03i',NoFile);
% Create a unique folder that will contain the networks saved during training and a video file of the progress
mkdir(FolderName);
FileVideo=[FolderName filesep 'video' sprintf('%03i',NoFile) '.avi'];
FileData=[FolderName filesep 'data' sprintf('%03i',NoFile) ];
% Video writer for saving GAN progress
v=VideoWriter(FileVideo);
open(v);
% Folder with real images
datasetFolder = ''; % SET THIS TO THE FOLDER WITH YOUR PICTURES
% Datastore
imds = imageDatastore(datasetFolder, ...
'IncludeSubfolders',true, ...
'LabelSource','foldernames');
% Augment the data to include random horizontal flipping, scaling, and resize
% the images to have size 64-by-64.
augmenter = imageDataAugmenter( ...
'RandXReflection',true, ...
'RandScale',[1 2]);
nx=64;
ny=64;
%% Skip data augmentation since we already have a lot of images; otherwise, pass the augmenter
augimds = augmentedImageDatastore([ny nx],imds);%,'DataAugmentation',augmenter);
%% Define Generator Network
filterSize = [4 4];
numFilters = 16;
numLatentInputs = 15;
windowChannelSize=10;
layersGenerator = [
imageInputLayer([1 1 numLatentInputs],'Normalization','none','Name','in')
transposedConv2dLayer(filterSize,8*numFilters,'Name','tconv1')
batchNormalizationLayer('Name','bn1','Epsilon',5e-5)
reluLayer('Name','relu1')
transposedConv2dLayer(filterSize,4*numFilters,'Stride',2,'Cropping',1,'Name','tconv2')
batchNormalizationLayer('Name','bn2','Epsilon',5e-5)
reluLayer('Name','relu2')
transposedConv2dLayer(filterSize,2*numFilters,'Stride',2,'Cropping',1,'Name','tconv3')
batchNormalizationLayer('Name','bn3','Epsilon',5e-5)
reluLayer('Name','relu3')
transposedConv2dLayer(filterSize,numFilters,'Stride',2,'Cropping',1,'Name','tconv4')
batchNormalizationLayer('Name','bn4','Epsilon',5e-5)
reluLayer('Name','relu4')
transposedConv2dLayer(filterSize,3,'Stride',2,'Cropping',1,'Name','tconv5')
tanhLayer('Name','tanh')];
lgraphGenerator = layerGraph(layersGenerator);
%%
% To train the network with a custom training loop and enable automatic differentiation,
% convert the layer graph to a |dlnetwork| object.
dlnetGenerator = dlnetwork(lgraphGenerator)
%% Define Discriminator Network
scale = 0.2;
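% The conv1 to conv4 lines are missing from the post; they are assumed here
% to follow the official example.
layersDiscriminator = [
imageInputLayer([64 64 3],'Normalization','none','Name','in')
convolution2dLayer(filterSize,numFilters,'Stride',2,'Padding',1,'Name','conv1')
leakyReluLayer(scale,'Name','lrelu1')
dropoutLayer(0.25,'Name','drop1')
convolution2dLayer(filterSize,2*numFilters,'Stride',2,'Padding',1,'Name','conv2')
batchNormalizationLayer('Name','bn2','Epsilon',5e-5)
leakyReluLayer(scale,'Name','lrelu2')
dropoutLayer(0.25,'Name','drop2')
convolution2dLayer(filterSize,4*numFilters,'Stride',2,'Padding',1,'Name','conv3')
batchNormalizationLayer('Name','bn3','Epsilon',5e-5)
leakyReluLayer(scale,'Name','lrelu3')
dropoutLayer(0.25,'Name','drop3')
convolution2dLayer(filterSize,8*numFilters,'Stride',2,'Padding',1,'Name','conv4')
batchNormalizationLayer('Name','bn4','Epsilon',5e-5)
leakyReluLayer(scale,'Name','lrelu4')
dropoutLayer(0.25,'Name','drop4')
convolution2dLayer(filterSize,1,'Name','conv5')];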
lgraphDiscriminator = layerGraph(layersDiscriminator);
dlnetDiscriminator = dlnetwork(lgraphDiscriminator)
%% Specify Training Options
% Train with a mini-batch size of 512 for 1500 epochs. For larger datasets, you
% might not need to train for as many epochs. Set the read size of the augmented
% image datastore to the mini-batch size.
numEpochs = 1500; % Larger = more risk of mode collapse
miniBatchSize = 512; % Smaller batch size = more instability (likely to end in mode collapse)
augimds.MiniBatchSize = miniBatchSize;
%% Learning rate (MUST NOT BE EQUAL)
learnRateGenerator = 0.0002;
learnRateDiscriminator = 0.0001;
trailingAvgGenerator = [];
trailingAvgSqGenerator = [];
trailingAvgDiscriminator = [];
trailingAvgSqDiscriminator = [];
executionEnvironment = "auto";
%% Data for progress monitoring
NbImgValidation=16;
ZValidation = randn(1,1,numLatentInputs,NbImgValidation,'single');
dlZValidation = dlarray(ZValidation,'SSCB');
if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
dlZValidation = gpuArray(dlZValidation);
end
%% Train the GAN. This can take some time to run.
figure
iteration = 0;
start = tic;
ArraylossGenerator=NaN(1,1000); % Store loss for progress monitoring
ArraylossDiscriminator=NaN(1,1000);
% Loop over epochs.
for noEpoch = 1:numEpochs
if mod(noEpoch,10)==0 && noEpoch>1
% Save every 10 epoch
save([FileData '_' sprintf('%03i',noEpoch)],'numLatentInputs','dlnetGenerator','dlnetDiscriminator');
end
% Reset and shuffle datastore.
reset(augimds);
augimds = shuffle(augimds);
% Loop over mini-batches.
while hasdata(augimds)
fprintf('Epoch : %i Iter : %i mod : %i\n',noEpoch,iteration,mod(iteration,10));
% Read mini-batch of data.
data = read(augimds);
% Ignore last partial mini-batch of epoch.
if size(data,1) < miniBatchSize
continue
end
% Increase iteration variable only if the iteration takes place
iteration = iteration + 1;
% Concatenate mini-batch of data and generate latent inputs for the
% generator network.
X = cat(4,data{:,1}{:});
Z = randn(1,1,numLatentInputs,size(X,4),'single');
% Normalize the images
X = (single(X)/255)*2 - 1;
% Convert mini-batch of data to dlarray specify the dimension labels
% 'SSCB' (spatial, spatial, channel, batch).
dlX = dlarray(X, 'SSCB');
dlZ = dlarray(Z, 'SSCB');
% If training on a GPU, then convert data to gpuArray.
if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
dlX = gpuArray(dlX);
dlZ = gpuArray(dlZ);
end
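% Evaluate the model gradients and the generator state using
% dlfeval and the modelGradients function listed at the end of the
% example. The call is missing from the post; the output order is assumed.
[gradientsGenerator, gradientsDiscriminator, stateGenerator, lossGenerator, lossDiscriminator] = ...
dlfeval(@modelGradients, dlnetGenerator, dlnetDiscriminator, dlX, dlZ);
dlnetGenerator.State = stateGenerator;
% Store losses for future usage
if length(ArraylossGenerator)<iteration
ArraylossGenerator=[ArraylossGenerator NaN(1,1000)];
ArraylossDiscriminator=[ArraylossDiscriminator NaN(1,1000)];
end
ArraylossGenerator(iteration)=gather(extractdata(lossGenerator));
ArraylossDiscriminator(iteration)=gather(extractdata(lossDiscriminator));
% Adam decay factors. Their definitions are missing from the post; the
% values of the official example are assumed.
gradientDecayFactor = 0.5;
squaredGradientDecayFactor = 0.999;
% Update the discriminator network parameters.
[dlnetDiscriminator.Learnables,trailingAvgDiscriminator,trailingAvgSqDiscriminator] = ...
adamupdate(dlnetDiscriminator.Learnables, gradientsDiscriminator, ...
trailingAvgDiscriminator, trailingAvgSqDiscriminator, iteration, ...
learnRateDiscriminator, gradientDecayFactor, squaredGradientDecayFactor);
% Update the generator network parameters.
[dlnetGenerator.Learnables,trailingAvgGenerator,trailingAvgSqGenerator] = ...
adamupdate(dlnetGenerator.Learnables, gradientsGenerator, ...
trailingAvgGenerator, trailingAvgSqGenerator, iteration, ...
learnRateGenerator, gradientDecayFactor, squaredGradientDecayFactor);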
% Every 10 iterations, display batch of generated images using the
% held-out generator input.
if mod(iteration,10) == 0 || iteration == 1
% Generate images using the held-out generator input.
dlXGeneratedValidation = predict(dlnetGenerator,dlZValidation);
% Rescale the images in the range [0 1] and display the images.
subplot(1,2,2);
I = imtile(extractdata(dlXGeneratedValidation));
I = rescale(I);
image(I)
% Update the title with training progress information.
D = duration(0,0,toc(start),'Format','hh:mm:ss');
title(...
"Epoch: " + noEpoch + ", " + ...
"Iteration: " + iteration + ", " + ...
"Elapsed: " + string(D))
if iteration==1
subplot(2,2,1);
hPlot1=plot(ArraylossGenerator);
xlabel('iteration');
title('loss generator');
subplot(2,2,3);
hPlot2=plot(ArraylossDiscriminator);
xlabel('iteration');
title('loss discriminator');
set(gcf,'position',[ 89.4000 245.4000 958.6000 516.6000]);
else
hPlot1.YData=ArraylossGenerator;
hPlot2.YData=ArraylossDiscriminator;
end
frame = getframe(gcf);
writeVideo(v,frame);
drawnow
end
end
end
close(v);
% Save all data
save(FileData);
%%
% Here, the discriminator has learned a strong feature representation that identifies
% real images among generated images and in turn, the generator has learned a
% similarly strong feature representation that allows it to generate realistic
% looking data.
%% Generate New Images
% To generate new images, use the |predict| function on the generator with a
% |dlarray| object containing a batch of 1-by-1-by-numLatentInputs arrays of random values.
% To display the images together, use the |imtile| function and rescale the images
% using the |rescale| function.
%
% Create a |dlarray| object containing a batch of 16 1-by-1-by-numLatentInputs arrays of
% random values to input into the generator network.
ZNew = randn(1,1,numLatentInputs,16,'single');
dlZNew = dlarray(ZNew,'SSCB');
%%
% For GPU inference, convert the data to |gpuArray| objects.
if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
dlZNew = gpuArray(dlZNew);
end
%%
% Generate new images using the |predict| function with the generator and the
% input data.
dlXGeneratedNew = predict(dlnetGenerator,dlZNew);
%%
% Display the images.
figure
I = imtile(extractdata(dlXGeneratedNew));
I = rescale(I);
image(I)
title("Generated Images")
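%% Model Gradients Function
% The function |modelGradients| takes generator and discriminator |dlnetwork|
% objects |dlnetGenerator| and |dlnetDiscriminator|, a mini-batch of input data
% |X|, and an array of random values |Z|, and returns the gradients of the loss
% with respect to the learnable parameters in the networks and an array of generated
% images.
% The function header and the dlgradient calls are missing from the post;
% they are assumed here, with the losses returned for monitoring.
function [gradientsGenerator, gradientsDiscriminator, stateGenerator, lossGenerator, lossDiscriminator] = ...
modelGradients(dlnetGenerator, dlnetDiscriminator, dlX, dlZ)
% Calculate the predictions for real data with the discriminator network.
dlYPred = forward(dlnetDiscriminator, dlX);
% Calculate the predictions for generated data with the discriminator network.
[dlXGenerated,stateGenerator] = forward(dlnetGenerator,dlZ);
dlYPredGenerated = forward(dlnetDiscriminator, dlXGenerated);
% Calculate the GAN loss
[lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated);
% For each network, calculate the gradients with respect to the loss.
gradientsGenerator = dlgradient(lossGenerator, dlnetGenerator.Learnables,'RetainData',true);
gradientsDiscriminator = dlgradient(lossDiscriminator, dlnetDiscriminator.Learnables);
end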
%% GAN Loss Function
% The objective of the generator is to generate data that the discriminator
% classifies as "real". To maximize the probability that images from the generator
% are classified as real by the discriminator, minimize the negative log likelihood
% function. The loss function for the generator is given by
%
% $$\textrm{lossGenerator}=-\textrm{mean}\left(\log \left(\sigma \left({\hat{Y} }_{\textrm{Generated}} \right)\right)\right),$$
%
% where $\sigma$ denotes the sigmoid function, and $\hat{Y}_{Generated}$ denotes
% the output of the discriminator with generated data input.
%
% The objective of the discriminator is to not be "fooled" by the generator.
% To maximize the probability that the discriminator successfully discriminates
% between the real and generated images, minimize the sum of the corresponding
% negative log likelihood functions. The output of the discriminator corresponds
% to the probabilities the input belongs to the "real" class. For the generated
% data, to use the probabilities corresponding to the "generated" class, use the
% values $1-\sigma(\hat{Y}_{Generated})$. The loss function for the discriminator
% is given by
%
% $$\textrm{lossDiscriminator}=-\textrm{mean}\left(\log \left(\sigma \left({\hat{Y} }_{\textrm{Real}} \right)\right)\right)-\textrm{mean}\left(\log \left(1-\sigma \left({\hat{Y} }_{\textrm{Generated}} \right)\right)\right),$$
%
% where $\hat{Y}_{Real}$ denotes the output of the discriminator with real data
% input.
function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)
% Calculate losses for the discriminator network.
lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));
% Calculate losses for the discriminator network.
% ==================================================
% Mind the *0.9 here
% ==================================================
lossReal = -mean(log(0.9*sigmoid(dlYPred)));
% Combine the losses for the discriminator network.
lossDiscriminator = lossReal + lossGenerated;
% Calculate the loss for the generator network.
lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));
%lossGenerator = mean(max(dlYPredGenerated,0)-dlYPred+log(1+exp(-abs(dlYPredGenerated))));
end
% _Copyright 2019 The MathWorks, Inc._
Steven Lord on 1 Jan 2020
If you believe you have found an error in an example and want to officially report it to MathWorks, you should contact Technical Support directly using the telephone icon in the upper-right corner of this page.

Theron FARRELL on 8 Jan 2020
Hi Gautam,
Thanks a lot for your articulate answer and great help! The puzzle is all cleared up!
Anyway, concerning custom training, I also have two other questions here and here. Would you mind taking a look? Cheers!

wenyi shao on 13 Oct 2020
I ran into the same problem ('Variance' expected to be positive) when using R2019b last year, and again with R2020b this year.
I contacted technical support as well, but they couldn't provide an effective answer. So far, a quick workaround I have tried is to use a different mini-batch size. The problem then disappears, but I don't know the underlying reason.

R2019b
