I am trying MATLAB's official GAN example (https://www.mathworks.com/help/deeplearning/examples/train-generative-adversarial-network.html). There are a couple of issues I want to ask about.

1. After setting a large number of epochs, say 5000, the code crashed after 3833 iterations (I think this is simply an arbitrary, large iteration count) with the following errors:

Error using nnet.internal.cnn.dlnetwork/predict (line 198)
Layer 'bn1': Invalid input data. The value of 'Variance' is invalid. Expected input to be positive.

Error in dlnetwork/predict (line 205)
[varargout{1:nargout}] = predict(net.PrivateNetwork, x, layerIndices, layerOutputIndices);

Error in GAN_Test (line 143)

dlXGeneratedValidation = predict(dlnetGenerator,dlZValidation);

Also note that it does not happen only once, but multiple times, each time after an arbitrary, large number of epochs. As per the error message, I think 'bn1' refers to

batchNormalizationLayer('Name','bn1')

in the generator, which takes the output from

imageInputLayer([1 1 numLatentInputs],'Normalization','none','Name','in')

transposedConv2dLayer(filterSize,8*numFilters,'Name','tconv1')

So I think this is the failure mode described in https://arxiv.org/pdf/1606.03498.pdf, reached after training the generator over many epochs: "One of the main failure modes for GAN is for the generator to collapse to a parameter setting where it always emits the same point."

I am therefore wondering whether MATLAB could issue a warning, or force the variance in the predict() function to always be positive, say, by adding an eps.

2. As I mentioned in another post, the ganLoss(...) function in fact appends a Sigmoid layer at the end of the discriminator, and the loss is calculated AFTER it.

function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)

% Calculate losses for the discriminator network.

lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));

lossReal = -mean(log(sigmoid(dlYPred)));

% Combine the losses for the discriminator network.

lossDiscriminator = lossReal + lossGenerated;

% Calculate the loss for the generator network.

lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));

end

And yet when dlgradient(...) is adopted later, it seems to get started from the last layers of the discriminator and the generator, respectively, as it is shown in the example code

% There is NO Sigmoid layer in either of the dlnet

gradientsGenerator = dlgradient(lossGenerator, dlnetGenerator.Learnables,'RetainData',true);

gradientsDiscriminator = dlgradient(lossDiscriminator, dlnetDiscriminator.Learnables);

I am therefore wondering if, as per the chain rule, the loss should first be subject to the derivative of Sigmoid before it is sent back to the discriminator and the generator, respectively. Specifically,

% Pseudo code
Final_Loss = -mean(log(sigmoid(dlYPred)));

% For one input
Del(Final_Loss)/Del(dlYPred)
= Del(Final_Loss)/Del(log(sigmoid(dlYPred))) * Del(log(sigmoid(dlYPred)))/Del(dlYPred)
= -(1/sigmoid(dlYPred)) * sigmoid(dlYPred) * (1-sigmoid(dlYPred))
= -(1-sigmoid(dlYPred))

% So I reckon that the following should be calculated and the last two backpropagated
Loss_G2D = -mean(-sigmoid(dlYPredGen));
Loss_D2D = -mean(1-sigmoid(dlYPredReal));
Loss_D = Loss_D2D + Loss_G2D;
Loss_G = -mean(1-sigmoid(dlYPredGen));

Please do correct me if I am wrong, thanks.

Gautam Pendse
on 7 Jan 2020

Hi Theron,

Re: 2. As I mentioned in another post, the ganLoss(...) function in fact appends a Sigmoid layer at the end of the discriminator, and the loss is calculated AFTER it.

function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)

% Calculate losses for the discriminator network.

lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));

lossReal = -mean(log(sigmoid(dlYPred)));

% Combine the losses for the discriminator network.

lossDiscriminator = lossReal + lossGenerated;

% Calculate the loss for the generator network.

lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));

end

*** Yes, the sigmoid layer is not part of the Discriminator, so the sigmoid function is applied explicitly before the loss is computed. The Discriminator loss is based on Eq. 1 in the GAN paper: https://arxiv.org/pdf/1406.2661.pdf. For the Generator, the loss maximizes log(D(G(z))) instead of minimizing log(1-D(G(z))), as the paper suggests in the paragraph before Figure 1.

Re: And yet when dlgradient(...) is adopted later, it seems to get started from the last layers of the discriminator and the generator, respectively, as it is shown in the example code

% There is NO Sigmoid layer in either of the dlnet

gradientsGenerator = dlgradient(lossGenerator, dlnetGenerator.Learnables,'RetainData',true);

gradientsDiscriminator = dlgradient(lossDiscriminator, dlnetDiscriminator.Learnables);

I am therefore wondering if, as per the chain rule, the loss, shall be firstly subject to the derivative of Sigmoid before it is sent back to the discriminator and the generator, respectively.

*** dlgradient calculates the gradient of an output variable w.r.t. a set of input variables. It backpropagates through all operations needed to produce the output. So:

gradientsDiscriminator = dlgradient(lossDiscriminator, dlnetDiscriminator.Learnables);

computes the gradient of lossDiscriminator w.r.t the Discriminator weights (the Learnables) by backpropagating through all operations that were used to calculate lossDiscriminator - this includes sigmoid as well as other operations such as mean/log:

% Calculate losses for the discriminator network.

lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));

lossReal = -mean(log(sigmoid(dlYPred)));

% Combine the losses for the discriminator network.

lossDiscriminator = lossReal + lossGenerated;
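As a sanity check on the chain-rule point, the per-sample derivatives can be verified numerically. The following is only a sketch on plain doubles (no dlarray, no toolbox); the logit value y and the step h are arbitrary assumptions. It compares finite differences of the composite losses with the closed forms the chain rule gives, which is the quantity automatic differentiation propagates back into the discriminator output.

```matlab
% Numerical check on plain doubles (illustrative only).
sig   = @(y) 1 ./ (1 + exp(-y));          % sigmoid
lossR = @(y) -log(sig(y));                % per-sample loss for a real image
lossG = @(y) -log(1 - sig(y));            % per-sample loss for a generated image

y = 0.7;                                  % arbitrary logit (assumption)
h = 1e-6;                                 % finite-difference step (assumption)

% Central finite differences of the composite losses.
dR_num = (lossR(y + h) - lossR(y - h)) / (2*h);
dG_num = (lossG(y + h) - lossG(y - h)) / (2*h);

% Closed forms obtained from the chain rule.
dR_chain = -(1 - sig(y));                 % equals sigmoid(y) - 1
dG_chain = sig(y);

fprintf('real:      numeric %.6f, chain rule %.6f\n', dR_num, dR_chain);
fprintf('generated: numeric %.6f, chain rule %.6f\n', dG_num, dG_chain);
```

Because the derivative of the sigmoid is already contained in the gradient of the composite loss, there is no need to apply it by hand before calling dlgradient.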

Hope this helps,

Gautam

Niccolò Dal Santo
on 8 Jan 2020

Hi Theron,

Re: 1. After setting a large number of epochs, say 5000, the code crashed after 3833 iterations (I think this is simply an arbitrary, large iteration count) with the following errors

You are right that 3833 is an arbitrary, large iteration count. This happens because training a GAN is by nature a hard optimization problem that suffers from instability: you are training two networks at the same time in an adversarial game, so when one network improves, the performance of the other worsens. From Improved Techniques for Training GANs (2016):

Training GANs consists in finding a Nash equilibrium to a two-player non-cooperative game. […] Unfortunately, finding Nash equilibria is a very difficult problem. Algorithms exist for specialized cases, but we are not aware of any that are feasible to apply to the GAN game, where the cost functions are non-convex, the parameters are continuous, and the parameter space is extremely high-dimensional

There are a few reasons why training may not go well. Often these involve:

- the discriminator starts performing too well after a few iterations and the generator can no longer fool it
- mode collapse: the generator produces only one sample

If either of these happens, one of the loss functions may reach an Inf value, which then propagates through a series of operations and ends up producing a NaN in the variance of the BN layer, which in turn triggers the error you obtained.
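How an Inf can appear and then turn into a NaN can be illustrated on plain single-precision numbers. This is only a sketch, not the example's dlarray code: the logit value 20 is an assumption, and the stable rewrite of the loss is an alternative formulation that is not part of the MathWorks example.

```matlab
sig = @(y) 1 ./ (1 + exp(-y));          % sigmoid on plain numbers
y   = single(20);                       % assumed logit: discriminator is sure a fake is real

% Naive form as in ganLoss: sigmoid rounds to exactly 1 in single precision.
lossNaive = -log(1 - sig(y));           % log(0) is -Inf, so the loss is Inf

% Inf turns into NaN as soon as it meets an operation such as Inf - Inf.
poisoned = lossNaive - lossNaive;       % NaN

% Algebraically identical but stable:
% -log(1 - sigmoid(y)) = max(y,0) + log(1 + exp(-abs(y)))
lossStable = max(y, 0) + log1p(exp(-abs(y)));

% Guard that could be placed in a training loop before the weight update.
if ~isfinite(lossNaive)
    disp('Non-finite loss detected: stop and inspect before updating the weights.');
end
```

A check like the isfinite guard makes the failure visible at the iteration where it starts, rather than later when predict rejects the batch normalization statistics.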

Re: I am therefore wondering whether MATLAB could issue a warning, or force the variance in the predict() function to always be positive, say, by adding an eps.

If you are referring to the variance computed in the BatchNormalization (BN) layer, it has an Epsilon property that you can set with the 'Epsilon' name-value pair when creating the layer.

During training I would recommend plotting lossGenerator and lossDiscriminator as functions of the iteration count. Although this is a simple metric, it may help you spot early when training is heading in a bad direction, or at least give you a reason why you obtained the error in the BN layer.

Notice also that a larger number of training iterations is not necessarily better: as explained above, the two optimization processes work against each other (rather than having just one network trying to improve its own loss, as in more usual deep learning cases). For this reason I would also recommend saving your weights every, say, 500 iterations and, at the end of training, visually comparing the results obtained with the different models.

On top of this, there are a series of strategies (some of them purely heuristic) that you can use to improve GAN training:

- Increasing the 'Epsilon' of the BN layer empirically helps to smooth the training process; you can try increasing it from the default 1e-5 to, e.g., 5e-5 or 1e-4

- Feature Matching and Minibatch Discrimination, specifically see Improved Techniques for Training GANs, 2016

- Add Gaussian noise to the discriminator input (both real and generated images)

- Label smoothing: as already pointed out, smooth the labels by multiplying by 0.9 in the computation of the discriminator loss for real images

- Two Time-Scale Update Rule (TTUR), which simply consists of using different learning rates for the generator and the discriminator (with the generator learning rate being the smaller one)

- Use Spectral normalization (https://arxiv.org/abs/1802.05957) (instead of BN), as it appears to perform very well as normalization layer. Currently this must be implemented as a Custom layer in MATLAB.
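The noise suggestion from the list above can be sketched on plain arrays as follows. The standard deviation noiseStd and the random stand-in image batches are assumptions for illustration only; in the training loop the same kind of noise would be added to the real mini-batch and to the generator output before both are passed to the discriminator.

```matlab
noiseStd = 0.05;                                 % assumed value, tune for your data

% Stand-ins for a real mini-batch and a generated mini-batch, scaled to [-1,1].
X          = rand(64, 64, 3, 8, 'single')*2 - 1;
XGenerated = rand(64, 64, 3, 8, 'single')*2 - 1;

% Add independent zero-mean Gaussian noise to both discriminator inputs.
XNoisy          = X          + noiseStd * randn(size(X), 'single');
XGeneratedNoisy = XGenerated + noiseStd * randn(size(XGenerated), 'single');
```

Using the same noise level for real and generated images keeps the discriminator from telling the two apart by the noise alone.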

I hope this helps to shed some light !

Cheers,

Niccolo'

Delprat Sebastien
on 31 Dec 2019

I also think that the iteration number should only be increased if the iteration actually takes place. This may matter for the Adam optimizer, which uses the iteration number.

In the original code:

iteration = iteration + 1;

...

if size(data,1) < miniBatchSize

continue

end

Instead we should have

...

if size(data,1) < miniBatchSize

continue

end

iteration = iteration + 1;
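To see why the counter can matter, recall that standard Adam divides its moment estimates by bias-correction terms that depend on the iteration number t. The sketch below only evaluates those terms; the decay factors 0.5 and 0.999 are the ones used in the training code posted in this thread, and the chosen values of t are arbitrary.

```matlab
beta1 = 0.5;      % gradient decay factor
beta2 = 0.999;    % squared gradient decay factor

for t = [1 2 10 100]
    c1 = 1 / (1 - beta1^t);   % scales the first-moment (mean) estimate
    c2 = 1 / (1 - beta2^t);   % scales the second-moment (squared gradient) estimate
    fprintf('t = %3d: 1/(1-beta1^t) = %.4f, 1/(1-beta2^t) = %.2f\n', t, c1, c2);
end
```

The correction terms change fastest during the first iterations, so a counter that runs ahead of the number of updates actually performed has the most effect early in training.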

Delprat Sebastien
on 31 Dec 2019

A few other tips for improving performance.

1) One-sided label smoothing

To improve stability, it is also advised to use one-sided label smoothing (https://arxiv.org/pdf/1606.03498.pdf).

So the function becomes:

function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)

% Calculate losses for the discriminator network.

lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));

% Calculate losses for the discriminator network.

% ==================================================

% Mind the *0.9 here

% ==================================================

lossReal = -mean(log(0.9*sigmoid(dlYPred)));

% Combine the losses for the discriminator network.

lossDiscriminator = lossReal + lossGenerated;

% Calculate the loss for the generator network.

lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));

%lossGenerator = mean(max(dlYPredGenerated,0)-dlYPred+log(1+exp(-abs(dlYPredGenerated))));

end
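One caveat about the 0.9 factor above: since log(0.9*sigmoid(y)) = log(0.9) + log(sigmoid(y)), scaling inside the logarithm only shifts the loss by a constant and leaves the gradients unchanged. One-sided label smoothing as described in the cited paper instead replaces the target 1 for real images with a soft target in the cross-entropy. The following is a minimal sketch on plain numeric arrays; the logit values are made up and the variable names are hypothetical, not part of the posted code.

```matlab
sig    = @(y) 1 ./ (1 + exp(-y));     % sigmoid on plain arrays
yReal  = [2.0 -0.5 1.2 0.3];          % made-up discriminator logits for real images
target = 0.9;                         % smoothed label for the "real" class

% Scaling inside the log only adds the constant -log(0.9).
lossScaled = -mean(log(target * sig(yReal)));
lossPlain  = -mean(log(sig(yReal)));

% Cross-entropy against the soft target 0.9 (one-sided label smoothing).
lossSmoothed = -mean(target * log(sig(yReal)) + (1 - target) * log(1 - sig(yReal)));

fprintf('scaled - plain = %.4f (equals -log(0.9) = %.4f)\n', ...
    lossScaled - lossPlain, -log(target));
fprintf('smoothed loss  = %.4f\n', lossSmoothed);
```

With the soft target, the loss is minimized when sigmoid(y) equals 0.9 rather than 1, which is what discourages the discriminator from becoming overconfident on real images.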

2) Dropout layers

I noticed that some implementations use dropout layers in the discriminator.

3) Batch normalisation

To prevent numerical issues, I changed the Epsilon value to 5e-5 in each batch normalization layer.

As a result, here is my code:

clear all

close all

clc

% Slightly modified GAN code, provided as is

% Find a unique name for backup

NoFile=1;

while isfolder(sprintf('GAN%03i',NoFile))

NoFile=NoFile+1;

end

FolderName=sprintf('GAN%03i',NoFile);

% Create a unique folder that will contain the networks saved during training and a video file of the progress

mkdir(FolderName);

FileVideo=[FolderName filesep 'video' sprintf('%03i',NoFile) '.avi'];

FileData=[FolderName filesep 'data' sprintf('%03i',NoFile) ];

% Video writer for saving GAN progress

v=VideoWriter(FileVideo);

open(v);

% Folder with real images

datasetFolder = ""; % ADD HERE THE FOLDER WITH YOUR PICTURES

% Datastore

imds = imageDatastore(datasetFolder, ...

'IncludeSubfolders',true, ...

'LabelSource','foldernames');

% Augment the data to include random horizontal flipping, scaling, and resize

% the images to have size 64-by-64.

augmenter = imageDataAugmenter( ...

'RandXReflection',true, ...

'RandScale',[1 2]);

nx=64;

ny=64;

%% skip data augmentation since we already have a lot of images, otherwise, use the data augmentation

augimds = augmentedImageDatastore([ny ny],imds);%,'DataAugmentation',augmenter);

%% Define Generator Network

filterSize = [4 4];

numFilters = 16;

numLatentInputs = 15;

windowChannelSize=10;

layersGenerator = [

imageInputLayer([1 1 numLatentInputs],'Normalization','none','Name','in')

transposedConv2dLayer(filterSize,8*numFilters,'Name','tconv1')

batchNormalizationLayer('Name','bn1','Epsilon',5e-5)

reluLayer('Name','relu1')

transposedConv2dLayer(filterSize,4*numFilters,'Stride',2,'Cropping',1,'Name','tconv2')

batchNormalizationLayer('Name','bn2','Epsilon',5e-5)

reluLayer('Name','relu2')

transposedConv2dLayer(filterSize,2*numFilters,'Stride',2,'Cropping',1,'Name','tconv3')

batchNormalizationLayer('Name','bn3','Epsilon',5e-5)

reluLayer('Name','relu3')

transposedConv2dLayer(filterSize,numFilters,'Stride',2,'Cropping',1,'Name','tconv4')

batchNormalizationLayer('Name','bn4','Epsilon',5e-5)

reluLayer('Name','relu4')

transposedConv2dLayer(filterSize,3,'Stride',2,'Cropping',1,'Name','tconv5')

tanhLayer('Name','tanh')];

lgraphGenerator = layerGraph(layersGenerator);

%%

% To train the network with a custom training loop and enable automatic differentiation,

% convert the layer graph to a |dlnetwork| object.

dlnetGenerator = dlnetwork(lgraphGenerator)

%% Define Discriminator Network

scale = 0.2;

layersDiscriminator = [

imageInputLayer([64 64 3],'Normalization','none','Name','in')

convolution2dLayer(filterSize,numFilters,'Stride',2,'Padding',1,'Name','conv1')

leakyReluLayer(scale,'Name','lrelu1')

dropoutLayer(0.25,'Name','drop1')

convolution2dLayer(filterSize,2*numFilters,'Stride',2,'Padding',1,'Name','conv2')

batchNormalizationLayer('Name','bn2','Epsilon',5e-5)

leakyReluLayer(scale,'Name','lrelu2')

dropoutLayer(0.25,'Name','drop2')

convolution2dLayer(filterSize,4*numFilters,'Stride',2,'Padding',1,'Name','conv3')

batchNormalizationLayer('Name','bn3','Epsilon',5e-5)

leakyReluLayer(scale,'Name','lrelu3')

dropoutLayer(0.25,'Name','drop3')

convolution2dLayer(filterSize,8*numFilters,'Stride',2,'Padding',1,'Name','conv4')

batchNormalizationLayer('Name','bn4','Epsilon',5e-5)

leakyReluLayer(scale,'Name','lrelu4')

dropoutLayer(0.25,'Name','drop4')

convolution2dLayer(filterSize,1,'Name','conv5')];

lgraphDiscriminator = layerGraph(layersDiscriminator);

dlnetDiscriminator = dlnetwork(lgraphDiscriminator)

%% Specify Training Options

% Train with a mini-batch size of 512 for 1500 epochs. For larger datasets, you

% might not need to train for as many epochs. Set the read size of the augmented

% image datastore to the mini-batch size.

numEpochs = 1500; % Larger = more risk of mode collapse

miniBatchSize = 512; % Smaller batch size = more instability (likely to end with a mode collapse)

augimds.MiniBatchSize = miniBatchSize;

%% Learning rate (MUST NOT BE EQUAL)

learnRateGenerator = 0.0002;

learnRateDiscriminator = 0.0001;

trailingAvgGenerator = [];

trailingAvgSqGenerator = [];

trailingAvgDiscriminator = [];

trailingAvgSqDiscriminator = [];

gradientDecayFactor = 0.5;

squaredGradientDecayFactor = 0.999;

executionEnvironment = "auto";

%% Data for progress monitoring

NbImgValidation=16;

ZValidation = randn(1,1,numLatentInputs,NbImgValidation,'single');

dlZValidation = dlarray(ZValidation,'SSCB');

if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"

dlZValidation = gpuArray(dlZValidation);

end

%% Train the GAN. This can take some time to run.

figure

iteration = 0;

start = tic;

ArraylossGenerator=NaN(1,1000); % Store loss for progress monitoring

ArraylossDiscriminator=NaN(1,1000);

% Loop over epochs.

for noEpoch = 1:numEpochs

if mod(noEpoch,10)==0 && noEpoch>1

% Save every 10 epochs

save([FileData '_' sprintf('%03i',noEpoch)],'numLatentInputs','dlnetGenerator','dlnetDiscriminator');

end

% Reset and shuffle datastore.

reset(augimds);

augimds = shuffle(augimds);

% Loop over mini-batches.

while hasdata(augimds)

fprintf('Epoch : %i Iter : %i mod : %i\n',noEpoch,iteration,mod(iteration,10));

% Read mini-batch of data.

data = read(augimds);

% Ignore last partial mini-batch of epoch.

if size(data,1) < miniBatchSize

continue

end

% Increase the iteration variable only if the iteration takes place

iteration = iteration + 1;

% Concatenate mini-batch of data and generate latent inputs for the

% generator network.

X = cat(4,data{:,1}{:});

Z = randn(1,1,numLatentInputs,size(X,4),'single');

% Normalize the images

X = (single(X)/255)*2 - 1;

% Convert mini-batch of data to dlarray specify the dimension labels

% 'SSCB' (spatial, spatial, channel, batch).

dlX = dlarray(X, 'SSCB');

dlZ = dlarray(Z, 'SSCB');

% If training on a GPU, then convert data to gpuArray.

if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"

dlX = gpuArray(dlX);

dlZ = gpuArray(dlZ);

end

% Evaluate the model gradients and the generator state using

% dlfeval and the modelGradients function listed at the end of the

% example.

[gradientsGenerator, gradientsDiscriminator, stateGenerator,lossGenerator, lossDiscriminator] = ...

dlfeval(@modelGradients, dlnetGenerator, dlnetDiscriminator, dlX, dlZ);

dlnetGenerator.State = stateGenerator;

% Store losses for future usage

if length(ArraylossGenerator)<iteration

ArraylossGenerator=[ArraylossGenerator NaN(1,1000)];

ArraylossDiscriminator=[ArraylossDiscriminator NaN(1,1000)];

end

ArraylossGenerator(iteration)=gather(extractdata(lossGenerator));

ArraylossDiscriminator(iteration)=gather(extractdata(lossDiscriminator));

% Update the discriminator network parameters.

[dlnetDiscriminator.Learnables,trailingAvgDiscriminator,trailingAvgSqDiscriminator] = ...

adamupdate(dlnetDiscriminator.Learnables, gradientsDiscriminator, ...

trailingAvgDiscriminator, trailingAvgSqDiscriminator, iteration, ...

learnRateDiscriminator, gradientDecayFactor, squaredGradientDecayFactor);

% Update the generator network parameters.

[dlnetGenerator.Learnables,trailingAvgGenerator,trailingAvgSqGenerator] = ...

adamupdate(dlnetGenerator.Learnables, gradientsGenerator, ...

trailingAvgGenerator, trailingAvgSqGenerator, iteration, ...

learnRateGenerator, gradientDecayFactor, squaredGradientDecayFactor);

% Every 10 iterations, display batch of generated images using the

% held-out generator input.

if mod(iteration,10) == 0 || iteration == 1

% Generate images using the held-out generator input.

dlXGeneratedValidation = predict(dlnetGenerator,dlZValidation);

% Rescale the images in the range [0 1] and display the images.

subplot(1,2,2);

I = imtile(extractdata(dlXGeneratedValidation));

I = rescale(I);

image(I)

% Update the title with training progress information.

D = duration(0,0,toc(start),'Format','hh:mm:ss');

title(...

"Epoch: " + noEpoch + ", " + ...

"Iteration: " + iteration + ", " + ...

"Elapsed: " + string(D))

if iteration==1

subplot(2,2,1);

hPlot1=plot(ArraylossGenerator);

xlabel('iteration');

title('loss generator');

subplot(2,2,3);

hPlot2=plot(ArraylossDiscriminator);

xlabel('iteration');

title('loss discriminator');

set(gcf,'position',[ 89.4000 245.4000 958.6000 516.6000]);

else

hPlot1.YData=ArraylossGenerator;

hPlot2.YData=ArraylossDiscriminator;

end

frame = getframe(gcf);

writeVideo(v,frame);

drawnow

end

end

end

close(v);

% Save all data

save(FileData);

%%

% Here, the discriminator has learned a strong feature representation that identifies

% real images among generated images and in turn, the generator has learned a

% similarly strong feature representation that allows it to generate realistic

% looking data.

%% Generate New Images

% To generate new images, use the |predict| function on the generator with a

% |dlarray| object containing a batch of 1-by-1-by-numLatentInputs arrays of random values.

% To display the images together, use the |imtile| function and rescale the images

% using the |rescale| function.

%

% Create a |dlarray| object containing a batch of 16 1-by-1-by-numLatentInputs arrays of

% random values to input into the generator network.

ZNew = randn(1,1,numLatentInputs,16,'single');

dlZNew = dlarray(ZNew,'SSCB');

%%

% For GPU inference, convert the data to |gpuArray| objects.

if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"

dlZNew = gpuArray(dlZNew);

end

%%

% Generate new images using the |predict| function with the generator and the

% input data.

dlXGeneratedNew = predict(dlnetGenerator,dlZNew);

%%

% Display the images.

figure

I = imtile(extractdata(dlXGeneratedNew));

I = rescale(I);

image(I)

title("Generated Images")

%% Model Gradients Function

% The function |modelGradients| takes generator and discriminator |dlnetwork|

% objects |dlnetGenerator| and |dlnetDiscriminator|, a mini-batch of input data

% |X|, and an array of random values |Z|, and returns the gradients of the loss

% with respect to the learnable parameters in the networks, the generator state,

% and the generator and discriminator losses.

function [gradientsGenerator, gradientsDiscriminator, stateGenerator,lossGenerator, lossDiscriminator] = ...

modelGradients(dlnetGenerator, dlnetDiscriminator, dlX, dlZ)

% Calculate the predictions for real data with the discriminator network.

dlYPred = forward(dlnetDiscriminator, dlX);

% Calculate the predictions for generated data with the discriminator network.

[dlXGenerated,stateGenerator] = forward(dlnetGenerator,dlZ);

dlYPredGenerated = forward(dlnetDiscriminator, dlXGenerated);

% Calculate the GAN loss

[lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated);

% For each network, calculate the gradients with respect to the loss.

gradientsGenerator = dlgradient(lossGenerator, dlnetGenerator.Learnables,'RetainData',true);

gradientsDiscriminator = dlgradient(lossDiscriminator, dlnetDiscriminator.Learnables);

end

%% GAN Loss Function

% The objective of the generator is to generate data that the discriminator

% classifies as "real". To maximize the probability that images from the generator

% are classified as real by the discriminator, minimize the negative log likelihood

% function. The loss function for the generator is given by

%

% $$\textrm{lossGenerator}=-\textrm{mean}\left(\log \left(\sigma \left({\hat{Y}

% }_{\textrm{Generated}} \right)\right)\right),$$

%

% where $\sigma$ denotes the sigmoid function, and $\hat{Y}_{Generated}$ denotes

% the output of the discriminator with generated data input.

%

% The objective of the discriminator is to not be "fooled" by the generator.

% To maximize the probability that the discriminator successfully discriminates

% between the real and generated images, minimize the sum of the corresponding

% negative log likelihood functions. The output of the discriminator corresponds

% to the probabilities the input belongs to the "real" class. For the generated

% data, to use the probabilities corresponding to the "generated" class, use the

% values $1-\sigma(\hat{Y}_{Generated})$. The loss function for the discriminator

% is given by

%

% $$\textrm{lossDiscriminator}=-\textrm{mean}\left(\log \left(\sigma \left({\hat{Y}

% }_{\textrm{Real}} \right)\right)\right)-\textrm{mean}\left(\log \left(1-\sigma

% \left({\hat{Y} }_{\textrm{Generated}} \right)\right)\right),$$

%

% where $\hat{Y}_{Real}$ denotes the output of the discriminator with real data

% input.

function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)

% Calculate losses for the discriminator network.

lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));

% Calculate losses for the discriminator network.

% ==================================================

% Mind the *0.9 here

% ==================================================

lossReal = -mean(log(0.9*sigmoid(dlYPred)));

% Combine the losses for the discriminator network.

lossDiscriminator = lossReal + lossGenerated;

% Calculate the loss for the generator network.

lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));

%lossGenerator = mean(max(dlYPredGenerated,0)-dlYPred+log(1+exp(-abs(dlYPredGenerated))));

end

%% References

%%

% # The TensorFlow Team. _Flowers_ <http://download.tensorflow.org/example_images/flower_photos.tgz

% http://download.tensorflow.org/example_images/flower_photos.tgz>

%%

% _Copyright 2019 The MathWorks, Inc._

Steven Lord
on 1 Jan 2020

wenyi shao
on 13 Oct 2020

I ran into the same problem ("variance expected to be positive") when using 2019b last year and 2020b this year.

I contacted technical support as well, but they could not provide an effective answer. So far, a quick workaround I have tried is to use a different mini-batch size. The problem then disappears, but I don't know the underlying reason.
