Hi,
The training is done on a Threadripper 1950X with 64GB of RAM and a GTX 1050 Ti 4GB (I use a mini-batch size of 4 so that it fits in GPU memory). MATLAB R2020a is used.
The dataset consists of images of size [284 481 3], with 10k images in the training set and ~3k images in the validation set. Both are stored in pixelLabelImageDatastore objects. When I train the network without validation, Task Manager shows around 6GB of the 64GB in use, so no problems there.
However, when I attempt to train the network with validation, memory usage skyrockets to 64GB, with approximately 30 minutes of swapping to disk every time the validation step happens. After a couple of epochs, MATLAB throws an 'out of memory' error. I even increased the size of the swap file in Windows to 500GB, with the same result: it just takes a couple more epochs before crashing.
Why does this happen only with the validation data, and what can be done to counteract it? I believed that using a datastore meant that only the data required at a given moment is read into memory, rather than the entire dataset. The total file size of all the label and input images in my dataset is only ~300MB.
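To put the ~300MB on-disk figure in perspective, here is a rough back-of-the-envelope sketch of how large the validation images would be if they were all decoded and held in memory at once. It is only an estimate under stated assumptions: 1 byte per element for uint8, 4 bytes per element for single, the "~3k" validation set rounded to exactly 3000 images, and the label images not counted. The variable names are just for illustration.

```matlab
% Rough estimate of decoded image size in memory (not a measurement).
% Assumptions: uint8 = 1 byte/element, single = 4 bytes/element,
% validation set rounded to 3000 images, label images ignored.
imageSize  = [284 481 3];   % image dimensions stated above
numValImgs = 3000;          % "~3k" validation images, rounded (assumption)

elemsPerImg = prod(imageSize);   % elements in one decoded image
bytesUint8  = elemsPerImg * 1;   % bytes per image if stored as uint8
bytesSingle = elemsPerImg * 4;   % bytes per image if stored as single

% Total for the whole validation set, in GiB
valGiBUint8  = numValImgs * bytesUint8  / 2^30;
valGiBSingle = numValImgs * bytesSingle / 2^30;

fprintf('Per image: %d bytes (uint8), %d bytes (single)\n', bytesUint8, bytesSingle);
fprintf('Validation set: %.2f GiB (uint8), %.2f GiB (single)\n', valGiBUint8, valGiBSingle);
```

Under these assumptions, even a fully decoded validation set comes to only a few GiB, well short of 64GB, so the raw size of the image data alone does not seem to explain the memory usage.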
Thanks for any feedback!