Increasing the number of hidden layers in a function fitting neural network seems to improve its performance (apparently without overfitting)

I am trying to solve a kinematic/dynamic mathematical problem involving two moving objects, using the supervised function-fitting neural network fitnet.
The network takes 5 INPUTS and gives 1 OUTPUT.
My first step was to choose the number of hidden layers and neurons, so I looked through papers that tried to solve the same problem with a function-fitting neural network. I was surprised that none of them explained how to choose the number of layers and neurons per layer. All of them stated that they used "informal testing" or "trial and error", experimenting with the number of layers until the results were good enough.
This made me curious, so I tried to do at least some kind of "analysis" of this problem.
My strategy was to vary the number of layers from 1 to 4 and the number of neurons per layer from 1 to 20. That means there are 20 + 20^2 + 20^3 + 20^4 = 168,420 different layer/neuron architectures the network could have.
So I trained all 168,420 function-fitting networks, changing the number of neurons and layers for each run, and saved the test-set RMSE of each.
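The size of this search space can be checked with a few lines of MATLAB. The sketch below assumes every hidden layer may independently take 1 to 20 neurons, so a network with k hidden layers has 20^k possible size vectors. The variable names (maxNeurons, maxLayers, hiddenSizes) are illustrative and not taken from the original experiment.

```matlab
% Count every architecture with 1 to 4 hidden layers and 1 to 20 neurons per layer.
maxNeurons = 20;   % each hidden layer has between 1 and maxNeurons neurons
maxLayers  = 4;    % networks have between 1 and maxLayers hidden layers

% A network with k hidden layers has maxNeurons^k possible size vectors.
countPerDepth = maxNeurons .^ (1:maxLayers);   % [20 400 8000 160000]
totalCount    = sum(countPerDepth);            % 168420

fprintf('%d architectures in total\n', totalCount);

% Map a linear index (1 .. maxNeurons^k) to the size vector of a k-layer network.
k   = 4;
idx = 1;
sizes = cell(1, k);
[sizes{:}] = ind2sub(repmat(maxNeurons, 1, k), idx);
hiddenSizes = [sizes{:}];   % [1 1 1 1] for idx = 1
```

Looping idx from 1 to maxNeurons^k for each k visits every architecture exactly once, and hiddenSizes can be passed directly to fitnet.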
Most important properties of the network:
adaptFcn: 'adaptwb'
adaptParam: (none)
derivFcn: 'defaultderiv'
divideFcn: 'dividerand'
divideParam: .trainRatio, .valRatio, .testRatio
divideMode: 'sample'
initFcn: 'initlay'
performFcn: 'mse'
performParam: .regularization, .normalization
plotFcns: {'plotperform', plottrainstate, ploterrhist,
plotregression, plotfit}
plotParams: {1x5 cell array of 5 params}
trainFcn: 'trainlm'
%% For each individual Layer:
initFcn: 'initnw'
netInputFcn: 'netsum'
transferFcn: 'tansig'
I trained the network with 8322 data samples, which were divided as follows:
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 25/100;
net.divideParam.testRatio = 5/100;
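A single run of such a sweep could look roughly like the sketch below. It is not the original script: the data X and T are random placeholders, the target function is made up for illustration, and the fixed random seed is an assumption added so that the data split and initial weights are repeatable. Only fitnet, train, the divideParam ratios, and the training record field tr.testInd are standard toolbox features.

```matlab
% Sketch: train one candidate architecture and compute its test-set RMSE.
% X (5-by-N inputs) and T (1-by-N targets) are placeholder data (assumption);
% replace them with the real samples.
rng(0);                       % assumption: fix the seed so split and initial weights repeat
N = 8322;
X = rand(5, N);
T = sum(X, 1);                % hypothetical target, for illustration only

hiddenSizes = [8 17 15 3];    % one candidate: neurons per hidden layer
net = fitnet(hiddenSizes, 'trainlm');

net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio   = 25/100;
net.divideParam.testRatio  =  5/100;
net.trainParam.showWindow  = false;   % suppress the training GUI during a batch sweep

[net, tr] = train(net, X, T);

% tr.testInd holds the column indices of the samples assigned to the test set.
Y = net(X(:, tr.testInd));
testRMSE = sqrt(mean((T(tr.testInd) - Y).^2));
fprintf('test RMSE = %.4g\n', testRMSE);
```

With a 5% test ratio the test set holds only about 416 of the 8322 samples, and dividerand draws a new split on every call unless the seed is fixed, so RMSE values from different runs are compared on different test samples.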
My first guess was that network performance would get worse with an increasing number of layers and neurons per layer, but the opposite was the case: the more neurons in total, the better the network performed.
The following plot shows the test-set RMSE (solid line) and the validation-set RMSE (dotted line) for example networks with evenly distributed neurons (e.g.: 1 ...).
The most interesting part is that the network with the lowest test RMSE had 4 layers with [8 17 15 3] neurons.
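For judging whether overfitting is plausible, it can help to compare the number of trainable parameters of this network with the number of training samples. The sketch below assumes the usual fully connected feedforward layout (every layer connected only to the next one, one bias per neuron, 5 inputs and 1 output); the variable names are illustrative.

```matlab
% Count the weights and biases of a fully connected feedforward network.
numInputs   = 5;
hiddenSizes = [8 17 15 3];
numOutputs  = 1;

layerSizes = [numInputs, hiddenSizes, numOutputs];
fanIn  = layerSizes(1:end-1);   % number of inputs feeding each layer
fanOut = layerSizes(2:end);     % number of neurons in each layer

% Each layer has fanIn*fanOut weights plus fanOut biases.
numParams = sum(fanIn .* fanOut + fanOut);   % 523

% Approximate size of the 70% training portion of the 8322 samples.
numTrainSamples = round(0.70 * 8322);        % 5825

fprintf('%d parameters, about %d training samples\n', numParams, numTrainSamples);
```

Under these assumptions the network has 523 parameters against roughly 5825 training samples, so there are about eleven training samples per parameter.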
Then I took some other examples with <10 neurons in the first layer, >15 neurons in layers two and three, and <5 neurons in layer four.
They all showed significantly better results than networks with evenly distributed neurons per layer.
I have not yet found an explanation for this phenomenon (few neurons in the first and last layers, many neurons in the layers in between).
I am now very curious whether anyone has an explanation or has at least experienced the same phenomenon. Thanks in advance!

Answers (0)
