# Increasing the number of hidden layers in a function fitting neural network seems to improve its performance (apparently without overfitting)

Ludwig Amin on 7 Sep 2021
Edited: Ludwig Amin on 7 Sep 2021
Hello,
I am trying to solve a kinematic/dynamic mathematical problem involving two moving objects, using the supervised function-fitting neural network fitnet.
The network takes 5 INPUTS and gives 1 OUTPUT.
The first step for me was to define the number of hidden layers and neurons, so I looked at papers that tried to solve the same problem with a function-fitting neural network. I was surprised that none of them explained how to choose the number of layers and neurons per layer. Everyone stated that they used "informal testing" or a "trial-and-error method" and experimented with the number of layers until they found the results good enough.
This made me curious, so I tried to do at least some kind of "analysis" to this problem.
My strategy was to vary the number of layers from 1 to 4 and the number of neurons per layer from 1 to 20. That means there are 20 + 20^2 + 20^3 + 20^4 = 168,420 different layer/neuron architectures the network could have.
So I tested all 168,420 function-fitting networks, changing the number of layers and neurons for each test, and saved the RMSE on the test set.
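As a rough sketch of how this search space can be enumerated (the variable names `maxLayers`, `maxNeurons`, and `archs` are only illustrative, not taken from my actual script), each architecture is stored as a `hiddenSizes` row vector that could be passed to `fitnet`:

```matlab
maxLayers  = 4;    % depths tested: 1 to 4 hidden layers
maxNeurons = 20;   % neurons per layer tested: 1 to 20

% Number of architectures: one term per depth, 20 + 20^2 + 20^3 + 20^4
nArch = sum(maxNeurons .^ (1:maxLayers));

% Build every hiddenSizes vector, depth by depth
archs = cell(0, 1);
for depth = 1:maxLayers
    idx = (0:maxNeurons^depth - 1).';    % zero-based combination index
    % Decode each index as a base-20 number; each digit + 1 is a layer size
    combos = mod(floor(idx ./ maxNeurons.^(0:depth-1)), maxNeurons) + 1;
    archs  = [archs; num2cell(combos, 2)]; %#ok<AGROW>
end
```

Each entry `archs{k}` is then a row vector such as `[8 17 15 3]`. The implicit expansion in the `combos` line assumes R2016b or later.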
Most important properties of the network:
fitnet:
derivFcn: 'defaultderiv'
divideFcn: 'dividerand'
divideParam: .trainRatio, .valRatio, .testRatio
divideMode: 'sample'
initFcn: 'initlay'
performFcn: 'mse'
performParam: .regularization, .normalization
plotFcns: {'plotperform', 'plottrainstate', 'ploterrhist', 'plotregression', 'plotfit'}
plotParams: {1x5 cell array of 5 params}
trainFcn: 'trainlm'
%% For each individual Layer:
initFcn: 'initnw'
netInputFcn: 'netsum'
transferFcn: 'tansig'
I trained the network with 8322 data samples, which were divided as follows:
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 25/100;
net.divideParam.testRatio = 5/100;
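For reference, a minimal sketch of one run of the sweep with these settings. The data `X` and `T` here are random placeholders (an assumption, my real data is not shown), and the Deep Learning Toolbox is required:

```matlab
% Placeholder data (assumption): 5 inputs, 1 target, 8322 samples
rng(0);                                  % reproducible split and initial weights
X = rand(5, 8322);
T = sin(sum(X, 1));                      % arbitrary smooth target, illustration only

hiddenSizes = [8 17 15 3];               % one architecture from the sweep
net = fitnet(hiddenSizes, 'trainlm');

net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio   = 25/100;
net.divideParam.testRatio  = 5/100;
net.trainParam.showWindow  = false;      % no training GUI inside a loop

[net, tr] = train(net, X, T);

% RMSE on the held-out test samples only
Ytest    = net(X(:, tr.testInd));
testRMSE = sqrt(mean((T(:, tr.testInd) - Ytest).^2));
```

The training record `tr` also holds `tr.valInd`, so the validation-set RMSE can be computed the same way.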
My first guess was that the network performance would get worse with an increasing number of layers and neurons per layer, but the opposite was the case: the more total neurons, the better the network performed.
The following plot shows the test-set RMSE (solid line) and the validation-set RMSE (dotted line) for example networks with evenly distributed neurons (e.g. 1, 3-3, 2-2-2, 16-16-16-16, etc.).
The most interesting part is that the network with the lowest test RMSE had 4 layers with [8 17 15 3] neurons.
Then I took some other examples with >10 neurons in the first layer, <15 neurons in layers two and three, and <5 neurons in layer four.
They all showed significantly better results than networks with evenly distributed neurons per layer.
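One way to put such architectures on a comparable footing is to count their trainable parameters rather than their neurons. The sketch below assumes plain fully connected feedforward layers with one bias per neuron, 5 inputs, and 1 output; `countParams` is a hypothetical helper name, not a toolbox function:

```matlab
% Weights between consecutive layers plus one bias per hidden/output neuron
countParams = @(hidden, nIn, nOut) ...
    sum([nIn hidden] .* [hidden nOut]) + sum([hidden nOut]);

nIn  = 5;   % network inputs
nOut = 1;   % network outputs

pBest = countParams([8 17 15 3],   nIn, nOut);   % best architecture found
pEven = countParams([11 11 11 11], nIn, nOut);   % evenly distributed, similar neuron count

fprintf('[8 17 15 3]:   %d neurons, %d parameters\n', sum([8 17 15 3]),   pBest);
fprintf('[11 11 11 11]: %d neurons, %d parameters\n', sum([11 11 11 11]), pEven);
```

Two networks with nearly the same number of neurons can differ noticeably in parameter count, so the two measures do not rank architectures identically.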
I have not yet found an explanation for this phenomenon (few neurons in the first and last layers, many neurons in the in-between layers).
I am very curious whether anyone has an explanation or has at least experienced the same phenomenon. Thanks in advance!


R2021a
