> Can we conclude that the larger network learned too much from the examples given during training, thus losing the capability to generalize to new examples (overfitting), but the small networks performed better because they had fewer training records?
ABSOLUTELY NOT!
Results depend heavily on how the data is divided, e.g., randomly vs. by contiguous sections.
You apparently misunderstand the concepts of overfitting and overtraining:
OVERFITTING: There are more unknown weights than training equations. This allows an infinite number of minima for the training data (how many solutions {x1, x2} are there to x1 + x2 = 1?) that are not minima for the nontraining (i.e., validation and test) data.
OVERTRAINING: Training an overfit network beyond the point where performance on NONTRAINING data begins to deteriorate.
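As a rough illustration of the weight-counting argument, here is a minimal sketch for a single-hidden-layer MLP; the values of I, O, Ntrn, and H are made up for illustration:

% Compare the number of unknown weights with the number of training equations.
I    = 4;      % number of input variables (example value)
O    = 1;      % number of output variables (example value)
Ntrn = 70;     % number of training examples (example value)
H    = 10;     % number of hidden nodes (example value)

Nw     = (I+1)*H + (H+1)*O;   % unknown weights, including biases
Ntrneq = Ntrn*O;              % training equations (target values to fit)

if Nw > Ntrneq
    disp('More weights than training equations: the net can overfit.')
else
    disp('No more weights than training equations.')
end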
As long as all data is representative of the general I/O mapping, the more data, the better. That is why random data division is the default in MATLAB NN training programs.
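For example, a minimal sketch of where the data-division choice is made, assuming fitnet and the toolbox's simplefit_dataset toy data:

[x, t] = simplefit_dataset;      % toy dataset shipped with the toolbox
net    = fitnet(10);             % single hidden layer with 10 nodes

net.divideFcn = 'dividerand';    % default: random division
% net.divideFcn = 'divideblock'; % alternative: contiguous sections
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;

[net, tr] = train(net, x, t);    % tr records which indices went to train/val/test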
Hope this helps.
Thank you for formally accepting my answer
Greg