Training reaches 100% but testing gives wrong results

Hi everybody. I am training on my dataset. I used the SURF algorithm to extract features and selected 10 features for every class; I have 5 classes. Training reaches 100%, but when I enter an image to test, the result is not correct. For example, when I enter an image of the first class, it is classified as the second or third class. Why does this happen? Please help me.

Accepted Answer

Greg Heath on 8 Jul 2014
Sounds like a classic case of overtraining an overfit net.
You have too many unknown weights and/or not enough training equations.
Cures
1. To eliminate overfitting, use the smallest number of hidden nodes that yields satisfactory performance
2. Use a validation set to prevent overtraining an overfit net
3. Use regularization via the error function MSEREG and/or the training function TRAINBR
Let us know which one you use.
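Cures 1 and 2 can be sketched roughly as follows. This is only an illustration, assuming patternnet from the Neural Network (Deep Learning) Toolbox; the data is synthetic and the sizes I, O, N and H are hypothetical, not the asker's.

```matlab
% Sketch only: synthetic data and sizes are hypothetical, not the asker's.
rng(0);                                   % reproducible example
I = 10; O = 5; N = 500;                   % inputs, classes, samples (assumed)
labels  = randi(O, 1, N);                 % random class index per sample
centers = 3 * randn(I, O);                % one cluster center per class
x = centers(:, labels) + randn(I, N);     % I-by-N input matrix
t = zeros(O, N);                          % O-by-N one-hot target matrix
t(sub2ind([O N], labels, 1:N)) = 1;

H   = 3;                                  % keep the hidden layer small (cure 1)
net = patternnet(H);
net.divideFcn = 'dividerand';             % random trn/val/tst split (cure 2)
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
net.trainParam.showWindow  = false;       % no training GUI window

% With valRatio > 0, training can stop early on validation error
[net, tr] = train(net, x, t);

y = net(x);
[~, predicted] = max(y, [], 1);           % predicted class = largest output
trnAcc = mean(predicted(tr.trainInd) == labels(tr.trainInd));
tstAcc = mean(predicted(tr.testInd)  == labels(tr.testInd));
fprintf('train accuracy %.2f, test accuracy %.2f\n', trnAcc, tstAcc);
```

Comparing trnAcc with tstAcc is the point: a large gap between the two suggests the net is still overtrained.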
Thank you for formally accepting my answer
Greg
  3 Comments
Soniya Rudani on 24 Jul 2014
Edited: Soniya Rudani on 24 Jul 2014
Hi Mr Greg,
I am using a neural network from the MATLAB toolbox. My inputs to the classifier are DTMF digit pitches and intensities. The desired output should be any three digits of my inputs, so the target should also have three rows, with the same number of columns for input and output. Can you please tell me how I will know the output digits?
Moreover, is there any way I could perform testing on a neural network just as we do with a naive Bayes classifier, by giving a test file, a train file and a target file?
Hope you can offer a clue on this. Thanks!
Soniya.
Greg Heath on 24 Jul 2014
> Thanks Greg. I used msereg and trainbr, but the training does not reach 100%. What else can I do? Please help.
That may be OK. It's the performance on nontraining data that counts.
I don't use trainbr and msereg so I don't know what practical goals should be.
For other nets I use mse and try to obtain an R^2 > 0.99 for nontraining data.
Search coefficient-of-determination in Wikipedia for a more complete discussion of R^2. It can be interpreted as the per cent of target variance that is modeled by the net.
Since the model typically yields errors that have zero mean, the mse is just the biased (i.e., divide by N instead of N-1) variance of the error e=t-y. The normalized mse is the ratio of that to the average variance of the target, t.
NMSE = mse(e)/mean( var(t',1))
Then
R^2 = 1-NMSE.
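As a small worked example of the two formulas above, using made-up targets and outputs (the values of t and y are purely illustrative, and mean(e(:).^2) stands in for the toolbox function mse(e) so that no toolbox is needed):

```matlab
% Illustrative values only: t is an O-by-N target matrix, y the net output.
t = [0 1 0 1;
     1 0 1 0];
y = [0.1 0.9 0.1 0.9;
     0.9 0.1 0.9 0.1];

e    = t - y;                     % error
MSE  = mean(e(:).^2);             % same quantity as mse(e) in the toolbox
MSE0 = mean(var(t', 1));          % average biased variance of the target rows
NMSE = MSE / MSE0;                % normalized mean-squared error
R2   = 1 - NMSE;                  % fraction of target variance modeled

fprintf('NMSE = %.4f, R^2 = %.4f\n', NMSE, R2);
```

Here every error has magnitude 0.1, so MSE = 0.01, the average target variance is 0.25, and R^2 comes out at 0.96.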
The so-called overfitting dilemma is really overtraining an overfit net to nearly zero error on training data which can result in unacceptable performance on nontraining data.
[ I N ] = size(input) = ?
[ O N ] = size(target) = ?
Data division ratios trn/val/tst = ?
Number of training equations Ntrneq = Ntrn*O = ?
Number of hidden nodes H = ?
Number of unknown weights Nw = (I+1)*H+(H+1)*O = ?
Nw > Ntrneq is overfitting which can be mitigated by increasing Ntrn, decreasing H, using validation convergence stopping (to prevent overtraining) or regularization.
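The bookkeeping above can be checked with a few lines. The numbers below are hypothetical placeholders (only O = 5 classes comes from the question), the weight count assumes a single hidden layer with the second term read as (H+1)*O, and Hub is simply the largest H that satisfies Nw <= Ntrneq.

```matlab
% Hypothetical sizes: replace with the values from size(input), size(target).
I = 64;                           % input dimension (assumed)
O = 5;                            % number of classes (from the question)
N = 50;                           % total number of examples (assumed)
H = 10;                           % candidate number of hidden nodes (assumed)

Ntrn   = round(0.70 * N);         % training share, assuming a 0.70 trn ratio
Ntrneq = Ntrn * O;                % number of training equations
Nw     = (I + 1) * H + (H + 1) * O;   % unknown weights, one hidden layer

% Solving (I+1)*H + (H+1)*O <= Ntrneq for H gives the upper bound
Hub = floor((Ntrneq - O) / (I + O + 1));

fprintf('Ntrneq = %d, Nw = %d, overfit = %d, Hub = %d\n', ...
        Ntrneq, Nw, Nw > Ntrneq, Hub);
```

With these placeholder sizes Nw = 705 far exceeds Ntrneq = 175, and only H <= 2 avoids having more unknowns than equations.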
The important performance measures are NMSE or R^2 for validation and test data. If I recall correctly, trainbr does not allow a validation set. However, you can use msereg with trainlm (regression) and trainscg (classification).
Hope this helps.
Greg
