Why is KNN classifier accuracy low for a multi-class dataset?

I have a data set consisting of 296 features for 37 classes. The data set is ordered by class. I trained and tested on the data using a KNN classifier. However, the maximum accuracy I have obtained is about 13.423%.
  • Why did I get such a low result? I tried it on 296*21 features and on 296*82, and the result does not change much.
  • Is this because I have 37 classes? Most of the applications I have seen have about 3 classes.
  • Is there any way to increase this accuracy, or an alternative way to evaluate it, given that the KNN match result is 100% successful?
Thank you.

1 Comment

I used classperf to see the performance of the classifier, and here is the latest result I got:
Label: ''
Description: ''
ClassLabels: [37x1 double]
GroundTruth: [296x1 double]
NumberOfObservations: 296
ControlClasses: [36x1 double]
TargetClasses: 1
ValidationCounter: 1
SampleDistribution: [296x1 double]
ErrorDistribution: [296x1 double]
SampleDistributionByClass: [37x1 double]
ErrorDistributionByClass: [37x1 double]
CountingMatrix: [38x37 double]
CorrectRate: 0.0743
ErrorRate: 0.9257
LastCorrectRate: 0.0743
LastErrorRate: 0.9257
InconclusiveRate: 0
ClassifiedRate: 1
Sensitivity: 0.2500
Specificity: 0.9792
PositivePredictiveValue: 0.2500
NegativePredictiveValue: 0.9792
PositiveLikelihood: 12
NegativeLikelihood: 0.7660
Prevalence: 0.0270
DiagnosticTable: [2x2 double]


 Accepted Answer

Your classes, using the features/measurements you chose, overlap badly. If you were able to plot each data point in 296-dimensional space, you'd see a lot of mixing in where the classes occur.
For example, in 1-D space, suppose you had 5 classes:
class1 = rand(1,100); % Ranges from 0 to 1.0
class2 = 1.05 * rand(1,100); % Ranges from 0 to 1.05
class3 = 1.10 * rand(1,100); % Ranges from 0 to 1.10
class4 = 1.15 * rand(1,100); % Ranges from 0 to 1.15
class5 = 1.20 * rand(1,100); % Ranges from 0 to 1.20
If you plotted each class in a different color, you'd see that there is a tremendous amount of overlap in the 0 to 1 region because all classes have a value in that region.
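One rough way to put a number on that overlap is a leave-one-out nearest-neighbour test on the five classes above. This is only a sketch: the seed, the variable names, and the choice of a 1-nearest-neighbour rule are assumptions made for illustration. It uses base MATLAB only (implicit expansion, so R2016b or later).

```matlab
rng(0);                                  % arbitrary fixed seed, for repeatability only
scales = [1.00 1.05 1.10 1.15 1.20];     % upper end of each class's range, as above
nPer   = 100;                            % points per class, as above

x = zeros(nPer * numel(scales), 1);      % feature values (one feature)
y = zeros(nPer * numel(scales), 1);      % class labels 1..5
for k = 1:numel(scales)
    rows    = (k - 1) * nPer + (1:nPer);
    x(rows) = scales(k) * rand(nPer, 1);
    y(rows) = k;
end

% Leave-one-out 1-nearest-neighbour: each point is assigned the label of
% its closest OTHER point.
D = abs(x - x.');                        % pairwise distances
D(1:numel(x) + 1:end) = Inf;             % exclude each point matching itself
[~, nearest] = min(D, [], 2);

accuracy = mean(y(nearest) == y);
chance   = 1 / numel(scales);
fprintf('Leave-one-out 1-NN accuracy: %.3f (chance level: %.3f)\n', accuracy, chance);
```

Because almost every point lies in the shared 0 to 1 region, the nearest neighbour is about as likely to come from any class, so the accuracy should land close to the 1/5 chance level.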

4 Comments

Amazing answer, although I am frustrated because I was working hard for better accuracy. I have to find another evaluation method or change the classifier. I cannot change the classes and features because I am working on a recognition system, and the main goal of the system is to find the correct class of the desired feature. Thank you, Mr. Image Analyst, for your great answer; the attachment was very helpful for understanding even the concept of classification.
You might try deep learning, though I'd have my doubts even with that. It seems like you just chose the wrong (non-descriptive) measurements to use. Put effort into making sure the measurements you use are truly discriminative.
You might also try TreeBagger (random forest).
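A minimal sketch of that suggestion, assuming the Statistics and Machine Learning Toolbox is available. The data here are synthetic stand-ins (296 observations, 37 classes, 21 weakly informative features), and the number of trees and the 0.3 class-separation factor are arbitrary choices for illustration, not values from this thread.

```matlab
rng(0);                                   % arbitrary fixed seed
nClasses = 37;  nPer = 8;  nFeat = 21;    % 37 * 8 = 296 observations (assumed layout)
y = repelem((1:nClasses).', nPer);        % labels, ordered by class

% Weakly separated classes: small class-specific offset plus unit noise.
classMeans = randn(nClasses, nFeat);
X = 0.3 * classMeans(y, :) + randn(numel(y), nFeat);

% Bagged decision trees; out-of-bag samples act as a built-in test set.
forest = TreeBagger(200, X, y, ...
    'Method', 'classification', ...
    'OOBPrediction', 'on');

oobErr = oobError(forest);                % cumulative error as trees are added
oobAcc = 1 - oobErr(end);                 % accuracy of the full ensemble
fprintf('Out-of-bag accuracy: %.3f (chance level: %.3f)\n', oobAcc, 1 / nClasses);
```

The out-of-bag estimate is computed only from trees that did not see a given observation, so it plays the same role as a held-out test set. If the features do not separate the classes, a different classifier alone is unlikely to help much.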
Someone else used a neural network as the classifier (I was thinking of trying it too, but I had some difficulty understanding it). The recognition rate was 100%. The testing was on 50 features and 50 labels, but no accuracy is specified. This means I may not be able to find the accuracy, as happened with KNN.
With any method, your training set is assumed to be 100% accurate. Then you put through your test set. The only way to see if the test set was accurately predicted is to know the ground truth for that test set, otherwise all you have are predictions. So, with either NN or KNN if you got accuracy, you must have had ground truth (the known, correct classification).
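To make that concrete, here is a sketch that compares accuracy measured on the training data itself with accuracy measured on held-out data whose ground truth is known. The data are synthetic stand-ins (296 observations, 37 classes, 21 weakly informative features, all invented for illustration), and fitcknn, resubLoss, cvpartition, crossval and kfoldLoss assume the Statistics and Machine Learning Toolbox is available.

```matlab
rng(0);                                   % arbitrary fixed seed
nClasses = 37;  nPer = 8;  nFeat = 21;    % 37 * 8 = 296 observations (assumed layout)
y = repelem((1:nClasses).', nPer);        % ground-truth labels, ordered by class

% Weakly separated classes: small class-specific offset plus unit noise.
classMeans = randn(nClasses, nFeat);
X = 0.3 * classMeans(y, :) + randn(numel(y), nFeat);

% 1-nearest-neighbour model trained on all the data.
mdl = fitcknn(X, y, 'NumNeighbors', 1);

% Accuracy on the training data itself (resubstitution).
resubAcc = 1 - resubLoss(mdl);

% Accuracy on held-out folds. The partition is stratified by class, so every
% fold contains samples from every class even though the rows are ordered.
part  = cvpartition(y, 'KFold', 4);
cvMdl = crossval(mdl, 'CVPartition', part);
cvAcc = 1 - kfoldLoss(cvMdl);

fprintf('Resubstitution accuracy: %.3f\n', resubAcc);
fprintf('4-fold CV accuracy:      %.3f (chance level: %.3f)\n', cvAcc, 1 / nClasses);
```

With 1 neighbour, every training point is its own nearest neighbour, so the resubstitution figure is 100% no matter how overlapped the classes are; only the held-out figure says anything about recognition of new samples. A stratified partition also matters when the rows are ordered by class, since a plain first-half/second-half split would leave some classes out of training entirely.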

