Why does MATLAB produce different results with the KNN classifier for the same dataset?

7 views (last 30 days)
Hi everyone, I use MATLAB to code some metaheuristic algorithms for feature selection. Before MATLAB, I mostly used Weka.
To satisfy my curiosity, I selected all features and sent them to the KNN classifier to compare the results against Weka. Let's say we have 50 features in the dataset and all of them are selected. I ran the code 10 times, and here is the error rate KNN produces in MATLAB with K=3 and 10-fold cross-validation:
1) 0.24765
2) 0.22571
3) 0.23197
4) 0.26019
5) 0.23511
6) 0.23511
7) 0.25078
8) 0.23197
9) 0.23511
10) 0.24138
Here is what Weka produces for every run: 0.26333
Why does MATLAB produce different results on each run, and why are they different from Weka's? I used the same dataset with the same features and the same parameters (K=3 and 10-fold). I am confused. Here is the code snippet where the error rate from KNN is generated:
function errorrate = jFitnessFunction(feat,label,X)
% feat:  feature matrix
% label: class labels
% X:     selected features' indexes (binary vector)
k = 3;      % k-value of KNN
kfold = 10; % number of cross-validation folds
errorrate = jwrapperKNN(feat(:,X==1),label,k,kfold);
end

% Perform KNN with k-fold cross-validation
function ER = jwrapperKNN(feat,label,k,kfold)
Model = fitcknn(feat,label,'NumNeighbors',k,'Distance','euclidean');
C = crossval(Model,'KFold',kfold);
% Error rate
ER = kfoldLoss(C);
end

Answers (1)

Walter Roberson on 7 Dec 2020
K-fold cross-validation is random.
crossval partitions your data into 10 folds at random on every call, so each run trains and tests on different splits and reports a slightly different error rate. Weka, by contrast, seeds its cross-validation shuffle with a fixed default random seed, so its folds, and therefore its result, are identical on every run. To get repeatable numbers in MATLAB, fix the random number generator seed (rng) or pass an explicit cvpartition to crossval.
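A minimal sketch of how to make the cross-validation reproducible (assuming your feat matrix and label vector from the question; the variable names are illustrative):

```matlab
% Reproducible 10-fold cross-validation for KNN (sketch).
rng(1);                                    % fix the global random seed
cvp = cvpartition(label,'KFold',10);       % one fixed, stratified partition
Model = fitcknn(feat,label,'NumNeighbors',3,'Distance','euclidean');
C = crossval(Model,'CVPartition',cvp);     % reuse the same folds on every run
ER = kfoldLoss(C);                         % identical error rate each time
```

With a fixed seed or a saved cvpartition, repeated runs return the same error rate; the remaining gap versus Weka comes from the two tools drawing different (but equally valid) fold assignments.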
