SVM Model not working properly on test data

Hello all, I'm using fitcsvm to classify data into two classes, Class 1 and Class 2. When I train the model on a dataset and then check it with predict on that same dataset, it seems to work perfectly. However, when I pass in a new dataset, every observation is classified as Class 2, even though I use the Cost argument to state that misclassifying Class 1 as Class 2 is three times worse than misclassifying Class 2 as Class 1.
SVM Model:
SVMModel = fitcsvm(data, classLabel, 'KernelFunction', 'gaussian',...
'Standardize',true, 'ClassNames', {'Class 1','Class 2'},...
'Cost',[0 3; 1 0]);
Verification of Training set:
[svmLabel, score] = predict(SVMModel, data);
Test Set:
[newsvmLabel, score] = predict(SVMModel, testData);
I think I'm using it correctly, so any advice on what's going on would be much appreciated.

1 Comment

Can you upload the data, or a small sample that exhibits the problem? That way we can run your code and see for ourselves.

Answers (3)

You are applying the predict function correctly, so there must be something wrong with your testData; most likely the type or scale of one column doesn't match the training data. Without an example, as "the cyclist" requested, it is impossible to help you further.
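A minimal sketch of how such a mismatch could be checked, using only base MATLAB. The matrices below are made-up stand-ins for the asker's data and testData (the real data was never posted), with the test set deliberately placed on a different scale. Because the model was trained with 'Standardize', true, test values far outside the training range can all end up on one side of the decision boundary.

```matlab
% Hypothetical stand-ins for the asker's variables; replace with the real data.
rng(0);
data     = rand(100, 4);        % training predictors, values in (0, 1)
testData = 100 + rand(20, 4);   % test predictors, deliberately on another scale

% 1. Same number of predictor columns?
sameWidth = size(data, 2) == size(testData, 2);

% 2. Same numeric class (for example double vs. single)?
sameClass = strcmp(class(data), class(testData));

% 3. Any missing values in the test set?
hasNaN = any(isnan(testData(:)));

% 4. Fraction of test values per column that fall outside the training range.
trainMin = min(data, [], 1);
trainMax = max(data, [], 1);
fracOutside = mean(testData < trainMin | testData > trainMax, 1);

fprintf('Same width: %d, same class: %d, NaNs in test: %d\n', ...
    sameWidth, sameClass, hasNaN);
disp(fracOutside)   % values near 1 suggest a scaling or units mismatch
```

The range comparison relies on implicit expansion, so it assumes R2016b or later.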
Did you solve the issue? I have a similar problem.

1 Comment

Without the data we are just guessing at what the problem may be. You may also need to look at your training data: if it is highly imbalanced, the cost matrix may not have been enough to compensate.
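A rough way to check for imbalance is to count the observations per class. The sketch below uses base MATLAB only; the label vector is a made-up stand-in for the asker's classLabel, assumed here to be a cell array of character vectors.

```matlab
% Hypothetical label vector standing in for classLabel: 10 vs. 90 observations.
classLabel = [repmat({'Class 1'}, 10, 1); repmat({'Class 2'}, 90, 1)];

% Count how often each distinct label occurs.
[names, ~, idx] = unique(classLabel);
counts = accumarray(idx, 1);

% Print the count and share of each class.
for k = 1:numel(names)
    fprintf('%s: %d observations (%.1f%%)\n', ...
        names{k}, counts(k), 100 * counts(k) / sum(counts));
end
```

Running the same count on the predicted labels for the test set shows at a glance whether the model ever predicts the minority class.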

I have attached the training and test data sets here. The first column in the training data set contains the labels and the other columns are the extracted features. I applied a linear SVM classifier and got 92.5% accuracy on the training data, but when I apply it to the test data the result is not good at all. I would appreciate it if you could let me know what is wrong with the data.

4 Comments

Thank you. Your training data set is actually a struct that contains all sorts of stuff, including something that sounds like your training and test sets, and a classification tree, but it's a little bit jumbled up. I infer your dataset has 6612 observations and 11 features, and maybe there's another 161 observations (which are currently labelled as the train data set, while the larger set is labelled as the test data set, so that's messed up). Maybe you started with 44 features and condensed them down to 11; I'm not sure, because your struct is confusing. Also, some of your elements are row major and others are column major, so that's definitely going to confuse the fit and/or predict functions.
What I did to see whether you can train a reasonable model with this data:
1. Take your matrix of 6612 observations and 11 features
2. Add the labels to that so you have a table with the features first, and the labels in the last column (that's what most MATLAB functions expect)
3. Load that into Classification Learner and train a model. A classification tree gets a nice 97% validation accuracy, while the SVM only gets 80%. Didn't dig further into how to improve that.
4. To double verify that the model can predict reasonably on held out test data, I held out 632 observations from that set as independent test set. Reloaded the 5980 remaining observations into Classification Learner, trained a tree, exported that tree, and then used predict to generate predictions on the held out test data. That yielded 84% accuracy, and it did predict all of the 6 labels. So that all makes sense.
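The hold-out check in step 4 can also be done programmatically. The sketch below is not the code used above: it assumes Statistics and Machine Learning Toolbox and substitutes the fisheriris sample data for the attached data set, with roughly 10% of the observations held out (close to the 632 of 6612 split described).

```matlab
% Stand-in data: fisheriris ships with Statistics and Machine Learning Toolbox.
load fisheriris                 % meas (150x4 features), species (150x1 labels)
rng(1);                         % make the random split reproducible

% Build a table with the features first and the labels in the last column.
T = array2table(meas);
T.Label = species;

% Hold out about 10% of the observations as an independent test set.
c = cvpartition(T.Label, 'HoldOut', 0.1);
Ttrain = T(training(c), :);
Ttest  = T(test(c), :);

% Train a classification tree on the remaining data only.
tree = fitctree(Ttrain, 'Label');

% Predict on the held-out rows and compare with the true labels.
yPred = predict(tree, Ttest);
accuracy = mean(strcmp(yPred, Ttest.Label));
predictedClasses = unique(yPred);   % which labels the model actually predicts

fprintf('Held-out accuracy: %.1f%% on %d observations\n', ...
    100 * accuracy, height(Ttest));
disp(predictedClasses)
```

Listing predictedClasses is a quick check for the symptom in the original question, where every test observation ended up in a single class.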
Hi @Bernhard,
I have trained a dataset using SVM in the Classification Learner App. I want to test the model on my test set. Please find my code below:
function [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% [trainedClassifier, validationAccuracy] = trainClassifier(trainingData)
% returns a trained classifier and its accuracy. This code recreates the
% classification model trained in Classification Learner app. Use the
% generated code to automate training the same model with new data, or to
% learn how to programmatically train models.
%
% Input:
% trainingData: a table containing the same predictor and response
% columns as imported into the app.
%
% Output:
% trainedClassifier: a struct containing the trained classifier. The
% struct contains various fields with information about the trained
% classifier.
%
% trainedClassifier.predictFcn: a function to make predictions on new
% data.
%
% validationAccuracy: a double containing the accuracy in percent. In
% the app, the History list displays this overall accuracy score for
% each model.
%
% Use the code to train the model with new data. To retrain your
% classifier, call the function from the command line with your original
% data or new data as the input argument trainingData.
%
% For example, to retrain a classifier trained with the original data set
% T, enter:
% [trainedClassifier, validationAccuracy] = trainClassifier(T)
%
% To make predictions with the returned 'trainedClassifier' on new data T2,
% use
% yfit = trainedClassifier.predictFcn(T2)
%
% T2 must be a table containing at least the same predictor columns as used
% during training. For details, enter:
% trainedClassifier.HowToPredict
% Auto-generated by MATLAB on 17-Jun-2020 20:37:59
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'autoc', 'contr', 'corrm', 'corrp', 'cprom', 'cshad', 'dissi', 'energ', 'entro', 'homom', 'homop', 'maxpr', 'sosvh', 'savgh', 'svarh', 'senth', 'dvarh', 'denth', 'inf1h', 'inf2h', 'indnc', 'idmnc'};
predictors = inputTable(:, predictorNames);
response = inputTable.ClassLabel;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];
% Train a classifier
% This code specifies all the classifier options and trains the classifier.
classificationSVM = fitcsvm(...
predictors, ...
response, ...
'KernelFunction', 'linear', ...
'PolynomialOrder', [], ...
'KernelScale', 'auto', ...
'BoxConstraint', 1, ...
'Standardize', true, ...
'ClassNames', [1; 2]);
% Create the result struct with predict function
predictorExtractionFcn = @(t) t(:, predictorNames);
svmPredictFcn = @(x) predict(classificationSVM, x);
trainedClassifier.predictFcn = @(x) svmPredictFcn(predictorExtractionFcn(x));
% Add additional fields to the result struct
trainedClassifier.RequiredVariables = {'autoc', 'contr', 'corrm', 'corrp', 'cprom', 'cshad', 'denth', 'dissi', 'dvarh', 'energ', 'entro', 'homom', 'homop', 'idmnc', 'indnc', 'inf1h', 'inf2h', 'maxpr', 'savgh', 'senth', 'sosvh', 'svarh'};
trainedClassifier.ClassificationSVM = classificationSVM;
trainedClassifier.About = 'This struct is a trained model exported from Classification Learner R2019a.';
trainedClassifier.HowToPredict = sprintf('To make predictions on a new table, T, use: \n yfit = c.predictFcn(T) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nThe table, T, must contain the variables returned by: \n c.RequiredVariables \nVariable formats (e.g. matrix/vector, datatype) must match the original training data. \nAdditional variables are ignored. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appclassification_exportmodeltoworkspace'')">How to predict using an exported model</a>.');
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
inputTable = trainingData;
predictorNames = {'autoc', 'contr', 'corrm', 'corrp', 'cprom', 'cshad', 'dissi', 'energ', 'entro', 'homom', 'homop', 'maxpr', 'sosvh', 'savgh', 'svarh', 'senth', 'dvarh', 'denth', 'inf1h', 'inf2h', 'indnc', 'idmnc'};
predictors = inputTable(:, predictorNames);
response = inputTable.ClassLabel;
isCategoricalPredictor = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];
% Perform cross-validation
partitionedModel = crossval(trainedClassifier.ClassificationSVM, 'KFold', 5);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validationAccuracy = 1 - kfoldLoss(partitionedModel, 'LossFun', 'ClassifError')
I saved the above code as trainClassifier.m and used the code below for testing:
yFit = trainClassifier.predictFcn(statsArray1);
But I get the following error message.
Undefined variable "trainClassifier" or class "trainClassifier.predictFcn".
statsArray1 is the test data whose class I want to predict. It is the set of features extracted from a test image, and I want to predict whether the image is Benign or Malignant. Any help would be appreciated. Thank you.
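One likely reading of the error: trainClassifier is the name of the function, not of the struct it returns, so trainClassifier.predictFcn(...) is not a field access. The generated header describes a two-step pattern: call the function to get trainedClassifier, then call trainedClassifier.predictFcn on a table with the training column names. The sketch below imitates that pattern in a self-contained way, assuming Statistics and Machine Learning Toolbox. The fisheriris data, the column names f1 to f4, and the feature values in statsArray1 are stand-ins, not the poster's data.

```matlab
% Stand-in training table: two of the three fisheriris classes.
load fisheriris                             % meas (150x4), species (150x1)
keep = ~strcmp(species, 'setosa');          % fitcsvm supports two classes only
predictorNames = {'f1', 'f2', 'f3', 'f4'};  % hypothetical column names
trainingData = array2table(meas(keep, :), 'VariableNames', predictorNames);
trainingData.ClassLabel = species(keep);

% Step 1: training returns a struct. With the generated file this would be
%   [trainedClassifier, validationAccuracy] = trainClassifier(trainingData);
mdl = fitcsvm(trainingData(:, predictorNames), trainingData.ClassLabel, ...
    'KernelFunction', 'linear', 'Standardize', true);
trainedClassifier.predictFcn = @(t) predict(mdl, t(:, predictorNames));

% Step 2: wrap the raw test features in a table with the training column names.
statsArray1 = [6.0 2.9 4.5 1.5];            % one hypothetical feature row
T2 = array2table(statsArray1, 'VariableNames', predictorNames);

% Step 3: call predictFcn on the returned struct, not on the function name.
yFit = trainedClassifier.predictFcn(T2);
disp(yFit)
```

With the real generated function, the test table would need the 22 predictor columns listed in trainedClassifier.RequiredVariables.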
Did you solve the issue? I have a similar problem
Hi Salwa
Did you solve the issue? I have a similar problem

Asked: 8 Jan 2018

Commented: 14 Dec 2020
