Improving the results from SVM classification of faults

18 views (last 30 days)
I'm wondering if anyone may have any suggestions to improve the classifcation from a SVM. I'm currently following the example from the Mathworks help and ammending it to finding faults in my own generated data set.
My problem is, I'm getting poor results in the classification and I'm wondering if anyone may be able to suggest ways to improve this?
The output from my data set undergoes preprocessing as in the example, I've retimed it to the frequency the circuit operates at 20KHz = 5e-5 secs. The power spectrum and features are extracted as in the example. From the features a fault flag is produced using statistical measures and used to train the classifier (see code below).
At present I'm only trying to extract (alas not very sucessfully) three separate faults, eventually I would also like to classify multiple faults.
The best results I've achieved are 43.75% accuracy (see below).
I did think it may be to do with the feature extraction, as the power spectrum starts at -50dB so extracting changes in features is difficult.
My data set contains 80 runs (this takes ages to generate) - probably at bit small, would a larger set help?
I would be very grateful if anyone may be able to suggest ways to improve the classification (if possible).
Kind regards,
mdl = 'Boost_converter';
% Set fault parameters
% Default parameters for simulation
% Bondwire RC 1e-4
% ESR is 1e-6
% Thermal Resistance 5th element in Foster Model [9.5e-3 0.125039
% 0.132857 0.256654 "0.021551" 2.1e-3] - 0.021551 relates to solder between
% layers
%Bondwire RC failire at 3e-3
% Electrolytic Capacitor 220uF failure at ESR of 0.5 Ohms
% Foster Thermal Resistance paramters, failure at 0.2 increase of 5th
% element in Foster model
IGBT_RC=[mdl '/IGBT Full1']; 'RC';
% set number of simulation runs here default is 20, this gives a data set
% of 80.
nPerGroup = 20; % 125 Number of elements in each fault group
% No fault simulations
Bondwire = repmat(Bondwire_fault_set(1),nPerGroup,1);
ESR = repmat(ESR_fault_set(1),nPerGroup,1);
Thermal_Resistance = repmat(Thermal_Resistance_set(1),nPerGroup,1);
% Single fault simulations
idx = ceil(10*rand(nPerGroup,1));
Bondwire = [Bondwire; Bondwire_fault_set(idx)'];
ESR = [ESR;repmat(ESR_fault_set(1),nPerGroup,1)];
Thermal_Resistance= [Thermal_Resistance;repmat(Thermal_Resistance_set(1),nPerGroup,1)];
idx = ceil(10*rand(nPerGroup,1));
Bondwire = [Bondwire; repmat(Bondwire_fault_set(1),nPerGroup,1)];
ESR = [ESR;ESR_fault_set(idx)'];
Thermal_Resistance = [Thermal_Resistance;repmat(Thermal_Resistance_set(1),nPerGroup,1)];
idx = ceil(10*rand(nPerGroup,1));
Bondwire = [Bondwire; repmat(Bondwire_fault_set(1),nPerGroup,1)];
ESR = [ESR;repmat(ESR_fault_set(1),nPerGroup,1)];
Thermal_Resistance = [Thermal_Resistance;Thermal_Resistance_set(idx)'];
% %
% % %**************************************************************************
% % % Double fault simulations
% % idxA = ceil(10*rand(nPerGroup,1));
% % idxB = ceil(10*rand(nPerGroup,1));
% % Bondwire = [Bondwire; Bondwire_fault_set(idxA)'];
% % ESR = [ESR;ESR_fault_set(idxB)'];
% % Thermal_Resistance = [Thermal_Resistance;repmat(Thermal_Resistance_set(1),nPerGroup,1)];
% % idxA = ceil(10*rand(nPerGroup,1));
% % idxB = ceil(10*rand(nPerGroup,1));
% % Bondwire = [Bondwire; Bondwire_fault_set(idxA)'];
% % ESR = [ESR;repmat(ESR_fault_set(1),nPerGroup,1)];
% % Thermal_Resistance = [Thermal_Resistance;Thermal_Resistance_set(idxB)'];
% % idxA = ceil(10*rand(nPerGroup,1));
% % idxB = ceil(10*rand(nPerGroup,1));
% % Bondwire = [Bondwire; repmat(Bondwire_fault_set(1),nPerGroup,1)];
% % ESR = [ESR;ESR_fault_set(idxA)'];
% % Thermal_Resistance = [Thermal_Resistance;Thermal_Resistance_set(idxB)'];
% % %**************************************************************************
% % % Triple fault simulations
% % idxA = ceil(10*rand(nPerGroup,1));
% % idxB = ceil(10*rand(nPerGroup,1));
% % idxC = ceil(10*rand(nPerGroup,1));
% % Bondwire = [Bondwire; Bondwire_fault_set(idxA)'];
% % ESR = [ESR; ESR_fault_set(idxB)'];
% % Thermal_Resistance = [Thermal_Resistance; Thermal_Resistance_set(idxC)'];
% %**************************************************************************
%Use the fault parameter combinations to create Simulink.SimulationInput objects.
%For each simulation input ensure that the random seed is set differently to generate different results.
IGBT=[mdl '/IGBT Full1'];'RC';
CAP=[mdl '/Capacitor'];'r';
IGBT2=[mdl '/IGBT Full1'];'Rth_Foster(5)';
Noise_Source=[mdl '/Voltage Source'];'user_seed';
for ct = numel(Bondwire):-1:1
simInput(ct) = Simulink.SimulationInput(mdl);
simInput(ct) = setVariable(simInput(ct),'IGBT',Bondwire(ct));
simInput(ct) = setVariable(simInput(ct),'CAP',ESR(ct));
simInput(ct) = setVariable(simInput(ct),'IGBT2',Thermal_Resistance(ct));
simInput(ct) = setVariable(simInput(ct),'Noise_Source',ct-1);
%Use the generateSimulationEnsemble function to run the simulations defined by the Simulink.
%SimulationInput objects defined above and store the results in a local sub-folder.
%Then create a simulationEnsembleDatastore from the stored results.
% Run the simulation and create an ensemble to manage the simulation results
if ~exist(fullfile(pwd,'Data'),'dir')
mkdir(fullfile(pwd,'Data')); % Create directory to store results
% [ok,e] = generateSimulationEnsemble(simInput(1:10),fullfile(pwd,'Data'),'UseParallel',true);
% [ok,e] = generateSimulationEnsemble(simInput(1:80),fullfile('.','Data'),'UseParallel',true);
% [ok,e] = generateSimulationEnsemble(simInput,fullfile('.','Data'),'UseParallel',true);
% Use above for full data set
runAll = true;
if runAll
[ok,e] = generateSimulationEnsemble(simInput,fullfile('.','Data'),'UseParallel',true);
[ok,e] = generateSimulationEnsemble(simInput(1:10),fullfile('.','Data')); %#ok<UNRCH>
% Important - Run from here once data has been generated!!!
% *************************************************************************
ens = simulationEnsembleDatastore(fullfile(pwd,'Data'));
% %
% Configure the ensemble so that the read only returns the variables of interest
% Call the preprocess function
ens.SelectedVariables = ["Vce", "SimulationInput"];
data = read(ens);
[Vce,VceSpectrum,VceFrequencies,faultValues] = preprocess2(data);
% %Plot the voltage and voltage spectrum. The plotted data is for a fault free condition.
% % Figure with nominal
%Configure the ensemble with data variables for the condition indicators and condition variables for fault variable values.
%Then call the extractCI function to compute the features, and use the writeToLastMemberRead command to add the feature and fault variable values to the ensemble.
%The extractCI function is separate.
ens.DataVariables = [ens.DataVariables; ...
"fPeak"; "pLow"; "pMid"; "pHigh"; "pKurtosis"; ...
"qMean"; "qVar"; "qSkewness"; "qKurtosis"; ...
"qPeak2Peak"; "qCrest"; "qRMS"; "qMAD"; "qCSRange"];
ens.ConditionVariables = ["Bondwire_Fault","ESR_Fault","Thermal_Fault"];
feat = extractCI2(Vce,VceSpectrum,VceFrequencies);
dataToWrite = [faultValues, feat];
%The above code preprocesses and computes the condition indicators for the first member of the simulation ensemble.
%Repeat this for all the members in the ensemble using the ensemble hasdata command.
%To get an idea of the simulation results under different fault conditions plot every hundredth element of the ensemble.
% %Figure with nominal and faults
lVce = plot(Vce.Time,Vce.Data,'LineWidth',2);
lVceSpectrum = semilogx(VceFrequencies,pow2db(VceSpectrum),'LineWidth',2);
ct = 1;
lColors = get(lVce.Parent,'ColorOrder');
lIdx = 2;
% % Loop over all members in the ensemble, preprocess
% % and compute the features for each member
while hasdata(ens)
% % Read member data
data = read(ens);
% % Preprocess and extract features from the member data
[Vce,VceSpectrum,VceFrequencies,faultValues] = preprocess2(data);
feat = extractCI2(Vce,VceSpectrum,VceFrequencies);
% % Add the extracted feature values to the member data
dataToWrite = [faultValues, feat];
% % Plot member signal and spectrum for every 10th member
if mod(ct,10) == 0
line('Parent',lVce.Parent,'XData',Vce.Time,'YData',Vce.Data, ...
'Color', lColors(lIdx,:));
line('Parent',lVceSpectrum.Parent,'XData',VceFrequencies,'YData',pow2db(VceSpectrum), ...
'Color', lColors(lIdx,:));
if lIdx == size(lColors,1)
lIdx = 1;
lIdx = lIdx+1;
ct = ct + 1;
% %Configure the simulation ensemble to read the condition indicators, and
% %use the tall and gather commands to load all the condition indicators and fault variable values into memory
% % Get data to design a classifier.
ens.SelectedVariables = ["fPeak","pLow","pMid","pHigh","pKurtosis",...
idxLastFeature = 14;
% % Load the condition indicator data into memory
data = gather(tall(ens));
% % Convert the fault variable values to flags
data.BondwireFlag = data.Bondwire_Fault > 1e-4;
data.ESRFlag = data.ESR_Fault > 1e-6;
data.ThermalFlag = data.Thermal_Fault > 0.021551;
data.CombinedFlag =data.BondwireFlag+2*data.ESRFlag+4*data.ThermalFlag;
% %Create a classifier that takes as input the condition indicators and returns the combined fault flag.
% %Train a support vector machine that uses a 2nd order polynomial kernel.
% %Use the cvpartition command to partition the ensemble members into a set for training and a set for validation.
rng('default') % for reproducibility
predictors = data(:,1:idxLastFeature);
response = data.CombinedFlag;
cvp = cvpartition(size(predictors,1),'KFold',5);
% % Create and train the classifier
template = templateSVM(...
'KernelFunction', 'polynomial', ...
'PolynomialOrder', 2, ...
'KernelScale', 'auto', ...
'BoxConstraint', 1, ...
'Standardize', true);
combinedClassifier = fitcecoc(...
predictors(,:), ...
response(,:), ...
'Learners', template, ...
'Coding', 'onevsone', ...
'ClassNames', [0; 1; 2; 3; 4; 5; 6; 7]);
% %Check the performance of the trained classifier using the validation data and plot the results on a confusion plot.
% % Check performance by computing and plotting the confusion matrix
actualValue = response(cvp.test(1),:);
predictedValue = predict(combinedClassifier, predictors(cvp.test(1),:));
confdata = confusionmat(actualValue,predictedValue);
labels = {'None','Capacitor','Thermal','Bondwire','Bondwire & Capacitor', ...
'Thermal & Bondwire','Thermal & Capacitor','All'};
h = heatmap(confdata)
% ,...
% 'YLabel', 'Actual IGBT fault', ...
% 'YDisplayLabels', labels, ...
% 'XLabel', 'Predicted IGBT fault', ...
% 'XDisplayLabels', labels, ...
% 'ColorbarVisible','off');
% % Compute the overall accuracy of the classifier
% % Compute the accuracy of the classifier at predicting
% % that there is a fault
Pre-processing Function
function [Vce,VceSpectrum,VceFrequencies,faultValues] = preprocess2(data)
% Remove the 1st 0.1 seconds of the Voltage signal to eliminate transient
tMin = seconds(0.1);
Vce = data.Vce{1};
Vce = Vce(Vce.Time >= tMin,:);
Vce.Time = Vce.Time - Vce.Time(1);
% Ensure the Voltage is sampled at a uniform sample rate
Vce = retime(Vce,'regular','linear','TimeStep',seconds(5e-5)); %5e-5 is 20KHz sample rate
% Remove the mean from the Vce and compute the Vce spectrum
fA = Vce;
fA.Data = fA.Data-mean(fA.Data);
[VceSpectrum,VceFrequencies] = pspectrum(fA);
% 'FrequencyLimits',[2 2500]
% % Find the values of the fault variables from the SimulationInput
simin = data.SimulationInput{1};
vars = {simin.Variables.Name};
idx = strcmp(vars,'IGBT');
Bondwire_Fault = simin.Variables(idx).Value;
idx = strcmp(vars,'CAP');
ESR_Fault = simin.Variables(idx).Value;
idx = strcmp(vars,'IGBT2');
Thermal_Fault = simin.Variables(idx).Value;
% % Collect the fault values in a cell array
faultValues = {...
'Bondwire_Fault', Bondwire_Fault, ...
'ESR_Fault', ESR_Fault, ...
'Thermal_Fault', Thermal_Fault};
function ci = extractCI2(Vce,VceSpectrum,VceFrequencies)
% % Find the frequency of the peak magnitude in the power spectrum.
pMax = max(VceSpectrum);
fPeak = VceFrequencies(VceSpectrum==pMax);
% % Compute the power in the low frequency range Hz.
fRange = VceFrequencies >=10 & VceFrequencies <= 100;
pLow = sum(VceSpectrum(fRange));
% % Compute the power in the mid frequency range Hz.
fRange = VceFrequencies >= 100 & VceFrequencies <= 600;
pMid = sum(VceSpectrum(fRange));
% % Compute the power in the high frequency range Hz.
fRange = VceFrequencies >= 600;
pHigh = sum(VceSpectrum(fRange));
% % Find the frequency of the spectral kurtosis peak
[pKur,fKur] = pkurtosis(VceSpectrum);
pKur = fKur(pKur == max(pKur));
% % Compute the Vce cumulative sum range.
csVce = cumsum(Vce.Data);
csVceRange = max(csVce)-min(csVce);
% % Collect the feature and feature values in a cell array.
% % Add flow statistic (mean, variance, etc.) and common signal
% % characteristics (rms, peak2peak, etc.) to the cell array.
ci = {...
'qMean', mean(Vce.Data), ...
'qVar', var(Vce.Data), ...
'qSkewness', skewness(Vce.Data), ...
'qKurtosis', kurtosis(Vce.Data), ...
'qPeak2Peak', peak2peak(Vce.Data), ...
'qCrest', peak2rms(Vce.Data), ...
'qRMS', rms(Vce.Data), ...
'qMAD', mad(Vce.Data), ...
'fPeak', fPeak, ...
'pLow', pLow, ...
'pMid', pMid, ...
'pHigh', pHigh, ...
'pKurtosis', pKur(1)};

Accepted Answer

Rachel Johnson
Rachel Johnson on 17 Mar 2022
Hi Andy,
You are correct, the features you extract from your data are going to have a big impact on the accuracy of your classifier. And you're also right that more data is going to help here. Below are a few ideas to try that will probably improve the accuracy:
  • Features: This is probably your biggest issue, You're using the same feature set from the example, but the dataset is different, so it's likely that different features will help separate the classes. You can quickly extract and rank time- and frequency-domain features using the Diagnostic Feature Designer app, which will be different depending on your dataset. Here's an example that shows how to do this interactively for the same pump model you were referencing: Analyze and Select Features for Pump Diagnostics. I also recommend watching this tech talk which walks through the pump example from feature extraction through classification.
  • Data: 80 signals is pretty slim for 7 classes, and it's even slimmer because you need to save out some data for cross-validation. The example you linked generated 800 signals for the same number of classes. You mentioned that the dataset takes ages to generate, so you might investigate ways to improve the performance Simulink models.
  • Classifiers: You might also try different types of classifiers besides an SVM. There's no guarantee an SVM will be the best for your dataset just because it worked well for the example. The Classification Learner is a good place to start, or try fitcauto.
  1 Comment
Andy Wileman
Andy Wileman on 25 Mar 2022
Hi Rachel,
Thank you very much for your answer. I've followed your advice and I'm now using Wavelet Scattering to extract the features - much better results, for smaller data sets.
I've also increased the data to 240 signals, again this has helped.
The classification learner is brilliant. Using the advanced feature, once I selected the best classifier, I've again been able to make slight increments by tuning.
I still have a bit more worked to do, but the results are looking a lot better.

Sign in to comment.

More Answers (0)


Find more on Machine Learning and Deep Learning for Signals in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!