Gaussian Process Regression not working with normalized data or noise
I have data with around features that I am trying to fit regression models to. The data is a series of vectors that I have in three forms: noiseless vectors, noiseless unit vectors, and noisy unit vectors. The noise applied to the vectors is Gaussian white noise.
Most models (bag ensemble, SVM, GAM, LogitBoost ensemble) perform best with the noiseless unit vector data and slightly worse with the noisy or non-unit-vector data. The GPR model, on the other hand, performs the best of all the models on the noiseless, non-normalized data, but it completely fails with the unit vector data (it predicts the same value for everything) and mostly fails with the noisy data (it predicts almost all test values as the same number, with a few exceptions that it predicts perfectly).
Here is example code for the normalized data (all data sets are processed the same way; only the data differs):
%% Normal Data
clc; clearvars;
load threeEmbedData.mat
% turns the data and labels into 2d matrices
ylabels = reshape(yLabels,size(yLabels,1)*size(yLabels,2),3);
x = xGen(svData);
x(isnan(x)) = 0; 
% splits data into training and test sets
index = randperm(length(ylabels));
percent = .8;
final = round(percent*length(ylabels));
predictors = 1000;
train = x(1:predictors,index(1:final));
test = x(1:predictors,index(final+1:end));
Ytrain = ylabels(index(1:final),2);
Ytest = ylabels(index(1+final:end),2);
train = train';
test = test';
% creating the model
mdls{1} = fitrgp(train,Ytrain);
% sorting predicted test values and comparing them to actual test values
Y = predict(mdls{1},test);
[~,I] = sort(Ytest);
figure
scatter(1:size(Ytest,1),[Ytest(I) Y(I)])
% function for reshaping sensitivity vector fields and returning the
% reshaped values for embedded data
function [X] = xGen(svVec)
dim = size(svVec);
X1 = zeros(dim(1)*dim(2),dim(3));
X = zeros(dim(1)*dim(2),dim(3)*dim(4));
for l = 1:dim(4)
    for k = 1:dim(3)
        X1(:,k) = reshape(svVec(:,:,k,l)',[dim(1)*dim(2) 1]);
    end
    X(:,dim(3)*l-dim(3)+1:dim(3)*l) = X1;
end
end
This data generation works great, but the data with noise and the normalized data do not work at all. I have attached the results of all three below. My data files are all around 60 MB so I cannot attach them, but if anyone knows a workaround I would love to attach them for easier troubleshooting.
Answers (1)
Neha on 27 Jun 2023
Hi Alejandro,
I understand that the GPR model is not performing well on the noisy data and the unit vector data. I suggest you optimize the hyperparameters, starting with the kernel function: set the 'KernelFunction' option of fitrgp to 'matern32' or 'matern52'. If this doesn't improve the performance, fit the GPR model with other kernels, compare their predictive accuracy or goodness-of-fit metrics on a validation set, and choose the kernel that performs best according to your evaluation criteria.
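As a rough illustration of comparing kernels on a validation set, here is a minimal sketch. It uses small synthetic unit-vector data as a stand-in for threeEmbedData.mat, so the sizes, the target function, and the variable names are all assumptions. The 'Standardize' option is an extra worth trying for unit-norm predictors, not something guaranteed to fix the constant predictions.

```matlab
% Synthetic stand-in data (assumption: NOT the data from the question)
rng(0);                                  % reproducible random numbers
n = 200; p = 5;
X = randn(n,p);
X = X ./ vecnorm(X,2,2);                 % scale each row to unit length
y = sin(3*X(:,1)) + 0.5*X(:,2) + 0.05*randn(n,1);

% Hold out 20% of the observations as a validation set
cv = cvpartition(n,'HoldOut',0.2);
idxTrain = training(cv);
idxVal   = test(cv);

% Fit one GPR model per kernel and record the validation RMSE
kernels = {'squaredexponential','matern32','matern52'};
rmse = zeros(numel(kernels),1);
for i = 1:numel(kernels)
    mdl = fitrgp(X(idxTrain,:),y(idxTrain), ...
        'KernelFunction',kernels{i}, ...
        'Standardize',true);             % assumption: standardizing may help
    yhat = predict(mdl,X(idxVal,:));
    rmse(i) = sqrt(mean((y(idxVal) - yhat).^2));
end

% Report the kernel with the lowest validation error
[~,best] = min(rmse);
fprintf('Best kernel: %s (validation RMSE %.4f)\n',kernels{best},rmse(best));
```

With your own data, replace X and y with your predictor matrix (observations in rows) and response vector; the ranking of kernels on the synthetic data says nothing about how they will rank on yours.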
Similarly, other hyperparameters can also be optimized through grid search.
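One possible way to run such a grid search is through the built-in 'OptimizeHyperparameters' option of fitrgp. The sketch below again uses synthetic data, and the choice of which hyperparameters to search ('KernelScale' and 'Sigma'), the fixed 'matern52' kernel, and the grid size are assumptions made for illustration.

```matlab
% Synthetic stand-in data (assumption: NOT the data from the question)
rng(1);                                  % reproducible random numbers
n = 150; p = 4;
X = randn(n,p);
y = X(:,1).^2 - X(:,2) + 0.1*randn(n,1);

% Use grid search instead of the default Bayesian optimization
opts = struct('Optimizer','gridsearch', ...
              'NumGridDivisions',5, ...  % 5 values per hyperparameter
              'ShowPlots',false, ...
              'Verbose',0);

% Search over kernel scale and noise standard deviation;
% fitrgp scores each grid point by cross-validation loss
mdl = fitrgp(X,y, ...
    'KernelFunction','matern52', ...
    'OptimizeHyperparameters',{'KernelScale','Sigma'}, ...
    'HyperparameterOptimizationOptions',opts);

% Inspect the selected noise level and the fit on the training data
fprintf('Selected Sigma: %.4g\n',mdl.Sigma);
fprintf('Resubstitution MSE: %.4g\n',resubLoss(mdl));
```

The grid search can be slow on large data sets because every grid point is cross-validated, so it may be worth trying it on a subset of the observations first.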
I hope this helps! 