How to do cross-validation partitioning and training on big data? (I am facing an issue with my big data using tall arrays)

Hi,
I am not an expert in MATLAB big data handling. I have been struggling for a few days because I have to do feature selection and train a machine learning classifier on my data. I have followed the MATLAB docs and a few videos from MathWorks. However, I am still not able to make it work.
I have 12 MAT-files (image-001.mat, image-002.mat, ..., image-012.mat) containing features extracted from 3D images, all saved in one folder. Each MAT-file has around 600,000 records and stores two variables: the data and the class label for each record. I need to load all of them into MATLAB as a training set and train a machine learning model on the data. However, after loading only 2 files, I get the following error:
Index exceeds matrix dimensions.
This is what I did:
1) I saved all of them as CSV files in featFolder and created a datastore:
csvPattern = fullfile(featFolder,'*.csv'); % pattern matching the CSV files of extracted features
csvList = dir(csvPattern); % list the CSV files in the folder
no_subjects = length(csvList);
ds = datastore(csvPattern);
2) I created a tall array from the datastore:
TA=tall(ds);
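Steps 1 and 2 can be sketched roughly as follows. This is only an illustration under a few assumptions: the value given to featFolder is a hypothetical path, every CSV file is assumed to have the same columns, and the class label is assumed to sit in the last column.

```matlab
% Sketch only: the folder location below is a hypothetical placeholder.
featFolder = fullfile(pwd, 'features');

% One datastore over every CSV file in the folder (wildcards are allowed).
ds = tabularTextDatastore(fullfile(featFolder, '*.csv'));

% Tall table backed by the datastore; nothing is read into memory yet.
TA = tall(ds);

% Predictors and labels, both derived from the same tall table.
X = TA{:, 1:end-1};   % all columns except the last
y = TA{:, end};       % last column, assumed to be the class label
```

Because X and y both come from the same tall table, they stay aligned row by row and evaluation stays deferred until gather is called.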
3) I pass the tall array to a custom function, and this call fails:
fsFeatures=call_sequentialfs(TA{:,1:end-1},TA{:,end});
% this is the custom function
function inmodel = call_sequentialfs(X,y)
    classes = unique(y);
    SVMModels = cell(3,1);
    inmodel = cell(3,2); % the first column keeps the selected features and the second column keeps the history
    rng(1); % fix the seed so the partition is reproducible
    c2 = cvpartition(y,'HoldOut',1/10); % for tall arrays
    opts = statset('display','iter','UseParallel',1);
    for j = 2:numel(classes)
        indx = eq(y,classes(j)); % Create binary classes for each classifier
        % Criterion: number of misclassified test observations for an RBF SVM
        fun = @(Xtrain,Ytrain,Xtest,Ytest)...
            sum(Ytest~=predict(fitcsvm(Xtrain,Ytrain,'ClassNames',[false true],'Standardize',true,...
            'KernelFunction','rbf','BoxConstraint',1),Xtest));
        [inmodel{j,1},inmodel{j,2}] = sequentialfs(fun,X,indx,'cv',c2,'options',opts,'nfeatures',80);
    end
end
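For comparison, here is a minimal sketch of a holdout split on tall arrays that is separate from the function above. It assumes the Statistics and Machine Learning Toolbox is available and uses small synthetic stand-in data (the table T and its label column are made up for illustration), with predictors and labels both taken from one tall table. It does not reproduce or diagnose the error shown in this post.

```matlab
% Sketch only: synthetic stand-in data, not the real feature files.
rng(1);                                  % reproducible example
T = array2table(rand(200, 4));           % 4 made-up predictor columns
T.label = randi(3, 200, 1);              % made-up labels with 3 classes
tt = tall(T);                            % single tall table

X = tt{:, 1:4};                          % tall predictors
y = tt.label;                            % tall labels from the same table

% For tall inputs, cvpartition supports the 'HoldOut' option.
c = cvpartition(y, 'HoldOut', 0.1);
idxTrain = training(c);                  % tall logical index of training rows
idxTest  = test(c);                      % tall logical index of test rows

Xtrain = X(idxTrain, :);   ytrain = y(idxTrain, :);
Xtest  = X(idxTest, :);    ytest  = y(idxTest, :);

% Evaluation is deferred until gather is called.
nTest = gather(sum(idxTest));
```

The training and test sets remain tall arrays, so whatever is fitted on them afterwards needs to be a function that accepts tall inputs.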
The error occurs when execution reaches the cross-validation partitioning line:
Error using internal.stats.bigdata.cvpartitionTallImpl>lazyAssert (line 195)
Incompatible tall array arguments. The tall arrays must be created using the same execution environment.
Error in internal.stats.bigdata.cvpartitionTallImpl (line 80)
LAS = lazyAssert(floor(cv.N * T)>0, LAS, @() error(message('stats:cvpartition:PTooSmall')),clientfun);
Error in cvpartition (line 153)
cv.Impl = internal.stats.bigdata.cvpartitionTallImpl(varargin{:});
Error in tall/cvpartition (line 20)
cv = cvpartition({t,t.Adaptor,@partitionfun,@clientfun},varargin{:});
I do not know how to do the cross-validation and train the classifier on this data. I would really appreciate any help or suggestions. If there is a relevant link or code, please share it.
Thanks

Answers (0)