How can I use K-fold cross validation with training data selected sequentially, not randomly?
I have a total of 540 data points, organized sequentially so that each consecutive block of 20 contains similar data. I want to use K-fold cross validation in which each block of 20 is selected sequentially as the test set. Example: for the 1st test set, the last 20 of the 540 data points are used for testing and the first 520 for training. So, in total, 54 folds will be used to cover all combinations.
Adam Danz on 16 Nov 2021
Edited: Adam Danz on 16 Nov 2021
I suggest identifying each group with a grouping variable. In the code below, groupID is the grouping variable.
% Set these values
n = 540; % number of data points
nPerGroup = 20; % number of consecutive values per group
% compute group IDs
nGroups = n/nPerGroup; % number of groups (must be an integer)
assert(mod(nGroups,1)==0, 'nGroups must be an integer') % assumption check
groupID = repelem((1:nGroups)', nPerGroup, 1); % group ID (same length as your data: 540 elements)
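As a quick sanity check on the grouping variable, here is a minimal sketch using smaller, made-up sizes (12 values in groups of 4) so the result is easy to read. The names nDemo, nPerGroupDemo, groupIDDemo and idxGroup2 are illustrative only and are not part of the code above.

```matlab
% Illustrative sizes only (assumed for display; the real case is 540 and 20)
nDemo = 12;
nPerGroupDemo = 4;

% Same construction as above: each group ID is repeated nPerGroupDemo times
groupIDDemo = repelem((1:nDemo/nPerGroupDemo)', nPerGroupDemo, 1);
disp(groupIDDemo')                     % 1 1 1 1 2 2 2 2 3 3 3 3

% Row indices of the consecutive values that belong to group 2
idxGroup2 = find(groupIDDemo == 2)';   % 5 6 7 8
```

Each group ID marks one consecutive block, so comparing groupID against a single ID selects exactly that block.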
For 540 values and 20 values per group, there are 27 partitions (540/20 = 27) for 27-fold cross validation. Note, however, that k-fold cross validation uses random samples, and you are not sampling randomly unless the list of 540 values is already randomized. If these methods are described in a publication, you'd need to indicate this adjustment to the methodology. For each repetition, one partition is reserved for testing while the other k-1 partitions are used for training. This also departs from your explanation that one partition is used for training and another for testing.
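For contrast, a standard randomized k-fold split might look like the sketch below. It assumes the Statistics and Machine Learning Toolbox is available, since cvpartition is part of that toolbox. The variable names k, c and testIdx are illustrative.

```matlab
n = 540;   % number of data points
k = 27;    % number of folds

% cvpartition assigns observations to folds at random
c = cvpartition(n, 'KFold', k);

% Logical index of the observations in the 1st test fold
testIdx = test(c, 1);

% In general these indices are scattered across 1:n, not one consecutive block
find(testIdx)'
```

This is the behavior the sequential approach deliberately departs from, which is why the adjustment should be stated when reporting the method.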
To implement 27-fold cross validation, you can loop through each groupID as follows. Here, data is your 540-element data set, which is assumed to be a vector.
for i = 1:nGroups
    train = data(groupID ~= i); % all groups except group i (520 values)
    test = data(groupID == i);  % group i only (20 values)
    % < do Training >
    % < do testing >
end
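If the data are stored as a matrix with one observation per row rather than as a vector, the same idea applies with row indexing. The following is only a sketch under stated assumptions: the data are random placeholders, the "model" is just the column means of the training rows, and the names trainData, testData, foldError and cvError are hypothetical stand-ins for your own training and testing steps.

```matlab
n = 540;                                  % number of observations
nPerGroup = 20;                           % consecutive observations per group
nGroups = n/nPerGroup;                    % 27 groups
groupID = repelem((1:nGroups)', nPerGroup, 1);

data = rand(n, 3);                        % placeholder: 540 observations x 3 features

foldError = zeros(nGroups, 1);            % one error value per fold
for i = 1:nGroups
    isTest = groupID == i;                % logical mask for the held-out block
    trainData = data(~isTest, :);         % 520 rows used for training
    testData  = data(isTest, :);          % 20 consecutive rows used for testing

    % Placeholder "training": column means of the training rows
    mu = mean(trainData, 1);

    % Placeholder "testing": mean Euclidean distance of test rows from mu
    % (testData - mu relies on implicit expansion, R2016b or later)
    foldError(i) = mean(sqrt(sum((testData - mu).^2, 2)));
end

cvError = mean(foldError);                % average error across the 27 folds
```

To hold out the last block first, as described in the question, loop over nGroups:-1:1 instead. The order of the folds does not change the averaged result.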