splitting dataset into training and testing

Hello,
We are trying to split some data into training and testing datasets, 80% and 20% respectively, and we are having the following issue:
  • If the splitting is done using "cvpartition" or any similar method, the data is split randomly, point by point. Our data is more like time series data, and it is preferable to keep all the rows related to a certain date and time together, not split between training and testing.
We thought about giving the rows that belong to a certain date and time a specific number as an index or label and then splitting these numbers randomly. However, this will not guarantee that the training-to-testing ratio is 80 to 20, because not all dates have the same number of rows.
I attach a sample of the data above if anyone can help,
Thanks in advance.

Answers (1)

Manas Shivakumar on 9 Aug 2022
There are a couple of functions for splitting a dataset into training and test sets. They include:
  • cvpartition
  • crossvalind
Since you don't know the size of each partition beforehand, I suggest you first come up with a labelling system that groups the rows by date and time according to some criterion. You can then extract each group precisely with getsamples(timeseries,ind). As for the split ratio, you could simply try out the possible combinations of groups, merging different numbers of groups into the training set, and pick the one that comes closest to the split you need.
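A minimal sketch of that idea, under some assumptions: the data is in a table T with a datetime column named Time (both hypothetical, since the attached sample is not shown here), and findgroups supplies the labels. Instead of trying every combination, which grows exponentially with the number of groups, this swaps in a random search: shuffle the groups a number of times, cut each shuffled order where the cumulative row count is nearest 80%, and keep the best cut.

```matlab
% Hypothetical example data: several rows per timestamp, unequal group sizes.
rng(0);                                     % fixed seed so the split is repeatable
stamp = datetime(2022,1,1) + hours(randi(50, 1000, 1));    % assumed Time column
T = table(stamp, randn(1000,1), 'VariableNames', {'Time','Value'});

G = findgroups(T.Time);                     % one integer label per distinct timestamp
nGroups = max(G);
groupSize = accumarray(G, 1);               % number of rows in each group
target = 0.8 * height(T);                   % desired number of training rows

bestErr = Inf;
for trial = 1:200                           % random search, not an exhaustive one
    order = randperm(nGroups);              % random order of the groups
    running = cumsum(groupSize(order));     % rows collected after each group
    [err, k] = min(abs(running - target));  % cut point closest to the target
    if err < bestErr
        bestErr = err;
        trainGroups = order(1:k);           % groups assigned to training
    end
end

isTrain = ismember(G, trainGroups);         % row-level mask, whole groups only
Ttrain = T(isTrain, :);
Ttest  = T(~isTrain, :);
fprintf('Training fraction: %.3f\n', mean(isTrain));
```

Because whole groups are assigned to one side, no timestamp ends up in both sets. The result is only approximately 80/20; how close it gets depends on how uneven the group sizes are.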

Release

R2021b
