Extracting testing and training data from a single dataset

I have a dataset of size 14400 x 14, where the first 2 columns represent a user's x- and y-position, each ranging from 1 to 121.
Example:
first_col   second_col   ...
1           1
1           2
1           3
...         (up to 121)
2           1
2           2
...         (up to 121)
3           ... (up to 121)
...         ...
121         121
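If it helps to reproduce this layout, here is a minimal sketch that builds the two position columns in the same order. It assumes a complete 121 x 121 grid in which every (x, y) pair appears exactly once; the names n, first_col, second_col, positions, and n_test are hypothetical. The second column cycles through 1:121 while the first column advances by one after each full cycle.

```matlab
% Hypothetical reconstruction of the position layout shown above,
% assuming every (x, y) pair on a 121 x 121 grid appears exactly once.
n = 121;
% fast(i,j) = i and slow(i,j) = j; flattening column-wise with (:)
% makes fast vary quickest and slow vary slowest.
[fast, slow] = ndgrid(1:n, 1:n);
first_col  = slow(:);   % 1 repeated n times, then 2 repeated n times, ...
second_col = fast(:);   % 1, 2, ..., n, 1, 2, ..., n, ...
positions  = [first_col second_col];

disp(positions(1:3, :))       % first three rows: [1 1; 1 2; 1 3]
disp(positions(n+1:n+2, :))   % start of the next block: [2 1; 2 2]

% Number of rows in the intended test region (both coordinates in 1:30)
n_test = nnz(first_col <= 30 & second_col <= 30);   % 30*30 on a full grid
```

On a full grid the test region holds 900 rows; the real dataset's row count may differ if not every grid cell is present.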
I want to separate out the testing data based on user location: the rows where first_col is in 1:30 and second_col is in 1:30.
I am using a for loop, but it is taking a lot of time.
I would really appreciate any suggestions on this issue.
Thank you
  2 Comments
Rahul Gulia on 28 Oct 2022
Edited: Rahul Gulia on 28 Oct 2022
I also want to be able to split the dataset for training and testing purposes, and then later recombine both datasets into one for further use.
I guess the row indices could be used for this.
Khushboo on 31 Oct 2022
Hi Rahul,
I am sorry, I did not fully understand how you want your test data to look. Could you kindly elaborate with an example? From what I understand, slicing would work for your use case.


Accepted Answer

Rahul Gulia on 31 Oct 2022
I was able to solve this. It came down to joining two matrices and ordering the rows according to the values in their first columns.
Example code:
**************************************************************
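xx = [1 7 8; 4 9 10; 5 11 12];
yy = [2 13 14; 3 15 16; 6 17 18];
zz = [xx; yy]   % stack both matrices
ww = [];
% For each value pp, append every row of zz whose first column equals pp.
% size(zz,1) is the row count; length(zz) only equals it when zz has more rows than columns.
for pp = 1:size(zz,1)
    for qq = 1:size(zz,1)
        if pp == zz(qq,1)
            ww = [ww; zz(qq,:)];
        end
    end
end
ww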
*****************************************************************
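For comparison, the nested loop above reorders the rows of zz by their first-column values, checking every row against every candidate value. A minimal sketch of the same reordering with MATLAB's built-in sortrows, assuming the values in column 1 are distinct (as they are in this example):

```matlab
xx = [1 7 8; 4 9 10; 5 11 12];
yy = [2 13 14; 3 15 16; 6 17 18];
zz = [xx; yy];
% Sort the rows of zz in ascending order of column 1; each row stays intact.
ww = sortrows(zz, 1)
```

One difference worth knowing: the loop only picks up rows whose first-column value lies in 1:size(zz,1) and silently drops any others, whereas sortrows keeps every row regardless of its value.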

More Answers (2)

Rajeev on 31 Oct 2022
Hi Rahul,
Logical indexing can be used to extract the required data from the array.
Assuming the matrix is named "location", the rows whose user location lies in 1 to 30 on both axes can be extracted as follows:
% logical indexing: true for rows whose x-position (column 1) is at most 30
first_col_index = location(:,1) <= 30;
% true for rows whose y-position (column 2) is at most 30
second_col_index = location(:,2) <= 30;
% logical & (and) gives the rows where both coordinates are less than or equal to 30
location_index = first_col_index & second_col_index;
% the logical index array selects the matching rows (all columns) of "location"
location_new = location(location_index,:);
Here is the documentation for logical indexing: Matrix Indexing in MATLAB - MATLAB & Simulink (mathworks.com)
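Since the question also asks about splitting into training and testing sets and recombining them later, here is a minimal sketch building on the same logical index. It assumes location holds the x- and y-positions in columns 1 and 2; the small matrix is a made-up stand-in for the real data, and the names test_data, train_data, and recombined are hypothetical. The complement ~location_index selects the training rows, and because the mask records where every row came from, the two subsets can be written back into their original positions without a loop.

```matlab
% Made-up stand-in data: positions in columns 1-2, one extra feature in column 3
location = [5 10 0.1; 40 3 0.2; 25 30 0.3; 30 31 0.4; 1 1 0.5; 100 100 0.6];

% Test rows: both coordinates at most 30; training rows: everything else
location_index = location(:,1) <= 30 & location(:,2) <= 30;
test_data  = location(location_index, :);
train_data = location(~location_index, :);

% Recombine: write each subset back into the rows it was taken from
recombined = zeros(size(location));
recombined(location_index, :)  = test_data;
recombined(~location_index, :) = train_data;
isequal(recombined, location)   % true: original row order restored
```

This only works while the mask is kept alongside the two subsets; if they are saved separately, storing a row-index column (as in the next answer) serves the same purpose.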

Rahul Gulia on 31 Oct 2022
I figured out a way to create the training and testing data based on the location of the users. Here is how I did it.
My DatasetTmp_14 looks like this (note: the first column contains each row's index):
1 0 0.5 40.36 43.05 0 1 60 0 54.5 0.5 1 15 5 2301
2 0 1 40.02 42.74 0 1 60 0 54 1 1 15 5 2336
3 0 1.5 39.69 42.43 0 1 60 0 53.5 1.5 1 15 5 2311
4 0 2 39.37 42.13 0 1 60 0 53 2 1 15 5 2327
5 0 2.5 39.05 41.83 0 1 60 0 52.5 2.5 1 15 5 2318
DatasetTmp_14 is 13310 x 15.
Now,
*****************************************************
% Prepend a row-index column so each row's original position is kept
idx1 = (1:size(DatasetTmp_13,1))';
DatasetTmp_14 = [idx1 DatasetTmp_13];
quadrant_data_test = [];
quadrant_data_train = [];
for pp = 1:size(DatasetTmp_14,1) % Takes too long to execute
    % Test region: position column 2 <= 30 and position column 3 <= 27.5
    if (DatasetTmp_14(pp,2)<=30 && DatasetTmp_14(pp,3)<=27.5)
        quadrant_data_test = [quadrant_data_test; DatasetTmp_14(pp,:)];
    else
        quadrant_data_train = [quadrant_data_train; DatasetTmp_14(pp,:)];
    end
end
*****************************************************
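The loop above is likely slow mainly because quadrant_data_test and quadrant_data_train grow by one row per iteration, so MATLAB has to reallocate and copy them again and again. A minimal sketch of the same split using logical indexing, with the thresholds from the loop (column 2 <= 30 and column 3 <= 27.5 after the index column is prepended). The tiny DatasetTmp_13 below is a made-up stand-in, and in_test is a hypothetical name.

```matlab
% Made-up stand-in for DatasetTmp_13: two position columns, one feature column
DatasetTmp_13 = [0 0.5 40.36; 0 28 39.00; 31 1 38.50; 10 27.5 37.20];

% Prepend a row-index column, as in the original code
idx1 = (1:size(DatasetTmp_13, 1))';
DatasetTmp_14 = [idx1 DatasetTmp_13];

% One vectorized comparison per column instead of a per-row loop
in_test = DatasetTmp_14(:,2) <= 30 & DatasetTmp_14(:,3) <= 27.5;
quadrant_data_test  = DatasetTmp_14(in_test, :);
quadrant_data_train = DatasetTmp_14(~in_test, :);
```

Both outputs are allocated once, at their final size, and each keeps the original row index in column 1.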
Next, I would like to recombine the two datasets in the order given by their index values, which I attempted as follows. This is where I am stuck: the new matrix does not come out in the correct sequence. Any suggestions on my code are welcome.
*****************************************************
test_heatmap_data_tmp = [quadrant_data_test; quadrant_data_train];
recreated_dataset = [];
for pp = 1:size(test_heatmap_data_tmp,1)
    for qq = 1:size(test_heatmap_data_tmp,1)
        % Row qq is the row whose original index is pp
        if (pp == test_heatmap_data_tmp(qq,1))
            % Take row qq; taking row pp just copies the rows back in their concatenated order
            tmp = test_heatmap_data_tmp(qq,:);
            recreated_dataset = [recreated_dataset; tmp];
        end
    end
end
*****************************************************
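Because column 1 of every row holds that row's original position, the double loop can also be replaced by a single indexed assignment that writes each row straight back to where it came from. A minimal sketch, assuming the index column contains each of 1..N exactly once (as idx1 does); the tiny matrices are made-up stand-ins. sortrows(test_heatmap_data_tmp, 1) would produce the same result here.

```matlab
% Made-up stand-ins: column 1 is the original row index
quadrant_data_test  = [1 0.1; 4 0.4];
quadrant_data_train = [2 0.2; 3 0.3; 5 0.5];
test_heatmap_data_tmp = [quadrant_data_test; quadrant_data_train];

% Scatter each row to the position named in its index column
recreated_dataset = zeros(size(test_heatmap_data_tmp));
recreated_dataset(test_heatmap_data_tmp(:,1), :) = test_heatmap_data_tmp;
```

This touches each row once, so it scales linearly with the number of rows, unlike the nested loop, which compares every row against every index.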
This is how the recreated and original images should look, for reference.
