Efficient Way To Split Dataset Into Subsets

E on 18 Nov 2017
Commented: E on 26 Nov 2017
Hello,
I need to split a large dataset (a DxN numeric array) into multiple subsets. I can use the code below, where groupIDs is an Nx1 vector of integer IDs giving the group to which each data point belongs.
groups = unique(groupIDs);
for i = 1:numel(groups)
    tempData = data(:, groupIDs == groups(i));
    % do work on tempData
end
However, 90% of the run time of the above code is spent just creating tempData! That amounts to over a minute every time I want to do this. Is there a more efficient way to split data by groupIDs? I tried splitapply() but it doesn't seem to be any faster.
Are there any MATLAB gurus out there who know a trick? Thanks!
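One idea that might be worth timing is to sort the IDs once, reorder the columns once, and then take each group as a contiguous block of columns. The sketch below uses toy sizes (D = 4, N = 12 and three possible IDs are made-up values, not the real dataset), and the names sortedData, starts and stops are only illustrative. Each block is still copied into tempData, so this may not remove the cost and it has not been tested on the original data.

```matlab
% Toy data with hypothetical sizes; the real array is far larger.
D = 4; N = 12;
data = rand(D, N);
groupIDs = randi(3, N, 1);          % N-by-1 integer IDs, as in the question

% Sort the IDs once and reorder the columns once.
[sortedIDs, order] = sort(groupIDs);
sortedData = data(:, order);

% Locate the first and last column of each run of equal IDs.
starts = [1; find(diff(sortedIDs) ~= 0) + 1];
stops  = [starts(2:end) - 1; N];
groups = sortedIDs(starts);         % same values as unique(groupIDs)

for i = 1:numel(groups)
    % Contiguous range of columns instead of a logical mask.
    tempData = sortedData(:, starts(i):stops(i));
    % do work on tempData
end
```

Because sort keeps equal elements in their original order, each block holds the same columns, in the same order, as data(:, groupIDs == groups(i)).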
  5 Comments
Jos (10584) on 24 Nov 2017
12 GB? That is quite a lot. If this doesn't fit in memory, swapping to disk is the likely bottleneck ...
E on 26 Nov 2017
Thanks for the replies. I do have plenty of RAM left to spare, so it doesn't look like the hard drive is involved. Confirmed (re Greg) that using the output of unique is no better. For example, numeric indexing offers no improvement, and the indexing itself is not really the problem - it's probably the data copying:
disp('a. original (without "doing work")');
tic;
for i = 1:numel(groups)
    tempData = data(:, groupIDs == groups(i));
end
toc
disp('b. numeric indexing');
% cell(n) alone would allocate an n-by-n cell array, so request a column.
idxs = cell(numel(groups), 1);
for i = 1:numel(groups)
    idxs{i} = find(groupIDs == groups(i));
end
tic;
for i = 1:numel(groups)
    tempData = data(:, idxs{i});
end
toc
disp('c. logical operation alone');
tic;
for i = 1:numel(groups)
    tempData = (groupIDs == groups(i));
end
toc
a. original (without "doing work")
Elapsed time is 4.590886 seconds.
b. numeric indexing
Elapsed time is 4.526391 seconds.
c. logical operation alone
Elapsed time is 0.066057 seconds.
There's gotta be another way - if I use a for loop with 3 million iterations it only takes 2 seconds longer.
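If the copying really is the cost, one way around it is to not build tempData at all. That only works when the "work" is something like a per-group sum or mean, which is an assumption here since the thread never says what the work is. A minimal sketch under that assumption, with made-up toy sizes and illustrative names (S, groupSums, counts, groupMeans):

```matlab
% Toy data with hypothetical sizes.
D = 3; N = 10;
data = rand(D, N);
groupIDs = randi(4, N, 1);

% g(k) is the position in groups of the ID belonging to column k.
[groups, ~, g] = unique(groupIDs);
K = numel(groups);

% N-by-K sparse indicator matrix: S(k, j) = 1 when column k is in group j.
S = sparse((1:N).', g, 1, N, K);

% One matrix product gives every per-group column sum (D-by-K).
groupSums = full(data * S);

% Per-group counts, then means; bsxfun keeps this working on older releases.
counts = accumarray(g, 1).';
groupMeans = bsxfun(@rdivide, groupSums, counts);
```

Column j of groupSums should match sum(data(:, groupIDs == groups(j)), 2) up to rounding. For work that cannot be written as a reduction like this, the per-group copy is probably still needed.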

Answers (0)
