Why is 'gather' performance dramatically poor when I evaluate a tall array (105151 Bytes) by an indexing operation?

Hi experts,
I have a tall array of type double (Mx386) containing voltage values, one column per electrode.
For visualization purposes I want to evaluate the first values of one electrode, but this single process took 5 h.
test1 = gather(test(1:500,3));
Evaluating the tall array just to get its size is also very time-consuming:
gather(size(test,1))
How can I accelerate the process? What could likely be the issue?
I'm happy about any hint to solve this.
Thank you in advance!
Eva

6 Comments

Roughly how large is M?
Is test input data or is it transformed data? Because if it is input data then there might be ways to read fewer columns.
Thank you for your Comment Walter!
The number of rows is on the order of 1e10; test is just the tall array. Do you mean that it's faster to build a smaller, unevaluated variable (like: smaller = tallVar(:,1)) and then evaluate its content with gather (result = gather(smaller))? I will try it out, thank you.
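The idea in the comment above can be sketched as follows. This is only a minimal illustration on a small in-memory stand-in (the matrix X and its size are assumptions, not the real data): the column is selected while the array is still unevaluated, head limits the request to the leading rows, and a single gather triggers the evaluation.

```matlab
% Run tall computations in the local MATLAB session (no parallel pool).
mapreducer(0);

% Small stand-in for the real data: 1000 rows, 386 columns (assumed sizes).
X = rand(1000, 386);
test = tall(X);

% Both steps are deferred: nothing is evaluated yet.
col3 = test(:, 3);             % one electrode only
firstRows = head(col3, 500);   % first 500 rows of that column

% gather triggers the actual evaluation and returns an in-memory array.
test1 = gather(firstRows);
```

Whether this is faster on the real array depends on how expensive the deferred operations behind test are, since they still have to run for the requested rows.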
That would be an interesting test but it is not what I mean.
I suspect that when you gather like that, it forces MATLAB to read all columns of the array (in batches of rows) and then extract the columns that are used from that. It would, however, be faster to tell the underlying datastore to read only the columns needed.
https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.tabulartextdatastore.html
See the SelectedVariableNames property, or use * in the TextscanFormats property to skip columns.
(We know that you do not have a xls or xlsx spreadsheet for your data because those cannot have as many rows as you are using.)
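A minimal sketch of restricting the datastore to one column, assuming the data sits in a CSV file with a header row. The file name and the variable names E1..E5 are hypothetical and only serve to make the example self-contained.

```matlab
% Write a small CSV as a stand-in for the real file (names are hypothetical).
csvFile = fullfile(tempdir, 'electrodes_demo.csv');
T = array2table(rand(100, 5), 'VariableNames', {'E1','E2','E3','E4','E5'});
writetable(T, csvFile);

% Create the datastore and tell it to import only the column of interest.
ds = tabularTextDatastore(csvFile);
ds.SelectedVariableNames = {'E3'};

% The tall table now has a single variable, so only that one is imported.
tt = tall(ds);
first10 = gather(head(tt.E3, 10));
```

This only helps when the values to be displayed come directly from the file; it does not shorten a chain of deferred operations applied afterwards.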
Thanks a lot for your help!
"(We know that you do not have a xls or xlsx spreadsheet for your data because those cannot have as many rows as you are using.) "
- Really lots of rows *g!!
Unfortunately the underlying datastore does not contain the data I want to display, since I created a new tall array. Maybe this is not an efficient procedure?
In brief:
  1. I stored voltage data of 386 electrodes in a csv file.
  2. I created a datastore for the file and then created a tall array from it.
  3. I applied a filter to the voltage data without using gather and stored the filtered values in a new tall array.
  4. Now I would like to visualize a part of the filtered data
In fact, I tried selecting a subset (the whole 2nd column) of the tall array and evaluated the data from that tall column vector. It was faster, but still slow (10 min, compared to the full array, where I terminated the process because I lost my patience).
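One option for a workflow like the steps above is to evaluate the filtered tall array once, save it to disk with write, and then view parts of it from the saved copy, so that later gather calls do not re-run the filter. The sketch below is only an illustration under assumptions: the data is a small in-memory stand-in, and subtracting the column mean is a placeholder for whatever filter was actually applied.

```matlab
% Run tall computations in the local MATLAB session (no parallel pool).
mapreducer(0);

% Stand-in for the raw tall array (assumed size).
X = rand(200, 4);
raw = tall(X);

% Placeholder "filter": remove each column's mean (still deferred).
filtered = raw - mean(raw, 1);

% Evaluate once and save the result to a new folder.
outDir = tempname;
write(outDir, filtered);

% Re-open the saved result; reads from here do not re-run the filter.
saved = tall(datastore(outDir));
firstRows = gather(head(saved(:, 2), 50));
```

The one-time write still has to pass over all of the data, so it pays off only if the filtered result is inspected more than once.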
I don't do parallel computing.
That's why I asked "Is test input data or is it transformed data?"
Yes, now I get it. Sorry, I come from the biological field and am only now stepping into data analysis, so the underlying computer processes are not always clear to me. That means hints like
when you gather like that, it forces matlab to go through reading all columns of the array (in batches of rows) and extract the used columns from that
are super helpful for me. I didn't know that before.
I'll try out TransformedDatastore now to apply the filter function, instead of creating a tall array at this point.
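A minimal sketch of that approach, assuming R2019a or later (where transform is available for datastores). The file name, the variable names and the mean-removal "filter" are hypothetical placeholders.

```matlab
% Write a small CSV as a stand-in for the real file (names are hypothetical).
csvFile = fullfile(tempdir, 'electrodes_transform_demo.csv');
T = array2table(rand(100, 4), 'VariableNames', {'E1','E2','E3','E4'});
writetable(T, csvFile);

% Read only one electrode, 50 rows per call to read.
ds = tabularTextDatastore(csvFile);
ds.SelectedVariableNames = {'E3'};
ds.ReadSize = 50;

% Placeholder "filter" applied to each chunk as it is read.
demoFilter = @(tbl) tbl.E3 - mean(tbl.E3);
tds = transform(ds, demoFilter);

% A single read fetches and filters only the first chunk of the file.
chunk = read(tds);
firstValues = chunk(1:10);
```

Note that the function runs on each chunk separately, so a filter that depends on neighbouring samples may behave differently near chunk boundaries than it would on the full signal.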
Thanks for your help!


Answers (0)

Asked: 19 Jul 2019

Edited: 22 Jul 2019
