populating a cell using a loop

This is my code
a = dir;
i=3;
n=1;
b = zeros(n,55);
while i<=43
d = a(i).name;
A = textscan(fopen(d),'%f %f %f %f %f %f %f %f %f %f %f %f %f %f %s %s %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %s %s %f %f %f %f %f %f %f %f', 'Delimiter',',','Headerlines',1);
A_time = A{1,1};
B_time = num2str(A_time);
x = datenum(B_time,'yyyymmddHHMM');
b = A(n,:);
i=i+1;
n=n+1;
end
I've asked for help with this code in a couple of different ways, and while I appreciate any help, a lot of it has not been what I'm looking for. I'm trying to run this, but when I do, it says that the index exceeds matrix dimensions for the line b=A(n,:). Please just help me figure out why this isn't working. I know it's difficult without the data files, but just know that they all have 55 columns. When the part in the loop is run by itself up to b, A is a 1x55 cell. I want b to be a cell with 55 columns but with n rows, one row for each data file that I run. Please just tell me what is wrong with this code. Let me know if clarification is needed.

2 Comments

Stephen23 on 7 Oct 2018
Edited: Stephen23 on 7 Oct 2018
"populating a cell using a loop"
A is already a cell array, and it contains all of the imported data. So what advantage do you see in copying all of that data into a new cell array?
Note that it would be much more efficient to import the file once before the loop, rather than importing the same file 43 times just to extract one row each time.
A only has imported data from 1 file. I would like to have an array with the information from all iterations together.


 Accepted Answer

Stephen23 on 7 Oct 2018
Edited: Stephen23 on 7 Oct 2018
textscan returns a 1x55 cell array, where the format string specifies the 55. So your code:
A = textscan(fopen(d),'%f %f %f %f %f %f %f %f %f %f %f %f %f %f %s %s %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %s %s %f %f %f %f %f %f %f %f', 'Delimiter',',','Headerlines',1);
will result in a 1x55 cell array. Then a few lines later you write this:
b = A(n,:);
Because A has only one row, this throws an error as soon as n>1: requesting anything from its (non-existent) second (or higher) row is out of bounds.
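To make the shape of the textscan output concrete, here is a minimal sketch. It parses a small in-memory character vector (made-up data, not the original files) so that it runs without any files on disk; the variable names txt and err are only for illustration.

```matlab
% Three rows of comma-separated numbers, held in memory for illustration.
txt = sprintf('1,2,3\n4,5,6\n7,8,9');
A = textscan(txt, '%f %f %f', 'Delimiter', ',');

disp(size(A))      % one row of cells, one cell per format specifier
disp(size(A{1}))   % the rows of the data live inside each cell

try
    b = A(2,:);    % A has no second row of cells, so this fails
catch err
    disp(err.message)
end
```

The number of data rows changes the size of the arrays inside the cells, not the size of A itself.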
"I want b to be a cell with 55 columns but with n rows. one row for each data file that I run"
I really really really recommend that you don't do that: that would require putting scalar numeric data into the cells of a cell array, which just makes it much harder to work with numeric data. You should really keep the numeric data in numeric arrays, or use a table. Because you have a few columns which are character, this complicates the importing a little bit, but there are reasonable solutions which you should look at:
  • If you do not need those character columns then get textscan to ignore them with %*s in the format string. Then you could trivially get textscan to collect all of the numeric data into one numeric array, using the CollectOutput option. Very simple, but you would lose some data.
  • Use the CollectOutput option to collect the data into arrays of matching types: this would give you one 1x5 cell array C, containing an Nx14 numeric array, an Nx2 char cell array, an Nx29 numeric array, an Nx2 char cell array, and an Nx8 numeric array (or whatever sizes that format string gives you). I recommend this option.
  • Use a table. These are a very convenient way of handling mixed data (e.g. numeric, char, categorical) and analyzing it. Tables have many powerful methods and operators for processing data by groups, and for statistical analyses.
Or your proposal:
  • If you really want to get all of your data into one cell array (which will make any numeric processing slow, inefficient and complex), then you will need to post-process the data after it has been imported, something like this: detect numeric columns, convert numeric columns to cell array containing numeric scalars (e.g. num2cell), then concatenate all into one cell array. I strongly advise you to avoid doing this.
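As a rough sketch of the recommended CollectOutput approach with mixed column types, the example below uses a made-up five-column layout (two numeric, two text, one numeric) parsed from an in-memory character vector. The data and the names num1, str and num2 are assumptions for illustration only.

```matlab
% Two numeric columns, two text columns, one numeric column.
txt = sprintf('1,2,ab,cd,3\n4,5,ef,gh,6');
C = textscan(txt, '%f %f %s %s %f', ...
    'Delimiter', ',', 'CollectOutput', true);

% Consecutive columns of the same class are merged into one array each.
num1 = C{1};   % numeric matrix built from the first two columns
str  = C{2};   % cell array of character vectors from the two text columns
num2 = C{3};   % numeric column from the last field
```

With the 55-column format string from the question, the same mechanism would give the five groups described in the second bullet.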

13 Comments

Actually, I forgot to edit it here but the %s values should actually be %f. if you don't mind, how would I go about using collectoutput in this case?
"...how would I go about using collectoutput in this case?"
Looking at the textscan help, the CollectOutput option is shown as having two possible values: true or false (default). You want to select true. The help explains "If true, then the importing function concatenates consecutive output cells of the same fundamental MATLAB® class into a single array." So if all of your data are numeric (i.e. use %f) then C will be a 1x1 cell array containing one Nx55 numeric array, which you can easily get out of the cell array using indexing. Something like this:
opt = {'Delimiter',',', 'Headerlines',1, 'CollectOutput',true};
fmt = repmat('%f',1,55);
[fid,msg] = fopen(...)
assert(fid>=3,msg)
C = textscan(fid,fmt,opt{:});
fclose(fid);
mat = C{1}; % <- Nx55 numeric array
Note how I used repmat to define the format string: this is simpler than writing out %f fifty-five times! Note that using the correct cell array indexing is critical.
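To illustrate the difference between the two kinds of cell array indexing, here is a small sketch with a made-up 1x1 cell array (magic(3) simply stands in for imported data):

```matlab
C = {magic(3)};    % a 1x1 cell array holding a 3x3 numeric matrix

inner = C(1);      % parentheses return a cell array (still 1x1)
mat   = C{1};      % curly braces return the content (the 3x3 matrix)

disp(class(inner))
disp(class(mat))
```

Parentheses refer to the cells themselves, curly braces to what is stored inside them.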
If you want to do this in a loop, i.e. for multiple files, then you can follow the guidelines in the MATLAB documentation.
For example, if you decide to use the dir-based method:
opt = {'Delimiter',',', 'Headerlines',1, 'CollectOutput',true};
fmt = repmat('%f',1,55);
D = 'directory where your files are saved';
S = dir(fullfile(D,'*.csv'));
N = {S.name};
C = cell(1,numel(N));
for k = 1:numel(N)
    [fid,msg] = fopen(fullfile(D,N{k}));
    assert(fid>=3,msg)
    C(k) = textscan(fid,fmt,opt{:});
    fclose(fid);
end
and then each cell of C will contain the entire numeric array for each file, i.e. all of the imported data will be in C.
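If the goal is one array with the data from all files together, one possible follow-up step (an assumption about what is wanted, not something the loop above does) is to concatenate the per-file matrices vertically. The sketch below uses small made-up matrices in place of the imported data; M is a hypothetical name.

```matlab
% Stand-ins for the per-file numeric arrays stored in C by the loop.
C = {[1 2; 3 4], [5 6], [7 8; 9 10; 11 12]};

% Stack all of the matrices on top of each other.
% This requires every matrix to have the same number of columns.
M = vertcat(C{:});
disp(size(M))
```

The rows of M then follow the order of the files in C.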
If you want the filenames to be read in alphanumeric order then you will need to sort them. An easy way to sort filenames into alphanumeric order is to download my FEX submission natsortfiles, and replace the appropriate line with this:
N = natsortfiles({S.name});
Thank you for all your help so far but there's just one thing. When I tried running that code pretty much exactly as you sent it, C was a 1x41 cell. and in each of those cells, it was a 1x55 cell. This is definitely not the data from all the files. I believe this is just from 1 file. Do you know what the issue may be? I had a very similar issue trying to put the collectoutput in my original code
Stephen23 on 8 Oct 2018
Edited: Stephen23 on 8 Oct 2018
"C was a 1x41 cell. and in each of those cells, it was a 1x55 cell. This is definitely not the data from all the files"
I don't see that. C is 1x41 because of the files matched by dir (one cell per file), but the only way that all of C's contents are 1x55 cell arrays is if textscan works without error, i.e. if 55 columns from each file are imported into one cell. Each cell of C contains the data for one file, and what you have described is entirely consistent with that. It is exactly what I would expect (given that apparently you did not use the CollectOutput option after all). Note that in my code, within each loop iteration the loop variable k is used to store the entire imported data from one file in one cell, and each iteration imports a different file (by changing the filename): this means that if you have data in multiple cells of C, your code must have read multiple files.
So far you have not given any reason or explanation for your statement that "...is definitely not the data from all the files. I believe this is just from 1 file": how did you actually test/check that the data is all from the same file?
A simple test would be to look at the file data: check the first data row of the first two files, and those of the first two cells of C. Tell me what you find.
"Do you know what the issue may be?"
I don't see any issue. What you have described seems to be the correct and expected behavior.
PS: Note that textscan "working without error" does not necessarily mean that it has imported all of the data from a file: it might stop when the data no longer matches the format string. But that is not really related to the statement you made in your comment.
I think the confusion is because each file contains many rows but in each cell of C, there is only 1 row. I'm just wondering where the rest of the data is
Stephen23 on 8 Oct 2018
Edited: Stephen23 on 8 Oct 2018
"but in each cell of C, there is only 1 row"
This is expected. textscan returns a 1xN cell array, where N depends on the format string and the options you used. So each cell of C will be a 1xN cell array... and each cell of that contains an RxC array (which might be numeric or cell, depending on the format string and the options you used).
So you need to look inside those cells to find your columns of data. Your data are nested inside two sets of cell arrays. This is why I recommend using the CollectOutput option, because if all of the data can be collected into one array of the same class (e.g. numeric), then you will avoid one layer of cell array nesting.
Check the looped-code that I used in my earlier comment: if all of the data can be collected into one array then the output will be a 1x1 cell array containing all of that data. Note how I used parentheses (not curly braces) to allocate this cell to the cell array C. This will give you all of the file data in C without extra nested cell arrays.
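The effect of parentheses versus curly braces on the left-hand side can be seen in a small sketch. It parses made-up in-memory data twice and stores the result both ways; the data and variable names are for illustration only.

```matlab
txt = sprintf('1,2,3\n4,5,6');
fmt = repmat('%f', 1, 3);
opt = {'Delimiter', ',', 'CollectOutput', true};

C = cell(1, 2);
C(1) = textscan(txt, fmt, opt{:});   % parentheses: the 1x1 output cell becomes cell 1
C{2} = textscan(txt, fmt, opt{:});   % curly braces: the 1x1 cell is nested inside cell 2

disp(class(C{1}))   % numeric matrix, directly accessible
disp(class(C{2}))   % a cell array, so the matrix is one level deeper at C{2}{1}
```

Both cells hold the same numbers; only the depth of nesting differs.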
I'm so sorry but I'm still lost. So C is 1x41 which makes sense because there were 41 files, right? But then looking at C{1,1}, for example, that's data from the first file so it should have the dimensions of that data file, right? Like the first file has 55 columns and 72 rows so shouldn't C{1,1} be a 72x55 cell? Instead it's a 1x55. It seems like it only contains one row from the whole file? Sorry if i'm being unclear or if there's something you already explained that i'm still not getting. i'm pretty new to matlab and i'm just trying to figure it out
Stephen23 on 8 Oct 2018
Edited: Stephen23 on 8 Oct 2018
"But then looking at C{1,1}, for example, that's data from the first file so it should have the dimensions of that data file, right?"
Nope. It will be a cell array with size 1x55, because that is what textscan returns for your format string.
"Like the first file has 55 columns and 72 rows so shouldn't C{1,1} be a 72x55 cell?"
Nope. It will be a cell array with size 1x55, because that is what textscan returns for your format string.
"Instead it's a 1x55."
Yep. Because that is what textscan returns for your format string: a 1x55 cell array, regardless of how many rows it matches. Each cell of its output contains some RxC array (which in your example will be numeric column vectors of size 72x1, if it matches seventy-two rows of data).
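A short sketch of "looking inside the cells", assuming all columns are numeric and CollectOutput is not used. The four-row data set is made up, and col1 and mat are illustrative names.

```matlab
txt = sprintf('1,2,3\n4,5,6\n7,8,9\n10,11,12');
A = textscan(txt, repmat('%f', 1, 3), 'Delimiter', ',');

disp(size(A))    % 1x3 cell array: one cell per column of the data
col1 = A{1};     % first column of the data, as a numeric column vector
mat  = [A{:}];   % all columns joined side by side into one numeric matrix
disp(size(mat))
```

The joining step only works when every cell holds a numeric column of the same length.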
"It seems like it only contains one row from the whole file?"
Possibly, but I have no way to check this. If your format string only matches the first row, then you will only get one row of data. You should check the format string and the textscan options.
"Sorry if i'm being unclear or if there's something you already explained that i'm still not getting"
No need to apologize! You keep thinking that the output of textscan should be 72x55. I keep telling you that it should be 1x55 (which is what you are getting) and that you need to look inside those cells to see your rows of data. Don't worry, I have plenty of patience, so for as long as you keep writing "..shouldn't C{1,1} be a 72x55 cell?" I will keep replying with "nope", and explaining why.
Susan Santiago on 8 Oct 2018
Edited: Susan Santiago on 8 Oct 2018
Once again thank you so much for your patience as I realize I keep asking pretty much the same question. Maybe this image will make it more clear what i'm confused about. Here I have C{1,1} and the first data file from the loop. The data in C{1,1} looks like the second row of the file but i'm curious as to where the data from the rest of the rows is, if not here. I hope this makes sense. You mention a cell of size 72x1 but i'm just not sure where I can find that since this is all there is in cell C. Thank you again for your time
Stephen23 on 8 Oct 2018
Edited: Stephen23 on 8 Oct 2018
"The data in C{1,1} looks like the second row of the file but i'm curious as to where the data from the rest of the rows is, if not here."
As I wrote several times already, the fact that textscan runs without error does NOT mean that it imported all of the rows from your data file. It is quite possible that your format string matches only one row of data, or only two rows, or only three rows... This might occur if your format string is not suitable, or some of the options are not suitable, or if there is some unnoticed inconsistency somewhere in the data file. There is no automatic way to know if data has been imported correctly: you just have to check it yourself.
"I hope this makes sense. You mention a cell of size 72x1 but i'm just not sure where I can find that since this is all there is in cell C"
What I wrote in my last comment, with some emphasis added: "Each cell of its output contains some RxC array (which in your example will be numeric column vectors of size 72x1, if it matches seventy-two rows of data)."
This means that if textscan does not match 72 rows of data, then it will not return 72 rows of data. If you get fewer rows of data imported than you expect, then possibly the format string (and/or some of the option/s) does not match the data file at some point, so it stops importing the data after some number of rows of data (in your screenshot it seems to have given up after one row).
After looking at your screenshot I can see why you are confused: because your textscan call did not import all of the rows from your file, so you were wondering where those 72 rows are... And I was trying to tell you that you will only get 72 rows if textscan imports 72 rows from the file... which clearly did not happen. So you need to find out why: you will need to look at the format string and the textscan options, and read the textscan help thoroughly. If you upload a few sample files in a new comment (by clicking the paperclip button) then I will take a look at them tomorrow (it's midnight where I live).
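One partial check that can be scripted is whether textscan reached the end of the file. The sketch below is only an illustration under assumptions: it writes a small made-up file to a temporary location, with a text value in a column that the format string treats as numeric, and then tests feof after the import. The names fname, reachedEnd and rowsPerColumn are hypothetical.

```matlab
% Create a small test file whose third line does not match '%f %f'.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'a,b\n1,2\n3,oops\n5,6\n');
fclose(fid);

[fid, msg] = fopen(fname, 'r');
assert(fid >= 3, msg)
C = textscan(fid, '%f %f', 'Delimiter', ',', 'HeaderLines', 1);
reachedEnd = feof(fid);   % false if textscan stopped part way through
fclose(fid);
delete(fname);

if ~reachedEnd
    warning('textscan stopped before the end of the file.')
end
rowsPerColumn = cellfun(@numel, C);   % number of values read per column
disp(rowsPerColumn)
```

Reaching the end of the file does not prove that the values were parsed as intended, so comparing a few rows against the file by hand is still worthwhile.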
Okay thank you so much, that was really unclear to me. Here are some of the files. Once again, I appreciate all the help!
I think at least part of the problem is that I incorrectly thought that some of the columns were fine as %f when they weren't because I just went back and part of a very similar code that worked fine before was only outputting one row of data after making that change. So the four columns that are labelled as strings, I think need to stay that way
I was able to fix the issue! Thank you so much for your help and patience, I learned a lot today
