fopen cannot read directory names in scientific format
Hi everyone, I have some data that is stored as .csv files in different directories. After looking through this forum, I have managed to find a way of extracting the relevant .csv file from the different directories, using the code provided in this post https://uk.mathworks.com/matlabcentral/answers/278950-i-m-trying-to-write-a-code-that-import-several-data-from-multiple-folder.
The code works fine until it encounters a directory name in scientific notation and it stops with the error:
Error using textscan
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in multipleDirectories_subfolders_test (line 27)
MyData{end+1} = textscan(fileID, formatSpec, 'Delimiter', ',', 'HeaderLines',startRow-1); % format depends on files % this line might need changing
The code is attached. I've also tried using csvread but I still run into problems with this. After setting a few breakpoints in the code I've identified the problem to be the format of some directories being in scientific notation as shown in the following screenshot:
The code reads the .csv files in the 0, 0.0001, 0.00102, 0.000104, 0.000106 directories perfectly fine, but the file identifier becomes invalid for the remaining directories.
Does anyone know how to solve this? It looks like fopen can only deal with integer type directories so the error is not due to textscan or csvread.
I'm really at a loss as to how to import the data otherwise.
p.s. The directories are by default saved in that format from a separate software and there's hundreds of them so changing the name manually is not really an option.
19 Comments
Geoff Hayes
on 3 Jun 2020
Jacqueline - on my Mac with R2014a, I was able to open a text file from a folder named 1.2e-05. Are you sure that these folders have files named opposite_01um_grad(U).csv? (I guess that this is supposed to be true for all folders?)
Stephen23
on 3 Jun 2020
Edited: Stephen23
on 3 Jun 2020
No problems with MATLAB R2012b on Win10:
>> D = '3.69e-05';
>> mkdir(D)
>> dlmwrite(fullfile(D,'test.csv'),[1,2,3],' ')
>> fid = fopen(fullfile(D,'test.csv'),'rt'); fscanf(fid,'%f'), fclose(fid);
ans =
1
2
3
What MATLAB version and OS are you using? Given that the folder names change number format depending on their values, perhaps you should check those filenames too.
Jacqueline Mifsud
on 3 Jun 2020
Hi all,
Thanks for the most valuable comments and quick responses. Stephen, I am using R2017a on CentOS.
The good news is I think I've solved it; the bad news is that it was something really trivial that I could have spotted sooner.
For some reason, the directory 0.000106 was empty (the external program stopped writing data at this time instant) so when fopen was trying to open the .csv file it was running into problems, as Geoff suggested. I apologise for any confusion.
Is there a way perhaps to replace the specific file name opposite_01um_grad(U).csv with something more general, for instance '*.csv', such that fopen would read any .csv files contained in the respective directories? Would csvread be more suitable than textscan for this purpose?
Thanks again.
Mohammad Sami
on 3 Jun 2020
you can use the dir function to get a list of all files in a specified directory.
files = dir(fullfile(D,'*.csv'));
Stephen23
on 3 Jun 2020
Edited: Stephen23
on 3 Jun 2020
"Is there a way ... such that fopen would read any .csv files contained in the respective directories?"
File reading/writing functions all require the explicit name of one particular file. They do not accept wildcard characters.
But you can easily call dir and use its output, e.g. inside your loop:
S = dir(fullfile(mypath,SubFold(i).name,'*.csv')); % DIR with wildcard
assert(numel(S)==1,'Only one CSV file is allowed!')
filetoread = fullfile(mypath,SubFold(i).name,S.name);
If you expect multiple CSV files in those folders then you need to decide how to handle them, e.g. use a nested loop to process them all, or filter for one file using a particular name pattern, or skip that folder, etc. We cannot decide that for you.
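The nested-loop option mentioned above could look roughly like the following sketch, which also skips empty folders and checks the value returned by fopen (the cause of the original "Invalid file identifier" error). The names mypath, SubFold and MyData follow the thread; the throwaway folder tree, the file name a.csv, the format '%f%f%f' and the single header line are assumptions made only so that the example runs on its own.

```matlab
% Build a small throwaway folder tree (hypothetical data):
% one folder holding a CSV file and one folder left empty.
mypath = tempname;
mkdir(fullfile(mypath,'0.0001'));
mkdir(fullfile(mypath,'1.2e-05'));            % deliberately left empty
fid = fopen(fullfile(mypath,'0.0001','a.csv'),'wt');
fprintf(fid,'x,y,z\n1,2,3\n4,5,6\n');
fclose(fid);

% List the subfolders, removing '.' and '..' explicitly.
S = dir(mypath);
SubFold = S([S.isdir] & ~ismember({S.name},{'.','..'}));

MyData = {};
for i = 1:numel(SubFold)
    F = dir(fullfile(mypath,SubFold(i).name,'*.csv'));   % DIR with wildcard
    if isempty(F)
        warning('No CSV file found in folder %s, skipping.',SubFold(i).name)
        continue
    end
    for k = 1:numel(F)                        % nested loop over every CSV file found
        filetoread = fullfile(mypath,SubFold(i).name,F(k).name);
        [fid,msg] = fopen(filetoread,'rt');
        if fid < 0                            % fopen returns -1 when it fails
            warning('Could not open %s: %s',filetoread,msg)
            continue
        end
        % Assumed layout: one header line, three numeric columns.
        MyData{end+1} = textscan(fid,'%f%f%f','Delimiter',',','HeaderLines',1); %#ok<AGROW>
        fclose(fid);
    end
end
```

Whether to warn, skip or stop when a folder holds no CSV file (or several) remains a decision for the person who knows the data.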
Jacqueline Mifsud
on 3 Jun 2020
Yes I will have to give it a bit more thought if I have multiple .csv files for different data - but tbh I'm trying to work around it by separating them in different directories to make things a bit more straightforward.
I'll give these suggestions a try, thanks again for the help.
Jacqueline Mifsud
on 3 Jun 2020
So now the code works fine but I'd like to organise my data in chronological order.
To give some context, I would like to plot a time vs. space map using this data. The SubFold names are essentially time directories containing the spatial data in the respective .csv files. Although now all the .csv files are read, I can't seem to figure out which SubFold they belong to and therefore the time-directory.
Is it possible to make the loop access subfolders sequentially or another clever way to reorganise the data? (I hope my question is clear).
Stephen23
on 3 Jun 2020
Edited: Stephen23
on 19 Jan 2022
"Is it possible to make the loop access subfolders sequentially..."
Given those folder names, you would have to import the names into MATLAB, convert them into numeric or times (e.g. using datetime or duration), sort the numeric/datetime/duration values and get the indices, then use those indices to finally sort the folder names.
One way to do that would be to download my FEX submission natsortfiles and use that, for which you will need to use a regular expression to match your numbers, e.g. '\d+\.?\d*(e[-+]?\d+)?'
"...or another clever way to reorganise the data?"
If you used ISO8601 timestamps then the OS would return them in the desired order, as would any trivial character sort of them. Basically if you designed the names a bit better, then your code and file processing would be much simpler (in fact, you really wouldn't have to do any sorting at all).
In practice this would mean fixed-width time values complete with leading zeros and no e-notation.
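As a rough illustration of that point, the sketch below compares a plain character sort of mixed-format names with a character sort of fixed-width names. The time values and the format '%012.8f' are assumptions chosen for this example only; they are not the external program's actual settings.

```matlab
% Hypothetical time values, used only for illustration.
t = [1.2e-05, 0.000104, 0, 0.00102, 3.69e-05];

% Names in the mixed style seen in the thread (plain and e-notation).
mixedNames = arrayfun(@(v) sprintf('%g',v), t, 'UniformOutput',false);

% Fixed-width names: 12 characters, leading zeros, 8 decimals, no e-notation.
fixedNames = arrayfun(@(v) sprintf('%012.8f',v), t, 'UniformOutput',false);

sortedMixed = sort(mixedNames)   % character order differs from numeric order
sortedFixed = sort(fixedNames)   % character order follows numeric order
```

With names like the fixed-width ones, the order returned by dir or by a plain sort would already be chronological.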
Jacqueline Mifsud
on 3 Jun 2020
Sorry for the delay in reply. I've checked if it's possible to have the directories saved in a different format but it seems that this is hard-coded in the external program. Therefore, it looks like I will need to code accordingly in MATLAB to sort the files.
I already had a go at this by using sort without much success. I will try using natsortfiles, hopefully I'll figure it out without too much difficulty. Thanks again for the useful tips.
Jacqueline Mifsud
on 3 Jun 2020
I've managed to make some (although not great) progress.
I've downloaded natsortfiles and tried to sort the data using the regexp provided. However, it's still not in chronological order. I'm thinking it's something to do with the regexp, or something else is not quite right in my code?
I'm attaching files to demonstrate the problem with the sorting. 'ans' is the result of the sorting and timeDirs is the list of time directories in numeric type. For some reason the time-directories with scientific notation are all segregated at the end of the list. Attached is also the revised code.
p.s. I'm no expert in matlab (this is probably the most complex thing I ever tried doing in matlab) so suggestions for improvement would be appreciated.
Stephen23
on 3 Jun 2020
Edited: Stephen23
on 4 Jun 2020
One of the first steps that natsortfiles does with the input names is to use fileparts to split the names into a filename and a file extension. fileparts splits at the last period/dot character in the name. So some of your folder names are treated as consisting of two parts which are split at the decimal point (one part is the "filename", the other the "extension"), and those two parts are then sorted separately. Unfortunately your use-case is not a scenario I considered, so thank you for discovering that!
You can convert and sort the numeric values yourself, e.g. something like this:
vec = str2double(SubFoldName(1,:)); % convert to numeric
[~,idx] = sort(vec);
sortedNames = SubFoldName(1,idx)
Tips on code: assuming that the first two elements returned by dir are the folder names '.' and '..' is fragile at best and buggy at worst. You should remove those folder names explicitly using setdiff or ismember, e.g.:
S = dir(mypath);
C = {S([S.isdir]).name};
C = setdiff(C,{'.','..'});
Jacqueline Mifsud
on 4 Jun 2020
Hi Stephen,
Thanks for clarifying that - I admit I was wondering how natsortfiles would handle my case since the time directories don't have any extension, but I didn't realise that was causing the problem. It would be great to see this scenario added on to its capability though, as it's definitely a powerful tool!
I'm now trying to use idx to sort the contents of the cell MyData. I've tried something like this:
for ix = length(MyData)
MyData{ix} = MyData{ix}{idx}
end
but I really don't think that does the right thing.
I'm getting a bit confused as MyData does not contain any information with regards to the timeDirectory but simply the contents of the .csv files, so how can I sort its contents using idx? I've had a look at cellfun but it seems this will sort the actual 'contents of each cell of cell array MyData (one cell at a time)' which is not what I need.
Jacqueline Mifsud
on 4 Jun 2020
Ok, stupid question. Is it really as simple as:
sortedMyData = MyData(1,idx)
?
Stephen23
on 4 Jun 2020
Edited: Stephen23
on 4 Jun 2020
Yes, all you need is some indexing like what you showed in your last comment; no loops or cellfun are required.
More commonly the sort would be applied to the file/folder names before the loop, because then all of the processing and allocation of the imported data automatically occurs in the correct order, and everything matches up.
Applying the sort is also possible after the loop, but increases the risk of different arrays getting out of synch. For example, now your data and folder name arrays are probably in different orders. I recommend avoiding this approach.
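A minimal sketch of sorting before the loop, combining the str2double sort and the setdiff tip from the earlier comments. The folder names are created in a temporary location purely for illustration, and the loop body is a placeholder for the actual import.

```matlab
% Hypothetical folder tree so that the sketch runs on its own.
mypath = tempname;
names  = {'0','0.0001','0.00102','1.2e-05','3.69e-05'};
for k = 1:numel(names)
    mkdir(fullfile(mypath,names{k}));
end

S = dir(mypath);
C = {S([S.isdir]).name};
C = setdiff(C,{'.','..'});           % remove the dot folders explicitly

timeVals = str2double(C);            % folder names -> numeric time values
[timeVals,idx] = sort(timeVals);     % chronological order
C = C(idx);                          % folder names in the same order

MyData = cell(1,numel(C));           % one cell per time folder
for i = 1:numel(C)
    % Import the CSV file of folder C{i} here. Because C is already sorted,
    % MyData{i} automatically belongs to timeVals(i).
    MyData{i} = fullfile(mypath,C{i});   % placeholder for the imported data
end
```

Keeping timeVals, C and MyData in the same order from the start avoids having to re-index several arrays afterwards.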
Jacqueline Mifsud
on 4 Jun 2020
Thanks for the suggestions, I've now managed to get the code working.
I will try to improve it further by doing the sort before the loop through the files as you suggest - it definitely is more logical than doing the sort after the import of data.
Jacqueline Mifsud
on 5 Jun 2020
Edited: Jacqueline Mifsud
on 5 Jun 2020
Hi Stephen, sorry to be posting again but I'm somewhat stuck.
I want to separate the data in MyData based on whether it's positive or negative.
This is how I do this:
Z1 = Z;
Z1( Z >=0 ) = 0; % matrix with negative values
Z2 = Z;
Z2( Z < 0 ) = 0; % matrix with positive values
So far so good. The problem occurs when I use Z1 and Z2 to plot two separate contourf plots on the same meshgrid in the same figure.
contourf(X,T,Z1);
hold on;
and
contourf(X,T,Z2)
hold off;
To give some context, this is because I'd like to compare the locations of the +ve and -ve values on the same graph. However, I am having trouble viewing both of these and can only see the last contourf I plotted.
I suspect it's because the non-zero entries in Z1 have been set to zero in Z2 and therefore this covers the first contourf.
Is there a better method to separate the positive and negative elements of each matrix, whilst ignoring the zero entries in Z1 and Z2?
Stephen23
on 5 Jun 2020
Edited: Stephen23
on 5 Jun 2020
Of course zero is also a perfectly valid value, so it will also get plotted, potentially covering up data underneath.
You could try using NaN instead of zero: NaN values are not plotted, so this is a common way to provide gaps in plot lines or similar. I don't know how it will work with contour, but it is worth a try.
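A small sketch of the NaN masking suggested above, drawn into a single set of axes. X, T and Z here are made-up stand-ins for the data in the question.

```matlab
% Hypothetical data standing in for X, T and Z from the question.
[X,T] = meshgrid(linspace(0,1,50), linspace(0,1,40));
Z = sin(2*pi*X) .* cos(2*pi*T);

Z1 = Z;  Z1(Z >= 0) = NaN;   % negative values only, everything else masked
Z2 = Z;  Z2(Z <  0) = NaN;   % non-negative values only, everything else masked

contourf(X,T,Z1);            % masked (NaN) regions are left unfilled
hold on
contourf(X,T,Z2);
hold off
```

Because both plots share one set of axes in this sketch, they also share one colormap and one colorbar.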
Jacqueline Mifsud
on 5 Jun 2020
Edited: Jacqueline Mifsud
on 5 Jun 2020
I've tried this with NaN and although no values are plotted in the contourf plot for those matrix entries, I still cannot see both plots.
I think the problem might be something else, e.g. axis alignment. I'd like each contourf to have a separate colorbar and apparently each set of axes only supports one colorbar. So I've created two sets of axes since I want two colorbars, and linked them with linkaxes - but it does not seem to be doing the trick.
Unfortunately I can't think of another workaround for this.
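For reference, one possible sketch of the two-axes arrangement described above. It assumes MATLAB R2014b or later (per-axes colormaps) and uses made-up data. One thing that can hide the lower plot is the opaque background of the upper axes, so the sketch makes that background transparent; whether that was the cause in this particular case is not confirmed in the thread. The axes and colorbar positions are arbitrary values chosen for the example.

```matlab
% Hypothetical data, masked with NaN so each plot has gaps where the other shows.
[X,T] = meshgrid(linspace(0,1,50), linspace(0,1,40));
Z = sin(2*pi*X) .* cos(2*pi*T);
Z1 = Z;  Z1(Z >= 0) = NaN;
Z2 = Z;  Z2(Z <  0) = NaN;

pos = [0.12 0.12 0.60 0.80];              % identical position for both axes
ax1 = axes('Position',pos);
contourf(ax1,X,T,Z1);
ax2 = axes('Position',pos);
contourf(ax2,X,T,Z2);

% Set these after plotting, since plotting can reset axes properties.
set(ax2,'Color','none','Visible','off');  % transparent upper axes, no duplicate ticks
linkaxes([ax1,ax2]);                      % keep the axis limits in sync

colormap(ax1,cool(64));                   % one colormap per axes
colormap(ax2,hot(64));
colorbar(ax1,'Position',[0.76 0.12 0.04 0.80]);
colorbar(ax2,'Position',[0.88 0.12 0.04 0.80]);
```

Giving the colorbars explicit positions is intended to stop the two axes from being resized differently, which would otherwise misalign the overlaid plots.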
Stephen23
on 19 Jan 2022
Regarding this comment:
NATSORTFILES now supports a 'noext' option, which does not split the names at (any) final dot character.
Answers (0)