Read string from files since R2020a

Radek on 20 Apr 2020
Commented: Radek on 22 Apr 2020
I have a large binary data file with an ASCII-formatted metadata header at the beginning. To read this header, I use 'string-oriented' read functions like fgetl(~), fscanf(~, '%s', ~) or fread(~, ~, '*char'). In MATLAB versions prior to R2020a (I have R2014b and R2019b) this worked just fine, but in R2020a something changed.
Now the very first attempt (and only the first) to read any string from the file uses an excessive amount of memory and freezes MATLAB. My guess is that MATLAB is trying to read the whole file into memory. In my case the file is larger than the available RAM, which probably causes the freeze.
Here is what I do:
% Here everything works just fine
fd = fopen('file.name', 'r');
arr1 = fread(fd, 1);
arr2 = fread(fd, 1);
fclose(fd);
% Here I have a problem
fd = fopen('file.name', 'r');
arr1 = fread(fd, 1); % fast and smooth
arr2 = fread(fd, 1, '*char'); % slow and uses an excessive amount of RAM
arr3 = fread(fd, 1, '*char'); % fast and smooth again
fclose(fd);
1) It does not matter which part of the file I read.
2) All read functions that return numeric types are always fast.
3) The first read function that returns a string is always slow, no matter which function I use.
4) All subsequent string reads are as fast as numeric ones.
5) Once the read function returns the string, the memory is released.
6) The file position pointer is always at the expected position (it does not move to the end of the file).
7) It does not matter whether the file is opened in text or binary mode.
8) The issue is present on both Windows and Linux.
Any ideas?

Accepted Answer

Sindar on 22 Apr 2020
From the release notes:
"As of R2020a, character-oriented file I/O functions such as fscanf, fgets, and fgetl trigger automatic character set detection when reading a file that was opened using fopen without a specified encoding."
My suspicion then is that the "automatic character set detection" may require looking through the full file.
Try specifying the encoding in fopen, e.g.,
fd = fopen('file.name', 'r','n','UTF-8');
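To make the suggestion concrete, here is a minimal sketch of reading an ASCII header followed by binary data with the encoding passed to fopen. The file name, the header text 'HEADER v1' and the four-double payload are made up for illustration, and 'US-ASCII' is just one reasonable choice for a pure ASCII header ('UTF-8' as above should work equally well). The second part is an untested-on-large-files alternative that relies on the asker's observation that numeric reads stay fast: read the header as raw bytes and convert with char.

```matlab
% Build a small test file: one ASCII header line followed by a binary payload.
% (The file name and contents are hypothetical, for illustration only.)
fname = [tempname '.bin'];
fd = fopen(fname, 'w');
fwrite(fd, uint8(sprintf('HEADER v1\n')), 'uint8');  % 10 header bytes
fwrite(fd, [1 2 3 4], 'double');                     % binary payload
fclose(fd);

% Option 1: open with an explicit encoding, so fgetl does not need
% automatic character set detection.
fd = fopen(fname, 'r', 'n', 'US-ASCII');
headerLine = fgetl(fd);             % character-oriented read, newline stripped
payload = fread(fd, 4, 'double');   % numeric read of the binary part
fclose(fd);

% Option 2 (assumption: numeric reads do not trigger detection):
% read the header as raw bytes and convert them to char afterwards.
fd = fopen(fname, 'r');
rawBytes = fread(fd, 9, 'uint8=>uint8')';   % 9 bytes = 'HEADER v1'
headerFromBytes = char(rawBytes);
fclose(fd);

delete(fname);
```

Option 2 only makes sense when the header length (or a terminator you can scan for in the byte array) is known, and when the header really is single-byte text.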
  2 Comments
Radek on 22 Apr 2020
I can confirm that specifying the encoding solves the issue. Thank you, Sindar.
Next time, I should check release notes more carefully.

Release

R2020a