What is the maximum size of a line that can be read by fgetl/fgets?
Hello everyone,
I am using an array of structures to read a text file that contains many lines of variable length. I get an error when MATLAB reads the longest line. The error is:
Error using fgets
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in fgetl (line 33)
[tline,lt] = fgets(fid);
Error in ReadRefGPSData (line 18)
ASCII(i).tline= fgetl(fid);
Here, i in ASCII(i).tline= fgetl(fid); is the index (line number) of the longest line.
Can anyone tell me whether there is a maximum line length for fgetl/fgets (fgetl calls fgets internally)? If so, is there a workaround, such as reading a long line in two halves? I cannot break the line into two or more shorter lines, as it is generated by the system.
Thanks in advance.
Answers (3)
KSSV
on 1 Jul 2016
In my experience, there is no limit as such on the length of the lines you can read. Have you opened the file using fopen first?
fid = fopen('yourfile.txt','r') ;
tline = fgetl(fid) ;
Steven Lord
on 1 Jul 2016
That error isn't related to the length of the lines in the file; MATLAB didn't get that far. The file was not opened successfully. Call fopen with two output arguments. Before using the first of those outputs, the file identifier, check whether it is -1. If it is, fopen was not able to open the file, and the second output from the fopen call will be a message that hopefully explains why.
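As a minimal sketch of the check described above: the helper name openForReading and its error identifier are hypothetical, chosen only for illustration, while the two-output form of fopen is the documented one.

```matlab
function fid = openForReading(filename)
% Hypothetical helper: open a file for reading and fail early with the
% explanation that fopen provides, instead of failing later in fgetl.
    [fid, msg] = fopen(filename, 'r');   % msg is empty when the open succeeds
    if fid == -1
        error('openForReading:failed', 'Cannot open "%s": %s', filename, msg);
    end
end
```

It could be used as fid = openForReading('yourfile.txt'); followed by tline = fgetl(fid); and a final fclose(fid);. A wrong path or working folder then shows up at the fopen step rather than as "Invalid file identifier" inside fgets.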
Walter Roberson
on 2 Jul 2016
My tests indicate that in R2016a on OS-X, fgetl() can read lines longer than 0.6 gigabytes. I expect that it would be able to handle lines up to the point where it runs out of memory or hits the limit you have configured on array size.
However, beyond about a quarter of a gigabyte it gets pretty slow. I have had it working on reading 1.2 gigabytes for several minutes now, and the time it is taking leaves me wondering whether the internals are growing the array dynamically with lots of copying. The alternative would be, for example, to scan ahead to determine how far away the line terminator is, allocate that much memory, and then go back to read in the line, but that algorithm can only be used on block-structured inputs that you can seek backwards in.
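On the original question of reading a long line in pieces, one possible sketch uses the documented two-argument form fgets(fid, nchar), which returns at most nchar characters of the next line. The function name readLineInChunks and the chunkSize parameter are hypothetical, and the sketch assumes lines end in \n or \r\n. It has not been timed against the single-call fgetl above.

```matlab
function tline = readLineInChunks(fid, chunkSize)
% Hypothetical helper: read one line as a sequence of pieces of at most
% chunkSize characters. Returns -1 at end of file, like fgets does.
% Unlike fgetl, the line terminator (if any) is kept in the result.
    pieces = {};
    while true
        piece = fgets(fid, chunkSize);   % up to chunkSize chars of this line
        if ~ischar(piece)                % fgets returns -1 at end of file
            break
        end
        pieces{end+1} = piece;           %#ok<AGROW>
        if piece(end) == char(10)        % piece ends with newline: line done
            break
        end
    end
    if isempty(pieces)
        tline = -1;
    else
        tline = [pieces{:}];             % join the pieces into one char row
    end
end
```

For example, tline = readLineInChunks(fid, 1e6); reads the next line in pieces of up to a million characters. The pieces could also be processed one at a time instead of being joined, which avoids holding the whole line in memory.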
1 Comment
Walter Roberson
on 2 Jul 2016
Okay, so I have checked, and besides the obvious limit of "amount of memory available", there is a more subtle limit.
fgetl() is implemented by calling fgets(), similar to
[data, line_terminator] = fgets(FileID);
On my system, that worked fine on a line of total length 1207959544, producing a data variable with twice that many bytes (because each char at the MATLAB level is 2 bytes.)
Then fgetl does the equivalent of
data = data(1:end-length(line_terminator))
and that was hitting the limit I had configured in preferences for maximum array size.
I was puzzled as to why it was complaining about needing a 9 Gb array to do the above operation, when the output should be no more than 1207959544*2 bytes, about 2 1/4 Gb. I was thinking that perhaps it was failing to take into account that char are smaller than double and was over-counting the space requirement.
But then I realized that the 9 Gb was being required to generate the elements of the index vector, 1:end-length(line_terminator), which are generated as double, at 8 bytes per element.
So that is the more subtle limit: the number of characters in a line is limited to 1/8th of your configured maximum array size in bytes.
However, you will not get an error about invalid file ID if you encounter this: you will get an error about array being too large.
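The arithmetic behind those figures can be checked with a short sketch. The variable names are illustrative only; the line length is the one quoted above, and the 2-byte char and 8-byte double sizes are standard MATLAB storage sizes.

```matlab
% Memory needed for the char data versus a double index vector of equal length.
nChars     = 1207959544;     % line length from the test described above
charBytes  = nChars * 2;     % each MATLAB char occupies 2 bytes
indexBytes = nChars * 8;     % each double index occupies 8 bytes

fprintf('char data:    %.2f GiB\n', charBytes  / 2^30);
fprintf('index vector: %.2f GiB\n', indexBytes / 2^30);
```

The index vector is four times the size of the character data itself, which is why the array size limit is reached at the indexing step rather than in fgets.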