Ignoring header/footer in textfile question

m j on 29 Jan 2020
Commented: Walter Roberson on 29 Jan 2020
Hello,
For the past week I've been trying to open multiple text files that have different headers/footers, ignoring all the headers/footers and extracting just the data, without knowing in advance what the headers/footers are. The only thing I know is that the headers/footers always start with a char and form a string.
All headers/footers start with a char, examples:
File 1:
Line 1 of file - Samplerate : 100000
Line 2 of file - Bitspersample: 12
Rest of lines - data(2000 samples,floats)
File 2:
Line 1 of file - Bitspersample: 32
Line 2 of file - Normalized: FALSE
Lines 3-2500 - data(2500 samples,floats)
Line 2501 of file - Channel: A
Is there a way to ignore all lines of a text file that start with a char/string?

Answers (1)

Walter Roberson on 29 Jan 2020
fileread() the file.
regexprep() with pattern '^\s*[^0-9+.-].*$', replacement '' (the empty string), and the 'lineanchors' and 'dotexceptnewline' options. The 'dotexceptnewline' option is needed because . matches newlines by default, so without it the .* would run past the end of the line. This will zap the content of lines whose first non-whitespace character is not a digit or + or - or period. If your data never has a leading + on the numbers, then do not include the + in the pattern. If your data never has numbers that start with a period without a leading 0, then do not include the period in the pattern. This is the question of whether a number like .5 can occur or whether it would always be written 0.5.
In the case where your data never has a leading + or - or period, then instead of the pattern I showed you can use '^\s*\D.*$'
After the regexprep, textscan() the string.
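The three steps above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the temporary file and its contents are made up to mimic "File 2" from the question, the variable names (fname, str, cleaned, values) are hypothetical, and it assumes one sample per line.

```matlab
% Hypothetical sample file mimicking "File 2" from the question
% (two header lines, three samples, one footer line).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Bitspersample: 32\nNormalized: FALSE\n-0.012\n0.44\n5.44\nChannel: A\n');
fclose(fid);

% Step 1: read the whole file into one character vector.
str = fileread(fname);

% Step 2: blank out every line whose first non-whitespace character
% cannot start a number. 'lineanchors' makes ^ and $ work per line;
% 'dotexceptnewline' stops .* at the end of the current line.
cleaned = regexprep(str, '^\s*[^0-9+.-].*$', '', ...
                    'lineanchors', 'dotexceptnewline');

% Step 3: parse what is left as floating-point numbers.
% Assumption: one sample per line.
c = textscan(cleaned, '%f');
values = c{1};

% Remove the temporary file again.
delete(fname);
```

If the samples were separated by commas instead, passing a 'Delimiter' option to textscan would probably be needed; that case is not covered by this sketch.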
  2 Comments
m j on 29 Jan 2020
Edited: m j on 29 Jan 2020
Thanks, but I'm having a hard time deciphering regexprep. I can see why you have ^ and $ for the 'lineanchors' option, but I'm having trouble working out the part in between them in '^\s*[^0-9+.-].*$'. There's an explanation for \s, but I can't seem to find an explanation for the rest, i.e. *[^0-9+.-].*
Also, regarding newStr = regexprep(str,expression,replace): as I understand it, str would come from fileread, expression would be '^\s*[^0-9+.-].*$', and replace would be 'lineanchors'? Am I understanding this correctly? Sorry if not; English isn't my first language. Also, I'm not at my PC, so I'll try it when I get home.
And my data will have both + and - floats that always have a leading 0. Example: -0.012, 0.44, 5.44, etc.
Walter Roberson on 29 Jan 2020
regexprep(str, '^\s*[^0-9+-].*$', '', 'lineanchors', 'dotexceptnewline')
[] means any one character chosen from the list inside the [], except when the first thing inside the [] is ^, in which case it means any one character that is NOT one of the listed ones. So the construct matches any one character that is NOT 0123456789 or + or -. In short, you are looking for lines in which the first nonblank character is something that cannot possibly be forming a number.
The .* after that, with the dotexceptnewline option, matches to the end of the same line. When you find such a line you replace it with emptiness (but without removing the newline character itself), so you get an empty line in place of any line that starts with a non-number.
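To see the blanking behaviour described above, here is a small sketch that applies the same call to an in-memory string instead of a file. The sample text is made up to resemble "File 1" from the question, and the variable names (str, out, lines) are only illustrative.

```matlab
% Made-up text resembling "File 1": two header lines, then three samples.
str = sprintf('Samplerate : 100000\nBitspersample: 12\n-0.012\n0.44\n5.44\n');

% Same call as above: header lines are replaced by '' but their
% newline characters are left in place.
out = regexprep(str, '^\s*[^0-9+-].*$', '', 'lineanchors', 'dotexceptnewline');

% Split on newlines without collapsing, so the blanked lines stay visible
% as empty entries.
lines = strsplit(out, '\n', 'CollapseDelimiters', false);

% The number of newline characters (char code 10) is unchanged.
sameLineCount = sum(out == 10) == sum(str == 10);
```

Here the first two entries of lines come back empty while the numeric lines are untouched, which is why the result can still be handed to textscan afterwards.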
