How to extract data from .txt file that has both text and numerical data?

I have data that I want to extract from multiple .txt files. Each file is organized in the same way, as seen in the image attached below. I am not interested in the text or headers; I only want to pull out the numerical data in the three columns labeled "A-FRQ", "FRQ-C", and "P-FRQ".
To clarify what I want to do with the data: I have hundreds of these .txt files in a folder. I want to run a loop that combines the columns of data from all of these files, so that by the end of the process I have three large arrays in MATLAB (A-FRQ, FRQ-C, and P-FRQ) containing the respective values from EVERY data file in the folder. Note that each file contains a different number of rows: the dataset in the picture below has 12 rows I'm interested in extracting, while another file might have 75.
My end goal is three separate histograms, one per category (A-FRQ, FRQ-C, P-FRQ), that account for the data in all the files.
Sorry, I don't really have a specific question yet, since I'm just about to dive into this task. I figured I'd post my problem ahead of time in the hope that someone can offer tips on how to accomplish this conceptually simple task efficiently. I expect my biggest issue will be reading the .txt file in MATLAB and extracting the columns I'm interested in while ignoring the headers/text in the file.
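For the end goal, a minimal sketch of the plotting step, assuming the values from all files have already been combined into an N-by-3 matrix `data` whose columns are A-FRQ, FRQ-C and P-FRQ. The matrix below holds placeholder values only, and the names `aFrq`, `frqC`, `pFrq` are hypothetical (hyphens are not allowed in MATLAB variable names). `histogram` requires MATLAB R2014b or later.

```matlab
% Placeholder values standing in for the combined data from all files
% (columns: A-FRQ, FRQ-C, P-FRQ).
data = [1.0 2.0 3.0;
        1.5 2.5 3.5;
        1.2 2.2 3.2;
        1.9 2.9 3.9];

% Hyphens are not valid in variable names, so use e.g. aFrq for A-FRQ.
aFrq = data(:, 1);
frqC = data(:, 2);
pFrq = data(:, 3);

% One histogram per category, side by side in a single figure.
figure;
subplot(1, 3, 1); histogram(aFrq); title('A-FRQ');
subplot(1, 3, 2); histogram(frqC); title('FRQ-C');
subplot(1, 3, 3); histogram(pFrq); title('P-FRQ');
```

Passing a bin count, e.g. `histogram(aFrq, 50)`, fixes the number of bins, which can make the three plots easier to compare.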
jgg on 17 Dec 2015
Hopefully your files all look pretty similar, because that will make things a lot easier. You will probably want the textscan function for this.
I would start by getting one of your files to read in properly with textscan, then automate the loop, probably using the dir and fullfile functions to step through your files.
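A minimal sketch of that first step, reading a single file. The screenshot is not available here, so the layout is an assumption: the sample file, its seven header lines, and its four numeric columns (an index column followed by A-FRQ, FRQ-C and P-FRQ) are hypothetical, chosen to match the format string and header count used in Ingrid's answer. The sketch writes its own sample file so it can run without the real data.

```matlab
% Hypothetical sample file: 7 header lines, then 4 numeric columns
% (index, A-FRQ, FRQ-C, P-FRQ). Adjust to match your real files.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for k = 1:7
    fprintf(fid, 'Header line %d\n', k);
end
% fprintf consumes the matrix column by column, so each column becomes one line.
fprintf(fid, '%d %.2f %.2f %.2f\n', [1:3; 10.5 11.5 12.5; 20.1 20.2 20.3; 30 31 32]);
fclose(fid);

fid = fopen(fname, 'r');
% '%*f' reads and discards the first column; the three '%f' keep the rest.
% 'HeaderLines', 7 skips the text lines at the top of the file.
C = textscan(fid, '%*f%f%f%f', 'HeaderLines', 7);
fclose(fid);
delete(fname);

vals = cell2mat(C);   % textscan returns one cell per column; this gives a 3-by-3 matrix
```

Once this returns the expected number of rows for one real file, the same textscan call can go inside the loop over the folder.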


Answers (2)

Ingrid on 18 Dec 2015
This is not too difficult to achieve.
Just pass the appropriate 'HeaderLines' value to textscan as an optional argument; something like this should work:
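listing = dir(fullfile(nameFolder, '*.txt'));  % list only the .txt files ('.' and '..' are excluded)
N = numel(listing);
data = [];
for ii = 1:N
    fid = fopen(fullfile(nameFolder, listing(ii).name));
    % skip the first column (%*f), read the next three; skip 7 header lines (adjust to your files)
    newData = textscan(fid, '%*f%f%f%f', 'HeaderLines', 7);
    data = [data; cell2mat(newData)];  % textscan returns one cell per column; convert and append
    fclose(fid);
end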
Guillaume on 18 Dec 2015
The only thing I would change in Ingrid's answer is the resizing of data for every file. Instead, I'd store each file's data in a cell array and do the concatenation in one go at the end.
This should be both more memory- and time-efficient:
listing = dir(fullfile(nameFolder, '*.txt'));  % list only the .txt files
N = numel(listing);
data = cell(N, 1);
for ii = 1:N
    fid = fopen(fullfile(nameFolder, listing(ii).name));
    data{ii} = cell2mat(textscan(fid, '%*f%f%f%f', 'HeaderLines', 7)); % this file's rows as an n-by-3 matrix
    fclose(fid);
end
data = vertcat(data{:}); % concatenate all the cells: only one reallocation instead of hundreds
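A self-contained sketch of the whole pipeline, assuming the same hypothetical layout (seven header lines, an index column, then the three columns of interest). It writes three sample files of different lengths into a temporary folder so it can run without the real data; `nameFolder` is that temporary folder, the file names and values are placeholders, and `aFrq`, `frqC`, `pFrq` are made-up names because `A-FRQ` is not a valid MATLAB identifier.

```matlab
% Build a temporary folder with three hypothetical files of 12, 5 and 75 data rows.
nameFolder = tempname;
mkdir(nameFolder);
rowCounts = [12 5 75];
for f = 1:numel(rowCounts)
    fid = fopen(fullfile(nameFolder, sprintf('run%02d.txt', f)), 'w');
    for k = 1:7
        fprintf(fid, 'Header line %d\n', k);
    end
    n = rowCounts(f);
    % columns: index, A-FRQ, FRQ-C, P-FRQ (placeholder values)
    fprintf(fid, '%d %g %g %g\n', [1:n; f*ones(1,n); 2*f*ones(1,n); 3*f*ones(1,n)]);
    fclose(fid);
end

% Only list .txt files, so '.' and '..' never need to be skipped by index.
listing = dir(fullfile(nameFolder, '*.txt'));
N = numel(listing);
data = cell(N, 1);
for ii = 1:N
    fid = fopen(fullfile(nameFolder, listing(ii).name), 'r');
    data{ii} = cell2mat(textscan(fid, '%*f%f%f%f', 'HeaderLines', 7));
    fclose(fid);
end
data = vertcat(data{:});   % one concatenation for all files

% Split into the three columns of interest.
aFrq = data(:, 1);
frqC = data(:, 2);
pFrq = data(:, 3);

% Clean up the temporary sample files.
delete(fullfile(nameFolder, '*.txt'));
rmdir(nameFolder);
```

Because each file's rows are stored in their own cell, files of different lengths (12 rows, 75 rows, ...) need no special handling before the final vertcat.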
Ingrid on 21 Dec 2015
Thanks, Guillaume, for this useful tip. I knew the matrix would grow on every iteration but did not know how to avoid that when its final size isn't known beforehand. This is a simple but effective solution.



D. Ali on 27 Apr 2019
I have a similar question: I need to extract all MCAP events, together with the times at which they occurred, into a separate file, and plot them if possible.
I attached the file.
