Importing data from a .prc (unrecognised) filetype using variable names
Hi there, I am trying to import a selection of data from many files of the same type (.prc), but I would only like to import the data under select heading names from these files.
I have used the Import Tool to generate a script to start this off. My problem is that the script assumes a fixed number of headings/variables for every file I then use it on. I am iterating through many of these files; some of them have the variables that I want, as defined in "Var Names", and some do not. When I run this in a loop over the folders, the output is correct for files that match the specified parameters, but if the number of parameters in file 'i' does not match what is specified here, I end up with data that is shifted left/right in that component of my final structure, so it is not under the right heading and is not reliable to use.
opts = delimitedTextImportOptions("NumVariables", 70); % Number of headings, specified fixed
% Specify range and delimiter
opts.DataLines = [305, Inf]; % Jump to Data after [Data+Units]
opts.Delimiter = "\t";
% Specify column names and types
opts.VariableNames = ["Var Names"];
opts.VariableTypes = ["Data Type (all double)"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
opts.ConsecutiveDelimitersRule = "join";
% Specify variable properties
opts = setvaropts(opts, ["Var Name Opts"]);
opts = setvaropts(opts, ["Var Name Opts"], "EmptyFieldRule", "auto");
opts = setvaropts(opts, "Var Name Opts", "ThousandsSeparator", ",");
% Import the data
data.(folderlist(i))=readtable(['Data Folder Path',folderlist(i).name], opts);
%% Clear temporary variables
clear opts
end
I would preferably like to be able to use the variable names I have specified to search for exact matches, but I'm not sure how to go about it with this setup/file type. The variables/headers that I want exist in each of the files, just often in different column locations.
I would like to avoid converting everything to a known file type (e.g. Excel) and then using strcmp to find the right headers, but I am not sure of a better way to do this that would use less time/resources.
If anyone familiar with the Import Tool is able to help, that would be great!
Thanks,
C.
Accepted Answer
dpb
on 8 Oct 2024
Moved: dpb
on 8 Oct 2024
Attach at least two files that illustrate the issue -- they don't have to be more than a few lines each.
But, in general, you should be able to write a generic import object; I'd think about all you should need would be
nHdrL=304;
opts=detectImportOptions('AnExampleFile.prc','FileType','delimitedtext','Delimiter','\t', ...
'NumHeaderLines',nHdrL,'Range',[nHdrL+1,1],'ReadVariableNames',true);
The above should give you a good import options object from which to start with some customization to generalize for purpose...
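% NB: the "..." below stands for the rest of the names; typed literally, MATLAB reads it as a line continuation
VNAMES={'Date','X', ...,'Fred'}; % the list of desired variables
opts.SelectedVariableNames=VNAMES; % tell it which ones you want
% VariableTypes needs one entry per variable in the file, so set the type of only the chosen ones with setvartype
opts=setvartype(opts,VNAMES,'double'); % and the data type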
The key is since you don't know what is in the file as to the total number of variables, then telling it a specific number to read will be lying for any file without that specific number. The above will try to find the variables you've listed.
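As a minimal sketch of the "find the variables you've listed" step, the name matching can be checked on its own before anything is read. The names below are hypothetical; in practice `available` would be `opts.VariableNames` as detected from one file.

```matlab
% Hypothetical names: 'available' stands in for opts.VariableNames of one file,
% 'wanted' for the list of desired variables.
available = {'Temp','Time','Speed'};
wanted    = {'Time','Temp','Pressure'};

% Exact, case-sensitive match of each wanted name against the file's names.
[tf,loc] = ismember(wanted,available);

present = wanted(tf);     % names that can be selected for this file
missing = wanted(~tf);    % names this file does not contain
cols    = loc(tf);        % column position of each present name in this file
```

Checking `missing` for each file makes it obvious which files lack a heading, instead of the mismatch showing up later as shifted columns.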
You may need to futz with the 'VariableNamesLine' parameter if it isn't the one line before the data; the default location may not match what you wish; this will particularly be true if there is some other text in the file or somesuch.
Then, save this import object as a .mat file with a nice name and read it into memory and use it inside the loop over the files...
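A rough sketch of the loop over files follows. It creates two small stand-in files so that it runs on its own; the file contents, the one-line header and the `wanted` list are all made up for illustration. Since the question says the wanted columns sit in different positions from file to file, this version re-runs detectImportOptions for each file rather than loading one saved object. That is a variation on the suggestion above, not something confirmed in the thread.

```matlab
% Two stand-in files with the same names in different column order
% (hypothetical data in place of the real .prc files).
folder = tempname;  mkdir(folder);
fid = fopen(fullfile(folder,'a.prc'),'w');
fprintf(fid,'header\nTime\tSpeed\tTemp\n0\t1.5\t20\n1\t2.5\t21\n');  fclose(fid);
fid = fopen(fullfile(folder,'b.prc'),'w');
fprintf(fid,'header\nTemp\tTime\n20\t0\n21\t1\n');  fclose(fid);

nHdrL  = 1;                    % header lines before the names line (304 in the thread)
wanted = {'Time','Temp'};      % hypothetical list of desired variables
files  = dir(fullfile(folder,'*.prc'));
data   = struct();

for i = 1:numel(files)
    fname = fullfile(files(i).folder,files(i).name);
    % Detect the layout of this particular file, including its own column order.
    opts = detectImportOptions(fname,'FileType','delimitedtext','Delimiter','\t', ...
        'NumHeaderLines',nHdrL,'ReadVariableNames',true);
    % Keep only the wanted names that this file actually contains.
    present = intersect(wanted,opts.VariableNames,'stable');
    opts = setvartype(opts,present,'double');
    opts.SelectedVariableNames = present;
    % Struct field names must be valid identifiers, so derive one from the file name.
    [~,base] = fileparts(files(i).name);
    data.(matlab.lang.makeValidName(base)) = readtable(fname,opts);
end
rmdir(folder,'s');             % remove the stand-in files
```

Each entry of `data` is then a table holding only the requested headings for that file, whatever column they occupied in it.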