Reading a variable that has different names in different data files

Assume I am using the following lines to read some data files that have a table structure:
for k = 1 : nfiles
    fullFileName = fullnames{k};
    %reading the file as a table
    opts = detectImportOptions(fullFileName);
    getvaropts(opts,{'YEAR','MONTH','DAY','HOUR','MIN','GDALT','NE8'});
    opts.SelectedVariableNames = {'YEAR','MONTH','DAY','HOUR','MIN','GDALT','NE8'};
    t = readtable(fullFileName,opts);
end
Later on I discovered that some files do not have the variable NE8 or GDALT, or have them under different names. How can I skip the files that do not have these variables so they are not read, and how can I read the variables from files where they appear under different names?

2 Comments

Call detectImportOptions and then examine the variable names returned in the options object to see whether it has the variables you need.
getvaropts(opts,{'YEAR','MONTH','DAY','HOUR','MIN','GDALT','NE8'});
That line is not doing anything useful for you -- not unless you remove the semi-colon so that you can display the output.
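A minimal sketch of that check, assuming a small temporary CSV file created only for illustration (the file contents and the names `exampleFile`, `wanted`, `present`, and `missing` are hypothetical, not taken from the original data):

```matlab
% Build a tiny example file so the snippet is self-contained (hypothetical data).
exampleFile = [tempname '.csv'];
writetable(table([2021;2021], [10;10], [2;3], [100.5;110.5], [3.2e11;3.4e11], ...
    'VariableNames', {'YEAR','MONTH','DAY','GDALT','POP'}), exampleFile);

opts = detectImportOptions(exampleFile);
disp(opts.VariableNames)                         % names detected in the file header

wanted  = {'YEAR','MONTH','DAY','GDALT','NE8'};
present = ismember(wanted, opts.VariableNames);  % one logical entry per wanted name
missing = wanted(~present);                      % names this file does not provide

delete(exampleFile);                             % clean up the temporary file
```

For this example file, `missing` contains only 'NE8', which is the kind of file that triggers the "Unknown variable name" error when the name is assigned to SelectedVariableNames.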
This is the error I am getting
Error using matlab.io.ImportOptions/getNumericSelection (line 518)
Unknown variable name: 'POP'.
Error in matlab.io.ImportOptions/set.SelectedVariableNames (line 178)
rhs = getNumericSelection(obj,rhs);
Error in extractingtables (line 30)
opts.SelectedVariableNames = {'YEAR','MONTH','DAY','HOUR','MIN','GDALT','NE8','POP'};
One of the files does have such a variable, as seen in the image below:
There are files that contain the NE8 variable, files that contain it under the name POP, and files that do not have it at all. I am getting errors in both the second and third cases, since not all the files have 'POP' and some files do not have 'NE8'.


 Accepted Answer

The assumption below is that if GDALT is missing then you do not want the file, that if NE8 is missing then you can use POP interchangeably, and that if both NE8 and POP are missing then you do not want the file.
needed_vars = {'YEAR','MONTH','DAY','HOUR','MIN','GDALT'};
wanted_one_of_vars = {'NE8', 'POP'};
opts = detectImportOptions(fullFileName);
vars = opts.VariableNames;
if all(ismember(needed_vars, vars)) && any(ismember(wanted_one_of_vars, vars))
    if ismember(wanted_one_of_vars{1}, opts)
        NE8_varname = wanted_one_of_vars{1};
    else
        NE8_varname = wanted_one_of_vars{2};
    end
else
    continue; %or whatever you need to do for files that do not have all the variables
end
opts.SelectedVariableNames = [needed_vars, {NE8_varname}];
t = readtable(fullFileName,opts);

3 Comments

I am getting the following error:
Error using ismember>ismemberR2012a (line 186)
Input A of class char and input B of class matlab.io.text.DelimitedTextImportOptions must be the same class, unless one is double.
Error in ismember (line 95)
lia = ismemberR2012a(A,B);
Error in extractingtables (line 38)
if ismember(wanted_one_of_vars{1}, opts)
The inner ismember call should test against vars (the cell array of names), not opts. Corrected version:
needed_vars = {'YEAR','MONTH','DAY','HOUR','MIN','GDALT'};
wanted_one_of_vars = {'NE8', 'POP'};
opts = detectImportOptions(fullFileName);
vars = opts.VariableNames;   % names detected in this file
if all(ismember(needed_vars, vars)) && any(ismember(wanted_one_of_vars, vars))
    % prefer NE8; fall back to POP when NE8 is absent
    if ismember(wanted_one_of_vars{1}, vars)
        NE8_varname = wanted_one_of_vars{1};
    else
        NE8_varname = wanted_one_of_vars{2};
    end
else
    continue; %or whatever you need to do for files that do not have all the variables
end
opts.SelectedVariableNames = [needed_vars, {NE8_varname}];
t = readtable(fullFileName,opts);
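As a possible extension of the code above, the sketch below runs the same check inside a loop and renames the POP column to NE8 after reading, so that tables from different files can be stacked. The two temporary files, the shortened variable list, and the names `aliases`, `tables`, and `allData` are assumptions made for illustration only.

```matlab
% Hypothetical example data: one file uses NE8, the other uses POP.
fileA = [tempname '.csv'];
fileB = [tempname '.csv'];
writetable(table([2021;2021], [100;110], [1e11;2e11], ...
    'VariableNames', {'YEAR','GDALT','NE8'}), fileA);
writetable(table([2021;2021], [3e11;4e11], [120;130], ...
    'VariableNames', {'YEAR','POP','GDALT'}), fileB);
fullnames = {fileA, fileB};

needed_vars = {'YEAR','GDALT'};   % shortened list, for the example only
aliases     = {'NE8','POP'};      % accepted names, in order of preference
tables      = {};

for k = 1:numel(fullnames)
    opts = detectImportOptions(fullnames{k});
    vars = opts.VariableNames;
    hit  = find(ismember(aliases, vars), 1);      % first alias found in this file
    if ~all(ismember(needed_vars, vars)) || isempty(hit)
        continue;                                 % skip files lacking a required variable
    end
    opts.SelectedVariableNames = [needed_vars, aliases(hit)];
    t = readtable(fullnames{k}, opts);
    % Rename whichever alias was read to the common name NE8.
    isAlias = strcmp(t.Properties.VariableNames, aliases{hit});
    t.Properties.VariableNames(isAlias) = {'NE8'};
    t = t(:, [needed_vars, {'NE8'}]);             % enforce a consistent column order
    tables{end+1} = t; %#ok<AGROW>
end

allData = vertcat(tables{:});                     % names and order now match
delete(fileA); delete(fileB);                     % clean up the temporary files
```

Renaming by name rather than by position avoids depending on the column order that readtable returns.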


More Answers (1)

Have you tried this way of reading the data, e.g.:
for k = 1 : nfiles
    fullFileName = fullnames{k};
    %reading the file as a table
    opts = detectImportOptions(fullFileName);
    t{k} = readtable(fullFileName,opts);
end
That results in nfiles tables stored in a cell array called t, which can be separated in a later step.

3 Comments

Thank you for the quick reply, but this will read the whole file, won't it? I need only specific variables from each file (each file has at least 18 variables and I need only 7), and reading whole files would not be efficient given that we have more than 11k files. Correct me if I am wrong.
Yes, indeed, what you are saying is correct. In that case, it may be better to import just part of the data by column index and ignore the rest, without using the variable names, since they are not consistent across the data files.
Yes, I thought of that as well, but the columns for 'GDALT' and 'NE8' are in different positions in different files: in one file they may be the 6th and 8th columns, in another the 10th and 15th. That is why I am using their names in the first place, but now NE8 also appears under a different name in some files.
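One way to reconcile the two points above is to look up each file's column positions by name and then select by index. The sketch below assumes a hypothetical temporary file with filler columns A and B; it relies on SelectedVariableNames accepting numeric indices as well as names.

```matlab
% Hypothetical example file where GDALT and NE8 sit in columns 3 and 5.
exampleFile = [tempname '.csv'];
writetable(table([2021;2021], [1;2], [100;110], [5;6], [1e11;2e11], ...
    'VariableNames', {'YEAR','A','GDALT','B','NE8'}), exampleFile);

opts = detectImportOptions(exampleFile);
% found(i) is true when the i-th name exists; pos(i) is its column index.
[found, pos] = ismember({'GDALT','NE8'}, opts.VariableNames);

if all(found)
    opts.SelectedVariableNames = pos;   % select the columns by position
    t = readtable(exampleFile, opts);
end
delete(exampleFile);                    % clean up the temporary file
```

Because the positions are recomputed for every file, the varying column order does not matter; a name that is absent shows up as a false entry in `found`.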


Release: R2021b

Asked: MA on 2 Oct 2021

Commented: MA on 12 Oct 2021
