Import text with headers as column values and append to traditional data set

I would like to import a large number of text files from a simulation program that all have a very similar pattern into one matrix or dataset.
I am trying to pull the numbers from line three into columns, with a column for each value.
(:, Column 1)=78%
(:, Column 2)=3000
(:, Column 3)=1
(:, Column 4)=9
(:, Column 5)=5
(:, Column 6)=300
The headings of these values do not change from file to file; only the numbers do.
The remaining columns and rows would then hold the rest of the table.
mat1=(:, [date biomass yield no3() sowing_date surfacep_wt AccumRainfall])

Accepted Answer

Stephen23 on 9 May 2015
Edited: Stephen23 on 11 May 2015
This code reads the complete sample file, and converts those values to numeric:
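fid = fopen('test import.txt','rt');
% Skip the first two lines, then split the "Title = ..." line at the semicolons
data_title = textscan(fid, 'Title = %s%s%s%s%s%s', 'HeaderLines',2, 'Delimiter',';');
% Read the two rows of column headings, eight strings per row
data_header = textscan(fid, '%s%s%s%s%s%s%s%s', 2, 'MultipleDelimsAsOne',true);
% Read the table: a date string, six numeric columns, and a trailing string column
data_values = textscan(fid, '%s%f%f%f%f%f%f%s', 'MultipleDelimsAsOne',true, 'CollectOutput',true);
fclose(fid);
% Split each 'name=value' string at the '='; val keeps the leading '='
[str,val] = cellfun(@(s)strtok(s,'='), [data_title{:}], 'UniformOutput',false);
idx = cellfun(@(s)strcmp(s(end),'%'),val); % identify percentages
val = cellfun(@(s)sscanf(s,'=%f'),val); % convert to numeric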
And we can check these values in the command window:
>> str
str =
'Water' 'Matter' 'Residues' 'fert_amount_sow' [1x20 char] 'tillage_depth'
>> val
val =
78 3000 1 9 5 300
>> idx
idx =
1 0 0 0 0 0
and of course the date and numeric data:
>> data_values
data_values =
{120x1 cell} [120x6 double] {120x1 cell}
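Since the question asks for the header values to appear as columns alongside the table, one possible way to finish is to repeat val once per data row and concatenate it with the numeric block. This is only a minimal sketch: val and nums below are small made-up stand-ins (nums plays the role of data_values{2}), and combined is a hypothetical name.

```matlab
% Made-up stand-ins for the outputs of the code above.
val  = [78 3000 1 9 5 300];              % header values parsed from line 3
nums = [1 2 3 4 5 6; 7 8 9 10 11 12];    % stand-in for data_values{2} (N-by-6)

% Repeat the header values once per data row and prepend them to the table.
nrows    = size(nums, 1);
combined = [repmat(val, nrows, 1), nums];   % N-by-12 numeric matrix
```

When looping over many files, the combined block from each file could then be appended to a running matrix by vertical concatenation, so every row carries the header values of the file it came from.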

More Answers (1)

Walter Roberson on 9 May 2015
Use textscan() telling it to skip 6 lines, and use a format of '%s%g%g%g%g%g%g%s' and the CollectOutput option. You will get as output a cell array, the first entry of which is the dates in string format, the second is an N x 6 array of numbers, and the third is the string for AccumRainfall.
You can use datenum() with the 'mm/dd/yyyy' format on the first cell array to get MATLAB serial date numbers.
The proper processing for the AccumRainfall string is not obvious. Should '?' be interpreted as 0, or do you want it to come out as NaN or as some other value? Your sample only shows '?' in that column, so I do not know if the field would say 'Y' if there was rainfall or if it would show a numeric amount. If it is a numeric amount, then you can use str2double() on the cell array of strings: that will convert all of the '?' entries into NaN values and will convert the numeric strings to numbers. The NaN values can then be detected (if desired) by using isnan().
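The two conversions described above can be tried on a few made-up strings. This is a rough sketch only: the sample values and the variable names (rain_str, rain, is_missing, dates, dn) are assumptions for illustration, not taken from the actual file.

```matlab
% Made-up AccumRainfall strings: '?' marks a missing value.
rain_str   = {'?'; '12.5'; '0'};
rain       = str2double(rain_str);   % non-numeric strings become NaN
is_missing = isnan(rain);            % logical mask of the '?' entries

% Made-up date strings in mm/dd/yyyy form, converted to serial date numbers.
dates = {'05/09/2015'; '05/10/2015'};
dn    = datenum(dates, 'mm/dd/yyyy');
```

If '?' should count as zero rainfall instead, rain(is_missing) = 0 would replace the NaN entries after the conversion.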
  2 Comments
Bus141 on 9 May 2015
Edited: Bus141 on 9 May 2015
Thanks for the help. I am trying to get the values from line 2 as well. Since I am pulling from many text files, I need a way to differentiate the tables from each text file. These files are all differentiated by the various numbers in line 2. This is why I am trying to get the values from the files into columns and then append them to the data below. Even if it is in a string format, that is fine, as long as I get all of the differentiating values from the top of the file.
In reference to the AccumRainfall, that was a mistake in the file. It will be a number, as opposed to a question mark. Thanks again
Walter Roberson on 9 May 2015
You can textscan() with a string format such as '%s%s%s%s%s%s%s%s', a header skip of 1 line, and a count of 1 line. That would leave you after line 2, so then you would do a textscan with a header skip of 4 lines and the format I gave above.
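The idea of reading the identifying line first and the table second can be sketched as follows. Note that this sketch swaps in fgetl() for the header-skipping textscan() calls described above, simply to make the file position explicit at each step. The file layout written here (one title line, one identifying line, one heading line, two data rows) is entirely hypothetical, as are the names fname, tag_line, body and file_id.

```matlab
% Write a small hypothetical file so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Simulation output\n');
fprintf(fid, 'Water=78%%;Matter=3000\n');      % identifying line (line 2)
fprintf(fid, 'date biomass yield\n');
fprintf(fid, '01/05/2015 1.5 2.5\n');
fprintf(fid, '02/05/2015 3.5 4.5\n');
fclose(fid);

% Read it back: keep line 2 as a string, then scan the table.
fid = fopen(fname, 'rt');
fgetl(fid);                    % discard line 1
tag_line = fgetl(fid);         % line 2, kept as the file's identifier
fgetl(fid);                    % discard the column-heading line
body = textscan(fid, '%s%f%f', 'CollectOutput', true);
fclose(fid);
delete(fname);

% Attach the identifier to every data row of this file.
file_id = repmat({tag_line}, size(body{2}, 1), 1);
```

Here body{1} holds the date strings, body{2} the numeric columns, and file_id one copy of the line-2 string per row, so tables from different files remain distinguishable after they are stacked.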
