how can I match 2 columns, which one is the guide column with all daily data from 2000-2010 and the other has some missing values among its rows, then I need to put NAN for the missing values?
1 view (last 30 days)
Show older comments
how can I match 2 columns, which one is the guide column with all daily data from 2000-2010 and the other has some missing values among its rows, then I need to put NAN for the missing values?
2 Comments
Geoff Hayes
on 27 Nov 2014
Mina - rather than posting the same text in your question and title, why not use the body to provide a small sample data set and describe how you want to match the data between the two columns. Does the second column have fewer rows than the first indicating that there is missing data in that column? Is there a date attached to the second column so that we can easily figure what rows/dates are missing?
Accepted Answer
Star Strider
on 27 Nov 2014
You may not need to put NaN for the missing values. You did not post your data or a detailed description of what you want to do, so you left us guessing. See if interp1 will do what you want.
2 Comments
Star Strider
on 27 Nov 2014
What you do with your data is your decision. You can put NaN values if you wish, or you can do an interpolation with interp1. Your data are your data, and you have to decide what you want to do with them.
Star Strider
on 29 Nov 2014
This turned out to be very much less trivial than I envisioned it. I created a separate data file with multiple gaps of multiple hours to test the robustness of my final code, and to my observation this works. The indexing involves first finding the gaps, then starting at the end of the matrix and going in reverse order, filling in the gaps as necessary, preserving the remaining indexes in the process. (There must be more efficient ways to do this, but if so, I wasn’t able to discover them.)
The code:
[d,s,r] = xlsread('sample1.xlsx');
dtn = datenum(s,'mm/dd/yyyy')+d(:,1);
dtnv = [min(dtn):1/24:max(dtn)]';
datnan = nan(size(dtnv));
ddtn = [1/24; diff(dtn)];
mddtn = min(ddtn);
gap = [find(ddtn > mddtn*1.1)];
gap = [gap(1:end) round(ddtn(gap(1:end))/mddtn)];
gap = [[1 0]; gap; length(datnan) 0];
val = [];
for k1 = length(gap):-1:2
val = [val; [gap(k1,1)+gap(k1,2)-2:-1:gap(k1,1)]'];
end
val = setdiff(1:length(datnan),val);
datnan(val) = d(:,2);
outmat = [dtnv datnan];
To briefly describe it, ‘dtn’ reads and converts the date strings into date numbers, then ‘dtnv’ creates a continuous hourly series between the minimum and maximum of ‘dtn’, and a matching ‘datnan’ vector of NaN values. The ‘gap’ variable looks for differences in the ‘ddtn’ variable that are greater than 1/24 day, and creates a second column of the number of hours that are in the gap. In the loop, the ‘val’ vector creates a series of indices into ‘datnan’ that correspond to the data in ‘d(:,2)’, the data corresponding to the hours read in by the xlsread call. It then uses those indices to map the values of ‘d(:,2)’ to the appropriate rows of ‘datnan’, leaving NaN values in the gaps.
The ‘outmat’ matrix is the output, with the first column being date numbers from ‘dtnv’ and the second column being ‘datnan’. You can always convert the date numbers to other date formats as you wish, but for the purposes of this routine, date numbers are more convenient.
You already Accepted my Answer (thank you) so I felt I owed you my best effort to solve your problem.
More Answers (1)
See Also
Categories
Find more on Language Fundamentals in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!