how can I match 2 columns, which one is the guide column with all daily data from 2000-2010 and the other has some missing values among its rows, then I need to put NAN for the missing values?

Question

Matlab1 on 27 Nov 2014

0
Link

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/164451-how-can-i-match-2-columns-which-one-is-the-guide-column-with-all-daily-data-from-2000-2010-and-the

Commented: Star Strider on 29 Nov 2014

2 Comments
Show NoneHide None

Geoff Hayes on 27 Nov 2014

Mina - rather than posting the same text in your question and title, why not use the body to provide a small sample data set and describe how you want to match the data between the two columns. Does the second column have fewer rows than the first indicating that there is missing data in that column? Is there a date attached to the second column so that we can easily figure what rows/dates are missing?

Matlab1 on 27 Nov 2014

I have a file of millions data, some missing values inside the rows. I need to find out the place of the missing values and put NAN. The attachment is a sample of what I mean exactly.

Thanks

Sign in to comment.

Sign in to answer this question.

Answer 1

Star Strider on 27 Nov 2014

0
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/164451-how-can-i-match-2-columns-which-one-is-the-guide-column-with-all-daily-data-from-2000-2010-and-the#answer_160378

You may not need to put NaN for the missing values. You did not post your data or a detailed description of what you want to do, so you left us guessing. See if interp1 will do what you want.

2 Comments
Show NoneHide None

Star Strider on 27 Nov 2014

What you do with your data is your decision. You can put NaN values if you wish, or you can do an interpolation with interp1. Your data are your data, and you have to decide what you want to do with them.

Star Strider on 29 Nov 2014

Open in MATLAB Online

This turned out to be very much less trivial than I envisioned it. I created a separate data file with multiple gaps of multiple hours to test the robustness of my final code, and to my observation this works. The indexing involves first finding the gaps, then starting at the end of the matrix and going in reverse order, filling in the gaps as necessary, preserving the remaining indexes in the process. (There must be more efficient ways to do this, but if so, I wasn’t able to discover them.)

The code:

[d,s,r] = xlsread('sample1.xlsx');
dtn = datenum(s,'mm/dd/yyyy')+d(:,1);
dtnv = [min(dtn):1/24:max(dtn)]';
datnan = nan(size(dtnv));
ddtn = [1/24; diff(dtn)];
mddtn = min(ddtn);
gap = [find(ddtn >  mddtn*1.1)];
gap = [gap(1:end)  round(ddtn(gap(1:end))/mddtn)];
gap = [[1 0]; gap; length(datnan) 0];
val = [];
for k1 = length(gap):-1:2
    val = [val; [gap(k1,1)+gap(k1,2)-2:-1:gap(k1,1)]'];
end
val = setdiff(1:length(datnan),val);
datnan(val) = d(:,2);
outmat = [dtnv  datnan];

To briefly describe it, ‘dtn’ reads and converts the date strings into date numbers, then ‘dtnv’ creates a continuous hourly series between the minimum and maximum of ‘dtn’, and a matching ‘datnan’ vector of NaN values. The ‘gap’ variable looks for differences in the ‘ddtn’ variable that are greater than 1/24 day, and creates a second column of the number of hours that are in the gap. In the loop, the ‘val’ vector creates a series of indices into ‘datnan’ that correspond to the data in ‘d(:,2)’, the data corresponding to the hours read in by the xlsread call. It then uses those indices to map the values of ‘d(:,2)’ to the appropriate rows of ‘datnan’, leaving NaN values in the gaps.

The ‘outmat’ matrix is the output, with the first column being date numbers from ‘dtnv’ and the second column being ‘datnan’. You can always convert the date numbers to other date formats as you wish, but for the purposes of this routine, date numbers are more convenient.

You already Accepted my Answer (thank you) so I felt I owed you my best effort to solve your problem.

Sign in to comment.

Answer 2

Matlab1 on 27 Nov 2014

0
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/164451-how-can-i-match-2-columns-which-one-is-the-guide-column-with-all-daily-data-from-2000-2010-and-the#answer_160379

sample.xlsx

thanks again.

Here is the sample of my file. As it is clear, between the row 3 and 4 there is a missing value. and as this is a small data it is easy to find the missing values, but if we have a million values, how can I find out the missing values and i need to put NAN instead of them.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

how can I match 2 columns, which one is the guide column with all daily data from 2000-2010 and the other has some missing values among its rows, then I need to put NAN for the missing values?

2 Comments
Show NoneHide None

Accepted Answer

2 Comments
Show NoneHide None

More Answers (1)

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

how can I match 2 columns, which one is the guide column with all daily data from 2000-2010 and the other has some missing values among its rows, then I need to put NAN for the missing values?

2 Comments Show NoneHide None

Accepted Answer

2 Comments Show NoneHide None

More Answers (1)

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

2 Comments
Show NoneHide None

2 Comments
Show NoneHide None

0 Comments
Show -2 older commentsHide -2 older comments