if it is greater than 75 percent missing data
    4 views (last 30 days)
  
       Show older comments
    
    Fabian Moreno
 on 26 Oct 2020
  
    
    
    
    
    Answered: Peter Perkins
    
 on 19 Nov 2020
            Hello there
I have a matrix of (10200x7) from 01-01-2019 to 29-02-2020, which is iqual to the hourly temperature measured. The data is organizaed in the first 6 columns with date information (yy-mm-dd-hh-mm-ss) and then my temperature data (column 7). I already got the nanmean dayli values for a new matrix. But I would like to know how can I get a conditional, for example:  if more than 75% of my data each 24 hours is missing, give me NaN for that day. If not, give me nanmean value.
I'm trying something like this, but I'm pretty sure I'm on the wrong way
%first I find the hourly values positions after 0 hs and before 23 hs
x = find(files_mat{1,1}(:,5)>= 0 {1,1}(:,5)<=23);
%%then, I'm trying to get the conditional to the tempearture column
if isnan (files_mat{1,1}(x,7))<= 0.75
      M_24h
 elseif isnan (files_mat{1,1}(x,7))>= 0.75
     M_24h
 Thank you
2 Comments
  Rik
      
      
 on 26 Oct 2020
				I don't fully understand the shape of your data. But if you have an array, you can call isnan and get out a logical array. Using mean you can find the percentage of NaNs in each row, and using the comparator you will end up with a logical vector you can use to index into your output array and correct all non-NaNs.
I'm not posting this as an answer, as I can't really write a working example for you without understanding your data.
Accepted Answer
  Peter Perkins
    
 on 19 Nov 2020
        I think this would be much simpler if you created a timetable by combining the first 6 vars in the table. Actually, I have no idea if you even have a table, but you should. Use readtable to read in your data.
Given a timetable, this is a one-liner: use retime to aggregate to daily means, except use your own custom function that returns NaN if there's more than 18 missing hourly values. Something like this:
ttDaily = retime(ttHourly,'daily',myFun)
function y = myFun(x)
if sum(isnan(x)) > .75*length(x)
    y = NaN;
else
    y = mean(x,'omitnan');
end
0 Comments
More Answers (1)
  Adam Danz
    
      
 on 26 Oct 2020
        
      Edited: Adam Danz
    
      
 on 27 Oct 2020
  
      Here's an approach you can apply to your data, 
% Create a table of year-month-day-hour-min-second-rain data with some of the hour data missing (NaN)
dateTimeVals = datetime(1999,1,1) + hours(1:5000)'; 
rain = rand(size(dateTimeVals)); 
T = array2table([datevec(dateTimeVals), rain], 'VariableNames',{'Y','M','D','H','Mn','S','rain'}); 
% Replace some Hour data with missing vals
rng('default')
T.H(randi(height(T),1,3000)) = NaN; 
% Show top of table
head(T) 
% Add column to identify valid hour-data
T.dateValid = ~isnan(T.H); 
head(T) 
% Compute the percentage of valid hour-data per day
% This is achieved by applying mean() to the binary 
% 'dateValid' column for each day.
Tgs = groupsummary(T,{'Y','M','D'},'mean', 'dateValid');
head(Tgs)
% This shows that for the first day of the first month of 1999, 
% 47.8% of hour-data are non-nan. Let's confirm that manually. 
daySelection = T.Y==1999 & T.M==1 & T.D==1;  
mean(T.dateValid(daySelection),'omitnan') * 100
% To identify which days have >75% hour-data, 
Tgs(Tgs.mean_dateValid > 0.75, 1:3)
0 Comments
See Also
Categories
				Find more on Logical in Help Center and File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!


