if loop within for loop for statistical analysis of data

Hi,
My data consist of a very large column vector of the form:
P_b=[2;3;4;5;6;NaN;3;4;5;6;NaN;3;4;2;NaN;3;NaN];
For that vector, I would like to group all consecutive non-NaN values, e.g. [2;3;4;5;6], [3;4;5;6], etc., fit a normal distribution to each group, extract its mean, and collect the results in a vector. That vector should contain the means of all the "grouped" data of P_b.
It may sound complicated, but it shouldn't be. I have written the code below; however, an odd problem arises: MATLAB does not recognise the variable 'avg' in the last line of the for loop, where I try to save all the loop results in a vector. When I run the code without that last line, it seems to recognise 'avg'. Any ideas? Thanks in advance for your help. Below is the code.
P_pdf=[];
%Indices with NaN
idxnan=find(isnan(P_b));
for i=1:size(idxnan,1)-1
%Indices of numeric values
idxlow=idxnan(i)+1;
idxup=idxnan(i+1)-1;
%Group P_b Matrices according to NaN values
P_mat=P_b(idxlow:idxup);
%Reject empty matrices and treat singular values
if size(P_mat)==[1,1];
avg=P_mat;
elseif size(P_mat)==[0,0];
avg=NaN;
%Create distribution fit
pdf=fitdist(P_mat,'Normal');
avg=pdf.mu;
end
P_pdf=[P_pdf;avg];
end

Accepted Answer

Stephen23 on 21 Jan 2017
Edited: Stephen23 on 21 Jan 2017
This is a classic example of how badly formatted code hides bugs. When the code is formatted using MATLAB's default formatting rules (select all, then Ctrl+I), the cause is much easier to spot:
P_pdf = [];
%Indices with NaN
idxnan = find(isnan(P_b));
for i = 1:size(idxnan, 1) - 1
    %Indices of numeric values
    idxlow = idxnan(i) + 1;
    idxup = idxnan(i + 1) - 1;
    %Group P_b Matrices according to NaN values
    P_mat = P_b(idxlow:idxup);
    %
    %Reject empty matrices and treat singular values
    if size(P_mat) == [1, 1];
        avg = P_mat;
    elseif size(P_mat) == [0, 0];
        avg = NaN;
        %Create distribution fit
        pdf = fitdist(P_mat, 'Normal');
        avg = pdf.mu;
    end
    P_pdf = [P_pdf; avg];
end
Now it is clear that there is an if and an elseif but no else, and that the fitdist call sits inside the elseif branch. If neither condition is fulfilled, avg never gets defined. The error is due to testing the matrix size like this:
size(P_mat) == [0, 0]
which is never going to be true when P_mat is created by indexing like this:
P_mat = P_b(idxlow:idxup);
Try it yourself at home:
>> V = 1:3;
>> size(V(2:1))
ans =
1 0
So the test == [0, 0] will always fail. The logic is flawed anyway: surely you want to test for non-empty vectors and apply the fit to them?
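A minimal sketch of the same check for a column vector like P_b (the vector x and the names grp, wrongTest and rightTest are illustrative only): the selection between two adjacent NaN values is 0-by-1, so it never matches [0, 0], whereas isempty detects it whatever its orientation.

```matlab
% Illustrative column vector with two consecutive NaN values
x = [1; NaN; NaN; 2];
idxnan = find(isnan(x));              % NaN positions: 2 and 3
% The "group" between these two NaN values is the empty range 3:2
grp = x(idxnan(1)+1 : idxnan(2)-1);
size(grp)                             % 0-by-1, because x is a column vector
% An if-condition on an array is true only if ALL elements are nonzero,
% so this comparison ([0 1] == [0 0] gives [1 0]) is treated as false
wrongTest = all(size(grp) == [0, 0])
% isempty is true for any empty array, whatever its orientation
rightTest = isempty(grp)
```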
Here is a slightly more robust version of your loop:
P_b = [2;3;4;5;6;NaN;3;4;5;6;NaN;3;4;2;NaN;3;NaN];
idn = isnan(P_b);
idd = diff(idn);
idb = find([~idn(1);idd<0])
ide = find([idd>0;~idn(end)])
out = NaN(size(idb));
for k = 1:numel(idb)
    tmp = P_b(idb(k):ide(k));
    pdf = fitdist(tmp,'Normal'); % untested, I don't have fitdist
    out(k) = pdf.mu; % untested
end
Personally I would not write all of that code: I would simply split the input vector using accumarray, and then use cellfun to do whatever processing:
P_b = [2;3;4;5;6;NaN;3;4;5;6;NaN;3;4;2;NaN;3;NaN];
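idx = isnan(P_b);                       % true where P_b is NaN
idy = cumsum([1;diff(idx)>0]);          % group number, incremented at the first NaN of each NaN run
C = accumarray(idy(~idx),P_b(~idx),[],@(n){n}); % one cell per run of non-NaN values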
D = cellfun(@(v)fitdist(v,'Normal'),C); % untested: I don't have fitdist
P_pdf = arrayfun(@(s)s.mu,D) % untested
If cellfun cannot concatenate the distribution objects, it may be necessary to make it return a cell array instead:
D = cellfun(@(v)fitdist(v,'Normal'),C,'Uni',0); % untested
P_pdf = cellfun(@(s)s.mu,D) % untested
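If only the means are needed, the fit can be skipped: the mu parameter of a normal distribution fitted by fitdist is the maximum likelihood estimate, which is simply the sample mean. A minimal sketch under that assumption (P_mean is an illustrative name), reusing the same grouping and passing @mean to accumarray, which also avoids the Statistics Toolbox dependency:

```matlab
P_b = [2;3;4;5;6;NaN;3;4;5;6;NaN;3;4;2;NaN;3;NaN];
idx = isnan(P_b);                  % true where P_b is NaN
idy = cumsum([1;diff(idx)>0]);     % group number, incremented at the first NaN of each NaN run
% Mean of each run of consecutive non-NaN values
P_mean = accumarray(idy(~idx), P_b(~idx), [], @mean)
```

For the example vector this gives one mean per group, in order. If the full distribution objects (e.g. sigma) are also wanted, the fitdist version above is still needed.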
Kosta on 21 Jan 2017
Thanks, fixing this silly mistake does solve part of the problem. However, I still can't get it to work: for some reason P_mat is not handled by the if statement every time, resulting in a blank P_mat.
Stephen23 on 21 Jan 2017
Edited: Stephen23 on 21 Jan 2017
Check how large each selection is, for example like this:
P_b = [2;3;4;5;6;NaN;3;4;5;6;NaN;3;4;2;NaN;3;NaN];
idn = isnan(P_b);
idd = diff(idn);
idb = find([~idn(1);idd<0])
ide = find([idd>0;~idn(end)])
out = NaN(size(idb));
for k = 1:numel(idb)
    tmp = P_b(idb(k):ide(k));
    if isempty(tmp)
        out(k) = NaN;
    elseif isscalar(tmp)
        out(k) = tmp;
    else
        pdf = fitdist(tmp,'Normal');
        out(k) = pdf.mu;
    end
end


More Answers (1)

Kosta on 21 Jan 2017
Finally got the whole thing working like this. Thanks again for your help:
P_pdf=[];
%Indices with NaN
idxnan=find(isnan(P_b));
for i=1:size(idxnan,1)-1
    %Indices of numeric values
    idxlow=idxnan(i)+1;
    idxup=idxnan(i+1)-1;
    %Group Power Matrices according to NaN values
    P_mat=P_b(idxlow:idxup);
    %Reject empty matrices and treat singular values
    if size(P_mat)==[1,1];
        avg=P_mat;
    elseif size(P_mat)==size(zeros(0,1));
        avg=NaN;
    else
        %Create distribution fit
        pdf=fitdist(P_mat,'Normal');
        avg=pdf.mu;
    end
    %Append this group's mean
    P_pdf=[P_pdf;avg];
end
Stephen23 on 21 Jan 2017
Edited: Stephen23 on 21 Jan 2017
Note that this code is not robust (e.g. it cannot cope with sequential NaN), nor efficient due to the concatenation inside the loop. In particular this is very poor code:
size(P_mat)==size(zeros(0,1))
Hard to read, hard to comprehend, and pointlessly complicated. See my answer and comments for much simpler code.
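One way to keep the original "values between two NaN" idea while addressing these points is sketched below (not from the thread; padding with NaN is a swapped-in technique). Note that a loop starting at the first NaN never processes the run before it ([2;3;4;5;6] in the example), and misses a final run that is not followed by NaN. Padding P_b with a NaN at each end puts every run between two NaN values; the output is preallocated, empty selections from consecutive NaN values are skipped with isempty, and mean stands in for fitdist(...,'Normal').mu, which gives the same value.

```matlab
P_b = [2;3;4;5;6;NaN;3;4;5;6;NaN;3;4;2;NaN;3;NaN];
% Pad with NaN at both ends so every run of values lies between two NaN
Q = [NaN; P_b(:); NaN];
idxnan = find(isnan(Q));
n = numel(idxnan) - 1;
P_pdf = NaN(n, 1);                 % preallocate instead of growing inside the loop
keep = false(n, 1);                % marks non-empty groups
for i = 1:n
    P_mat = Q(idxnan(i)+1 : idxnan(i+1)-1);
    if ~isempty(P_mat)             % consecutive NaN give an empty selection
        P_pdf(i) = mean(P_mat);    % same value as fitdist(P_mat,'Normal').mu
        keep(i) = true;
    end
end
P_pdf = P_pdf(keep)                % one mean per run of non-NaN values
```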
