How do I use histcounts with overlapping bins?

Question

Prodip Das on 27 Mar 2019

https://au.mathworks.com/matlabcentral/answers/452936-how-do-i-use-histcounts-with-overlapping-bins

Commented: Steven Lord on 29 Mar 2019

First off, I found only one post with relevant input, and the comments there suggested that overlapping bins may not work with histcounts.

My question is this: Is there a way to create bin edges by giving the number of bins (as histcounts allows) and the percentage overlap between bins, so as to generate a set of overlapping bins that can be used with accumarray later on?

More specifically, I have vectors x, y and z covering a spatial volume. I need to "discretize" this volume and bin the vector V (which is when I found the answer on 3D binning). I am looking for a way to extend this by adding overlapping bins.

Is there a way to achieve this? Any help is appreciated. Thanks!
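One possible reading of this request, sketched in 1-D: choose the bin width so that a given number of bins with a given fractional overlap exactly spans the data range, then test membership directly instead of calling histcounts. The names nBins, overlap, lo, hi, stride and the sample data are illustrative assumptions, not taken from the thread, and the comparison relies on implicit expansion (R2016b or later).

```matlab
% Hypothetical sample data and parameters (not from the original post)
x       = [0.1 0.3 0.6 1.2 1.4 2.9];
a       = 0;  b = 3;        % range to cover
nBins   = 5;                % requested number of bins
overlap = 0.5;              % fraction shared by neighbouring bins

% Bin width w chosen so that nBins overlapping bins span [a, b]:
%   w + (nBins-1)*w*(1-overlap) = b - a
w      = (b - a) / (1 + (nBins-1)*(1-overlap));
stride = w*(1-overlap);            % distance between successive lower edges
lo     = a + (0:nBins-1)*stride;   % lower edge of each bin
hi     = lo + w;                   % upper edge of each bin

% Membership matrix: row = point, column = bin (half-open bins [lo, hi))
inBin  = x(:) >= lo & x(:) < hi;
counts = sum(inBin, 1);

% (point, bin) pairs can be fed to accumarray; a point lying in two
% bins simply contributes two pairs
[pt, binIdx] = find(inBin);
xc           = x(:);
binMeans     = accumarray(binIdx, xc(pt), [nBins 1], @mean, NaN);
```

The membership matrix grows as (number of points) x (number of bins), so this sketch suits modest data sizes; for three dimensions the same test would be repeated per axis.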

4 Comments

Prodip Das on 29 Mar 2019

The main use in this case is to generate a more "filled" data set. This could be achieved by making the bins smaller in principle. But if the data to be binned is somewhat sparse then collecting those points over bigger overlapping bins gives a well-averaged effect. This is my understanding of it, which may not be the best reason out there.

Steven Lord on 29 Mar 2019

Do you need to visualize the overlapping bins (histogram) or just compute with overlapping bins (histcounts)?


Answer 1

Walter Roberson on 28 Mar 2019

https://au.mathworks.com/matlabcentral/answers/452936-how-do-i-use-histcounts-with-overlapping-bins#answer_367795

Discretize three times per dimension, once with the bins exactly where you want them, once with the bins [overlap] earlier, once with the bins [overlap] later. Do the 27 different 3D binnings (each possible combination of early, middle, late), taking lists of indices. Then take the union of all of the indices in corresponding bins.
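A minimal 1-D sketch of this idea, assuming base edges one unit apart and a hypothetical overlap s on each side (the sample data and the names s, nb, members are illustrative, not from the answer). Each point is binned three times, against the early, unshifted and late edges, and the index lists of corresponding bins are merged with union, so that bin k effectively covers [edges(k)-s, edges(k+1)+s).

```matlab
x     = [0.2 0.9 1.1 1.8 2.5 3.0 3.9];  % hypothetical sample points
edges = 0:1:4;                          % base (non-overlapping) edges
s     = 0.25;                           % hypothetical overlap on each side
nb    = numel(edges) - 1;               % number of bins

members = cell(nb, 1);                  % point indices per widened bin
for shift = [-s, 0, s]
    b  = discretize(x(:), edges + shift);   % NaN for out-of-range points
    ok = ~isnan(b);
    % Collect the indices of the points that landed in each bin
    part = accumarray(b(ok), find(ok), [nb 1], @(v) {v});
    % Merge with the lists gathered from the other shifts
    members = cellfun(@union, members, part, 'UniformOutput', false);
end
```

With these inputs, point 3 (x = 1.1) ends up in both bin 1 and bin 2, which is the intended overlap. The 3-D case repeats the same binning per axis and combines the 27 shift combinations.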

6 Comments

Prodip Das on 29 Mar 2019

Thanks Walter, specifying the overlapping bin edges explicitly and binning them for each case seems to have worked.

This is a sample of what I've ended up using -

frac = 0.5; % Shift between successive bin sets. With a bin width of 2, consecutive sets overlap by 75%.
init_shift = 3.5; % Extends the upper edge, mainly to avoid empty values in histcounts.
xbins = min(x):2:max(x)+init_shift;
ybins = min(y):2:max(y)+init_shift;
zbins = min(z):2:max(z)+init_shift;
for f = 1:4
    % Bin index of every point along each axis; (:) forces column vectors for accumarray
    [~,~,cx] = histcounts(x(:), xbins);
    [~,~,cy] = histcounts(y(:), ybins);
    [~,~,cz] = histcounts(z(:), zbins);
    % Mean coordinate per 1-D bin
    X{f} = accumarray(cx, x(:), [], @nanmean);
    Y{f} = accumarray(cy, y(:), [], @nanmean);
    Z{f} = accumarray(cz, z(:), [], @nanmean);
    % Mean of each velocity component per 3-D bin
    Um{f} = accumarray([cx, cy, cz], U(:), [], @nanmean);
    Vm{f} = accumarray([cx, cy, cz], V(:), [], @nanmean);
    Wm{f} = accumarray([cx, cy, cz], W(:), [], @nanmean);
    % Shift all edges down by frac for the next pass
    xbins = xbins - frac; ybins = ybins - frac; zbins = zbins - frac;
end

I end up with 4 cells of 3D data (accumulated over 4 sets of bins). Not all 4 of these cells have the same size however.
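One likely reason for the differing sizes: when the size argument of accumarray is left empty, the output is only as large as the largest subscript that actually occurs, which can change as the edges shift. A small sketch, with made-up subscripts and a hypothetical nBins, of passing an explicit size and fill value so that every pass returns the same shape:

```matlab
subs  = [1; 2; 2];      % hypothetical bin indices (bins 3 and 4 are empty)
vals  = [10; 20; 30];   % hypothetical values
nBins = 4;              % number of bins implied by the edges

A = accumarray(subs, vals, [], @mean);              % sized by max(subs): 2x1
B = accumarray(subs, vals, [nBins 1], @mean, NaN);  % always 4x1, NaN where empty
```

In the 3-D loop above, the corresponding size would be something like [numel(xbins)-1, numel(ybins)-1, numel(zbins)-1], assuming no point falls outside the edges.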

I do have another question regarding how to collate this data in the same sequence as the bins. Should I post a separate query?

Thanks a lot !

Walter Roberson on 29 Mar 2019

No, you lose all order information when you take the mean. It does not make sense to use the original order.

shifts = [-3.5 0 3.5];
whichpoints = cell(3,3,3);
cx = cell(3,1);
cy = cell(3,1);
cz = cell(3,1);
for idx = 1:3
  % Bin index of each point under each shifted edge set; (:) forces column vectors
  [~,~,cx{idx}] = histcounts(x(:), xbins+shifts(idx));
  [~,~,cy{idx}] = histcounts(y(:), ybins+shifts(idx));
  [~,~,cz{idx}] = histcounts(z(:), zbins+shifts(idx));
end
npoint = length(x);
nbx = length(xbins);
nby = length(ybins);
nbz = length(zbins);
pidx = (1:npoint).';
bs = [nbx, nby, nbz];
for xsi = 1:3
    for ysi = 1:3
        for zsi = 1:3
            subs = [cx{xsi}, cy{ysi}, cz{zsi}];
            inrange = all(subs > 0, 2);   % histcounts returns 0 for points outside the edges
            whichpoints{xsi,ysi,zsi} = accumarray(subs(inrange,:), pidx(inrange), bs, @(idx) {idx} );
        end
    end
end
allpoints = cell(nbx,nby,nbz);
for K = 1 : numel(whichpoints)
    % 'UniformOutput', false is needed because union returns vectors, not scalars
    allpoints = cellfun(@union, allpoints, whichpoints{K}, 'UniformOutput', false);
end

Now allpoints should be a cell array indexed by x, y and z bin, with each cell holding the linear indices of all of the points that have been put into that bin, taking overlaps into account. Each cell will have its indices in sorted order, and any one index will appear only once in any one cell. You can use the indices for whatever purposes you want, such as

cellfun(@(idx) nanmean(x(idx)), allpoints)
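As a small illustration of using such index lists, the sketch below computes a per-bin mean with an explicit loop and base-MATLAB mean(..., 'omitnan') in place of nanmean, which belongs to the Statistics and Machine Learning Toolbox. The values v and the three index lists are made up for the example.

```matlab
v         = [1 NaN 5 7];           % hypothetical values, one per point
allpoints = {[1; 2]; []; [3; 4]};  % hypothetical index lists for three bins

binMean = nan(size(allpoints));    % empty bins stay NaN
for k = 1:numel(allpoints)
    idx = allpoints{k};
    if ~isempty(idx)
        % Mean over the points in this bin, ignoring NaN values
        binMean(k) = mean(v(idx), 'omitnan');
    end
end
```

The explicit isempty check makes the handling of empty bins visible rather than relying on how mean treats empty input.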

Prodip Das on 29 Mar 2019

Thanks Walter.

This is going to take me a while to completely get my head around, as it's not immediately clear to me.

I'll post the matrix collating bit as a separate question.

Walter Roberson on 29 Mar 2019

I think I might have the union loop wrong, possibly.

Answer 2

Matt J on 28 Mar 2019

https://au.mathworks.com/matlabcentral/answers/452936-how-do-i-use-histcounts-with-overlapping-bins#answer_367808

Edited: Matt J on 28 Mar 2019

If you're willing to make some approximations in the interest of speed, this is a method that will do the whole 3D accumarray operation. It uses two File Exchange (FEX) contributions that you must download, namely KronProd and ndSparse. It first histograms the x,y,z data normally into super-thin, non-overlapping bins, then consolidates those into overlapping bins by separable convolution.

%% simulated data
vmin=0; vmax=10;    %integer min and max assumed here
x=rand(1,10000)*(vmax-vmin)+vmin;    
y=rand(1,10000)*(vmax-vmin)+vmin;
z=rand(1,10000)*(vmax-vmin)+vmin;
          
%% binning parameter selections
binShift=0.5;   binWidth=1;    
%% Set-up computations
lowerEdges=vmin:binShift:vmax-binWidth;
upperEdges=lowerEdges+binWidth;
Nbins=numel(lowerEdges);
delta=vmax-vmin;
N=1000*delta;
L=(lowerEdges.')*N/delta+1;
U=(upperEdges.')*N/delta+1; 
T=cumsum(sparse(1:Nbins,L,1,Nbins,N+1)-sparse(1:Nbins,U,1,Nbins,N+1),2);
C=KronProd({T(:,1:N)},[1,1,1]); %separable convolution operator
%% Do computation
tic; 
    e=linspace(vmin,vmax,N);
    I=discretize(x,e).';
    J=discretize(y,e).';
    K=discretize(z,e).';
    H=ndSparse.build([I,J,K],1,[N,N,N]);
    A=full(C*H); %The "accumarray" result
toc; %Elapsed time is 1.182683 seconds.
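The thin-bins-then-consolidate idea can be seen more easily in one dimension. The sketch below is not the KronProd/ndSparse code above; it uses plain histcounts and conv, with made-up data and a deliberately coarse fine grid (fineWidth is a hypothetical parameter). It assumes binWidth and binShift are whole multiples of fineWidth; the approximation is that the overlapping bin edges are snapped to the fine grid.

```matlab
x         = [0.1 0.3 0.6 1.2 1.4 2.9];   % hypothetical sample points
vmin      = 0;  vmax = 3;
fineWidth = 0.25;                        % width of the thin, non-overlapping bins
binWidth  = 1;  binShift = 0.5;          % overlapping bins: width 1, shifted by 0.5

% Step 1: ordinary histogram on the thin bins
fineCounts = histcounts(x, vmin:fineWidth:vmax);

% Step 2: sum runs of thin bins with a moving window (a convolution)
w          = round(binWidth / fineWidth);   % thin bins per overlapping bin
stride     = round(binShift / fineWidth);   % thin bins between bin starts
windowSums = conv(fineCounts, ones(1, w), 'valid');

% Step 3: keep every stride-th window, one per overlapping bin
overlapCounts = windowSums(1:stride:end);
```

Here the overlapping bins are [0,1), [0.5,1.5), [1,2), [1.5,2.5) and [2,3]. In three dimensions the same window would be applied along each axis in turn, which is what makes the convolution separable.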

1 Comment

Prodip Das on 29 Mar 2019

Thanks for the answer, Matt! I wasn't certain where the approximations lay, and I'm not very well versed in separable convolution. I needed a quicker fix for now, and will hopefully come back to this in the future to understand it better.
