index generation for multidimensional discretization

I need to discretize (bin) my feature space X, which has corresponding function values Y, so I need the indices of the samples that fall into each bin.
I have a very crude solution that does what I want, but it does not extend nicely to more dimensions:
X = randn(1000,4,1);
Y = randn(1000,1);
nBins = 2;
[xD,E] = discretize(X,nBins);
idx = cell(nBins,nBins,nBins,nBins);
for i = 1:size(X,1)
    idx{xD(i,1),xD(i,2),xD(i,3),xD(i,4)}(end+1) = i;
end
Is there a built-in function that does this more cleanly, or some newer algorithm that does binning with indexing? (I only have MATLAB 2018 available.)
Kind Regards Max

Ronit on 23 Apr 2024
Hi Maximilian,
In MATLAB, while there isn't a direct built-in function that discretizes and indexes multi-dimensional data in one go, you can achieve a cleaner and more scalable solution by using some of MATLAB's built-in functions creatively. The approach involves discretizing each dimension of your feature space X using discretize, then constructing a linear index from the multi-dimensional indices, which allows you to group or bin the data more flexibly.
Here's a refined approach that should work for any number of dimensions in X:
X = randn(1000,4);
Y = randn(1000,1);
nBins = 2;
[xD, ~] = discretize(X, nBins);
% Calculate the size for each dimension in the linear index space
dims = repmat(nBins, 1, size(X, 2));
% Convert multi-dimensional indices to linear indices
subs = num2cell(xD, 1); % one cell per column, so this works for any number of dimensions
linearIndices = sub2ind(dims, subs{:});
% Determine the maximum possible index to size idx appropriately
maxIndex = prod(dims);
% Initialize idx with the correct size
idx = cell(maxIndex, 1);
% Populate idx
for i = 1:numel(linearIndices)
    idx{linearIndices(i)}(end+1) = i;
end
% Accessing Y values for an example bin combination
% Define the bin combination
binCombination = [1, 2, 1, 2];
% Convert bin combination to linear index
binSubs = num2cell(binCombination);
exampleLinearIndex = sub2ind(dims, binSubs{:});
% Now access Y values safely
if ~isempty(idx{exampleLinearIndex})
    Y_example = Y(idx{exampleLinearIndex}, :);
else
    Y_example = []; % Handle case where the bin combination is empty
end
Explanation for the code:
1. It discretizes the feature space X into nBins bins for each dimension.
2. It converts the multi-dimensional bin indices xD for each sample into a single linear index linearIndices. This is a key step that allows you to handle an arbitrary number of dimensions more easily.
3. Then it groups the indices of X (which correspond to rows in Y) based on their bin combination, using the linear index to identify each unique bin combination.
4. Then it demonstrates how to access the Y values corresponding to a specific bin combination by converting that combination into a linear index and then using this index to access the relevant cell in idx.
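One caveat on step 1, as far as I understand the discretize documentation: when X is a matrix, discretize(X, nBins) derives a single set of edges from the overall range of X, so every column shares the same bins. If each feature should instead be binned over its own range, a minimal sketch could look like this (the per-column loop and the cell array E are my own illustration, not part of the code above):

```matlab
% Sketch: bin each feature (column) over its own range.
X = randn(1000, 4);
nBins = 2;

xD = zeros(size(X));           % bin index per sample and dimension
E  = cell(1, size(X, 2));      % E{k} holds the edges used for column k
for k = 1:size(X, 2)
    [xD(:, k), E{k}] = discretize(X(:, k), nBins);
end
```

The resulting xD can be used in place of the one above; the rest of the indexing code does not change.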
Please refer to the MATLAB documentation for “sub2ind” for a better understanding of the function.
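If you would rather avoid both the loop and sub2ind, accumarray can do the grouping directly, because it accepts one N-D subscript per row. The following is only a sketch under the assumption that every entry of xD is a valid bin index (no NaN values); the names idxND, rowIdx and rowsInBin are mine. It should also run on R2018.

```matlab
% Sketch: group row indices by bin combination without a loop.
rng(0);                                  % reproducible example data
X = randn(1000, 4);
Y = randn(1000, 1);
nBins = 2;
xD = discretize(X, nBins);               % bin index per sample and dimension
dims = repmat(nBins, 1, size(X, 2));     % size of the N-D bin grid

% Each row of xD is an N-D subscript; collect the row numbers per bin.
% sort() makes the order within a bin deterministic.
rowIdx = (1:size(X, 1)).';
idxND = accumarray(xD, rowIdx, dims, @(v) {sort(v)});

% Look up one bin combination without hard-coding the number of dimensions.
binCombination = [1, 2, 1, 2];
subs = num2cell(binCombination);
rowsInBin = idxND{subs{:}};              % empty if no sample fell in this bin
Y_example = Y(rowsInBin, :);
```

idxND has the same nBins-by-nBins-by-... shape as the cell array in the original question, so existing code that indexes it with subscripts should keep working.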
Hope this helps!




