# Indexing arrays of binned data

Tessa Kol on 9 Sep 2020
Commented: Tessa Kol on 10 Sep 2020
Dear all,
I have a 2745x1 cell array expData. For every cell in this cell array I define the same range (i.e. bins). Then I discretize the data in expData based on the defined range.
Based on the discretized data in expData I want to find the corresponding values in the cell array velData, which is illustrated in the picture below. Cell 14 is taken as an example. Once the values are found, I want to take their mean for every bin. I tried this using accumarray, but with no luck:
for i = 1:length(files)
    % Define the range of the bins
    rng_x{i} = -0.3:0.06:0.3;
    % Assign the data of x-coordinate to a predefined range
    disc_x{i} = discretize(expData{i,1}(:,1),rng_x{i});
    % Calculate mean of every bin
    x_mean{i} = accumarray(disc_x{1,i}(:,1), expData{i,1}(:,1),[11 1], @mean);
    % Define the range of the bins
    rng_z{i} = 0:0.06:0.78;
    % Assign the data of z-coordinate to a predefined range
    disc_z{i} = discretize(expData{i,1}(:,3),rng_z{i});
    % Calculate mean of every bin
    z_mean{i} = accumarray(disc_z{1,i}(:,1), expData{i,1}(:,3),[13 1], @mean);
    vx_disc{i} = accumarray(disc_x{1,i}(:,1), velData{i,1}(:,1),[11 1], @mean); % Did not work
end
splitapply does not work in this case, because some bins are empty for some of the cells. If you use splitapply here, you get the following error:
"For N groups, every integer between 1 and N must occur at least once in the vector of group numbers."

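For readers following along, here is a minimal sketch of how accumarray can cope with empty bins, which is the case where splitapply raises the error above. The data are made up for illustration; only the bin edges come from the question. It assumes that values outside the edges should simply be dropped, since discretize returns NaN for them and accumarray requires positive integer subscripts.

```matlab
% Synthetic positions (assumed for illustration, not the poster's data)
x     = [-0.25; -0.20; 0.03; 0.05; 0.28];
edges = -0.3:0.06:0.3;            % 11 edges define 10 bins
nBins = numel(edges) - 1;

bin = discretize(x, edges);       % NaN for values outside the edges
ok  = ~isnan(bin);                % keep only rows that landed in a bin

% Mean of x per bin; bins that received no data are filled with NaN
xMean = accumarray(bin(ok), x(ok), [nBins 1], @mean, NaN);
```

The fifth argument is the fill value for bins with no members, so the output always has one row per bin regardless of which bins are populated.
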
Dana on 9 Sep 2020
Without knowing what's in velData or what you mean by "Did not work" (did you get an error, and if so what? did it give an unexpected answer, and if so what did you expect and what did you get?), it's hard to offer suggestions. Can you provide some more details?
Tessa Kol on 9 Sep 2020
expData contains data of the following:
column 1: x-coordinate of particles
column 2: y-coordinate of particles
column 3: z-coordinate of particles
column 5 & 6: ...
velData contains data of the following:
column 1: velocity of particle in x-direction
column 2: velocity of particle in y-direction
column 3: velocity of particle in z-direction
I got an error:
Index in position 1 exceeds array bounds (must not exceed 1).
Error in SiloV1_results_single (line 80)
vx_disc{i} = accumarray(disc_x{1,i}(:,1), velData{i,1}(:,1),[11 1], @mean);
But even if I solve the above error, I doubt whether I will get the intended results, because accumarray is saying that the data from velData will be divided into bins specified by disc_x.
Let me try to explain the picture in more detail:
I divided the values of expData into bins, which you see in the picture.
Bin 1 is from -0.3 to -0.24
Bin 2 is from -0.24 to -0.18
Bin 3 is from -0.18 to -0.12
Bin 4 is from -0.12 to -0.06
Bin 5 is from -0.06 to 0
Bin 6 is from 0 to 0.06
Bin 7 is from 0.06 to 0.12
Bin 8 is from 0.12 to 0.18
...etc.
For example: 0.0310 goes into bin 6 because it is between 0 and 0.06
Value 0.0310 corresponds to 0 in velData
Value 0.0423 corresponds to -0.0283 in velData
Value 0.0156 corresponds to -0.0209 in velData
... etc.
I want MATLAB to find the corresponding data in velData for all the bins and, once those values are found, take the mean for each bin.

Steven Lord on 9 Sep 2020
Take a look at the groupsummary function.
% Include rng default so you generate the exact same random numbers I did
rng default
x = randn(10, 1);
y = -2:0.25:2;
d = discretize(x, y);
[values, groups] = groupsummary(x, d, @sum);
% Show the results in tabular form
xAndD = table(x, d, 'VariableNames', {'x_value', 'group'})
vAndG = table(values, groups, 'VariableNames', {'summed_value', 'corresponding_group'})
The value of summed_value in the row of vAndG whose corresponding_group entry is 10 represents the sum of the elements in the x variable in xAndD whose rows have 10 in the group variable.
group10_v1 = vAndG{vAndG.corresponding_group == 10, 1}
group10_v2 = sum(xAndD{xAndD.group == 10, 1})
group10_v1 == group10_v2 % True
Because of the rng default call I know that d has 10 in positions 5 and 8.
group10_v1 == x(5)+x(8) % True
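The same pattern can be sketched with the binning vector and the summarized vector being different arrays, which is closer to the question. The first three position/velocity pairs below are the ones quoted in the question; the fourth row is made up so that a second bin is populated. Rows whose position falls outside the edges are dropped first, on the assumption that they should not form a group of their own.

```matlab
x = [0.0310; 0.0423; 0.0156; -0.2000];   % positions (last row is made up)
v = [0; -0.0283; -0.0209; 0.0100];       % matching velocities (last row is made up)

edges = -0.3:0.06:0.3;                   % bin edges from the question
bin   = discretize(x, edges);            % bin index per row, NaN if out of range
ok    = ~isnan(bin);

% Mean velocity per occupied bin; g lists which bin each mean belongs to
[vMean, g] = groupsummary(v(ok), bin(ok), @mean);
```

Unlike accumarray with a size argument, this returns only the bins that actually contain data, so g is needed to map each mean back to its bin.
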

#### 1 Comment

Tessa Kol on 10 Sep 2020
I solved the issue myself, but thank you very much for responding! And I can always use this knowledge when I am facing other problems.

Dana on 9 Sep 2020
Index in position 1 exceeds array bounds (must not exceed 1).
This error is an indexing error, which suggests to me that one or more of your indices in that line of code are wrong. Further, it's not reporting the error from inside the function accumarray, which means the error is happening before anything is actually passed to that function. Based on that, we conclude that the error arises in the arguments you're passing to accumarray.
Since it's indicating that an index in position 1 is wrong, and the only part of that line of code with an index in position 1 that can potentially exceed 1 is velData{i,1} (the index in position 1 exceeds 1 if i>1), that's the obvious candidate. If you do size(velData,1), do you get something greater than 1? If not, that's your problem right there.
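A small sketch of that diagnosis, using a made-up 1x3 cell array in place of the real velData (whose actual size is not shown in the thread). It illustrates how a two-subscript index fails on a row-oriented cell array while linear indexing works for either orientation.

```matlab
velData = {rand(5,3), rand(5,3), rand(5,3)};   % 1x3 cell, made-up contents
sz = size(velData);                            % [1 3]: only one row

i = 2;
failed = false;
try
    v = velData{i,1};      % asks for row 2, which a 1x3 cell does not have
catch err
    failed = true;
    disp(err.message)      % indexing error raised before accumarray is called
end

v = velData{i};            % linear indexing works for 1xN and Nx1 cells
```

If velData was built with a different orientation than expData, using a single subscript (or transposing one of them) is one way to make the two consistent.
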
Based on my understanding of what you're trying to do, I would think you should get what you're after if you fix that problem. However, you said, "But even if I solve the above error, I doubt whether I will get the intended results. Because accumarray is saying that the data from velData will be divided into bins specified by disc_x." Isn't that what you want? I don't understand why that's a problem.
Essentially, using your strategy here, for file i, each row of expData{i,1} is associated with the same row of velData{i,1} (ignoring the indexing error, anyway). You're then binning the rows of expData{i,1} and velData{i,1} according to the values in the first column of expData{i,1}, with the index of the corresponding bin stored in the vector disc_x{i}. Next, you want to compute the means of the first column of velData{i,1} by bin. If that's what you're after, then your code should do that (again, as long as you fix the above indexing issue first).
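The strategy described above can be sketched per file as follows. The two small matrices are made-up stand-ins for the real files, and the sketch assumes that each row of expData{i} corresponds to the same row of velData{i} and that out-of-range rows can be ignored.

```matlab
% Made-up stand-ins for two files; columns are x,y,z and vx,vy,vz
expData = {[-0.25 0 0.1; 0.03 0 0.2; 0.05 0 0.3]; [0.10 0 0.1; 0.11 0 0.2]};
velData = {[1 0 0; 2 0 0; 4 0 0]; [5 0 0; 7 0 0]};

edges  = -0.3:0.06:0.3;                 % x bin edges from the question
nBins  = numel(edges) - 1;
vxMean = cell(size(expData));

for i = 1:numel(expData)
    bin = discretize(expData{i}(:,1), edges);   % bin rows by x-coordinate
    ok  = ~isnan(bin);                          % skip rows outside the edges
    vx  = velData{i}(:,1);                      % x-velocity, same row order
    % Mean x-velocity per bin, NaN where a bin is empty for this file
    vxMean{i} = accumarray(bin(ok), vx(ok), [nBins 1], @mean, NaN);
end
```

Each vxMean{i} has one entry per bin, so the results of different files can be compared or stacked directly.
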

Tessa Kol on 10 Sep 2020
I think I am approaching this the wrong way. Let me start over.
I have a number of data files, each of which is stored in one cell of a cell array. Thus:
datac.1 is stored in expData{1,1} at time = 1 second
datac.2 is stored in expData{2,1} at time = 2 seconds
datac.3 is stored in expData{3,1} at time = 3 seconds
... etc.
Each data file contains the (x,y,z) coordinates of each particle at a different time step.
I also have a number of data files that contain the (x,y,z) velocities of each particle at a different time step. These are also stored in a cell array. Thus:
datav.1 is stored in velData{1,1} at time = 1 second
datav.2 is stored in velData{2,1} at time = 2 seconds
datav.3 is stored in velData{3,1} at time = 3 seconds
I want to make a 2D bin of the x and z coordinates and their corresponding velocities. hist3 only counts the number of points that fall into each grid box (see picture below), but I want to take it a step further: I want to take the average of the x and z values in each bin, and then determine the average velocity of each bin.
Thus, in the end I want the point of engagement of each bin and the corresponding average velocity of each bin. Ultimately I want to accomplish something like this (see picture below) and make a contourf plot of it. I know how to make such a plot; the only thing I don't know is how to get the data into bins and take the averages as I described above.
Tessa Kol on 10 Sep 2020
Thank you for sharing your knowledge, I can use this always in other problems I face. I solved my own problem and it turns out that I was taking a detour with the approach I was trying first.

J. Alex Lee on 10 Sep 2020
From what I can gather, it should be possible to reorganize your experimental data into an Nx6 matrix called Data, where N is the number of (coordinate, velocity) pairs and the 6 columns are organized as
x | y | z | u | v | w
To bin just on the (x,z) coordinates, you can use histcounts2
[~,Xedges,Zedges,binX,binZ] = histcounts2(Data(:,1),Data(:,3),nBins);
where nBins is the number of bins you want in each direction x and z.
You can use Xedges and Zedges to compute the bin centers, and binX and binZ are the assignments of each data point (row in Data) to the 1D bins along each direction.
From there you just need to use binX and binZ to determine which 2D bin a data point (row in Data) belongs to. I would then just loop through those indices to find the average velocities, but perhaps you can somehow use "groupsummary" as suggested above, if you are allowed to define your own groups manually.
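As a rough sketch of this approach, the bin assignments from histcounts2 can be passed to accumarray as two-column subscripts instead of looping. The Data matrix below is random and purely illustrative, and the bin counts [10 13] are an assumption.

```matlab
rng(0)                                    % reproducible made-up data
N     = 500;
Data  = [rand(N,3) - 0.5, randn(N,3)];    % columns: x y z u v w (synthetic)
nBins = [10 13];                          % bins along x and z (assumed)

[counts, Xedges, Zedges, binX, binZ] = histcounts2(Data(:,1), Data(:,3), nBins);

% Bin centers from the edges
Xc = (Xedges(1:end-1) + Xedges(2:end)) / 2;
Zc = (Zedges(1:end-1) + Zedges(2:end)) / 2;

% binX/binZ are 0 for points that fall in no bin (e.g. NaN); skip those rows
ok = binX > 0 & binZ > 0;

% Mean u-velocity per 2D bin; rows index x bins, columns index z bins
uMean = accumarray([binX(ok) binZ(ok)], Data(ok,4), nBins, @mean, NaN);
```

For plotting, something like contourf(Xc, Zc, uMean.') should work; the transpose is needed because contourf expects rows to run along the second axis.
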

J. Alex Lee on 10 Sep 2020
It would be nice if there were a "discretize2" function; this doesn't seem like such a niche need...
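In the absence of such a function, one possible workaround is to call discretize once per direction and merge the two bin indices into a single group id with sub2ind, which groupsummary can then use as a manually defined group. The data below are made up; the edges are the ones from the question.

```matlab
x = [-0.25; 0.03; 0.05; 0.28];     % made-up x positions
z = [ 0.10; 0.40; 0.41; 0.70];     % made-up z positions
u = [ 1; 2; 4; 8];                 % made-up velocities

xEdges = -0.3:0.06:0.3;            % edges from the question
zEdges = 0:0.06:0.78;
nX = numel(xEdges) - 1;            % 10 bins in x
nZ = numel(zEdges) - 1;            % 13 bins in z

bx = discretize(x, xEdges);
bz = discretize(z, zEdges);
ok = ~isnan(bx) & ~isnan(bz);      % drop points outside either range

g = sub2ind([nX nZ], bx(ok), bz(ok));            % one linear id per 2D bin
[uMean, gid] = groupsummary(u(ok), g, @mean);    % mean per occupied 2D bin
[ix, iz] = ind2sub([nX nZ], gid);                % back to (x bin, z bin)
```

Only occupied bins are returned, so ix and iz say which 2D bin each entry of uMean refers to.
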
Tessa Kol on 10 Sep 2020
Turns out I didn't need to do this.
I had the (x,y,z) coordinates of every particle and the (x,y,z) velocities. With that I managed to make this for one data file, which is the velocity profile. Now I am stuck on doing this for every data file, see:

R2020a
