How can I average one column based on the bins created for the other column?

Hi All,
I have two columns as a matrix they show the elevaiton (m) of data points (1st column) and thier snow depth (Second column). I want to make bins for elevaiton and see what is the average of the snow depth in that elevation range. I want the elevation data to go to bins of every 12 meter and see i each of this bins how much was the average of the snow depth. So at the end I need two columns one shows the bins (elevation: 0, 12, 24, ...) and the other shows average of snowdepth in that elevation range.
Any idea please

Answers (2)

This example creates a fake dataset to work with. It then creates bins for the elevation data in steps of 12. Using splitapply, it finds the mean of snow depth for each bin and then puts the data into a table.
% Create fake data [elevation, snowDepth]
data = [randi(100, 1000, 1) + rand(1000,1), randn(1000,1) + 10];
% Create bins
elevationBins = 0:12:max(data(:,1)+11);
% Determine bin membership
[~, bins, binID] = histcounts(data(:,1), elevationBins);
% average snow depth per bin
binMeans = splitapply(@mean, data(:,2), findgroups(binID));
% Put results into table
table(unique(bins(binID))', binMeans, 'VariableNames', {'BinMin', 'BinMean'})

4 Comments

Thanks, but the results on the sample seems not correct! :(
Perhaps you're not defining bins correctly. Here's a simpler example where the data range from 0-23 for elevation. Members of the 1st bin all have snow = 1 and the second bin all have snow = 2.
data = [(0:23)', repelem([1;2],12, 1)];
Elev Snow
____ ____
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 2
13 2
14 2
15 2
16 2
17 2
18 2
19 2
20 2
21 2
22 2
23 2
The solution I proposed results in the expected means:
BinMin BinMean
______ _______
0 1
12 2
Recall that bins [0, 12, 24, ...] define the start of each bin. So bin 1 includes all numbers >=0 and < 12. Bin 2 is all numbers >= 12 and <24, etc.
my data is not integer. It gives an error that:
Error using splitapply: Group numbers must be a vector of positive integers, and cannot be a sparse vector.
:(
The real data looks like this:
data.jpg
You might not be applying the example correctly to your data. Another possibility is that outliers are messing with the bins (for example, if 1 of your elevations is 100x larger than the rest).
I edited my solution so that it handles outliers better. The change I made was
binMeans = splitapply(@mean, data(:,2), binID); %OLD
binMeans = splitapply(@mean, data(:,2), findgroups(binID)); %NEW
Try to apply your data again to the updated solution. If you continue to have problems, please provide a short sample of your data that I can either copy or download the code.

Sign in to comment.

Asked:

on 8 Feb 2019

Edited:

on 11 Feb 2019

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!