How to re-bin the 2D data?

Hi everyone,
I have 2D data: the first column (X) is the time and the second column (Y) is the corresponding data. Re-binning X is easy because its values are linearly spaced, but I could not figure out how to re-bin Y, which decays exponentially. The attached screenshot shows a piece of the data. How can I re-bin Y so that I can accurately pull out the Y values for each re-binned X value?
[Attachment: Screen Shot 2019-12-30 at 2.03.33 PM.png]
X = data(4:214, 1);
Xmax = max(X); Xmin = min(X);
N = 106; % no. of bins I want, this will give me bin size of 0.1 (say)
dy = (Xmax - Xmin)/ (N-1); % N-1 will be clear in the next line
Xedges = Xmin - dy/2 : dy : Xmax + dy/2;
Xedges = Xedges'; % change row to column matrix (transpose)
% re-bin Y data so that I can pull out Y values for each re-binned X column
Y = data(4:214, 2);
Ymax = max(Y); Ymin = min(Y);
N = 106; % no. of bins we want
.... % not sure how to proceed ahead

 Accepted Answer

Adam Danz
Adam Danz on 30 Dec 2019
Edited: Adam Danz on 30 Dec 2019
You can use Y = discretize(X,edges) to bin the X data, then use the resulting bin indices to group the Y data by bin. You probably cannot assume that every bin holds the same number of data points, so the grouped Y data must be stored in a cell array.
Here's a demo using your variable names.
[bins, Xedges] = discretize(X,N); % Xedges are not used here
Y = data(4:214, 2);
yBinned = arrayfun(@(i)Y(bins==i),unique(bins),'UniformOutput',false);
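yBinned is an n-by-1 cell array. As long as no bin is empty, yBinned{j} holds the Y values in bin number j, whose edges are Xedges([j,j+1]). Because the line above loops over unique(bins), an empty bin is skipped and the cells after it shift down by one.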
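A minimal sketch of the same grouping on made-up data. The values of X, Y, and Xedges below are invented for illustration and are not the data from the question. It loops over every bin index instead of unique(bins), which is a small variation on the line above, so that an empty bin would keep its slot as an empty array:

```matlab
% Synthetic stand-in data (assumption: not the asker's data)
X = (0:20)';                  % 21 linearly spaced "time" points
Y = exp(-0.2 * X);            % exponentially decaying signal
Xedges = (0:5:20)';           % 5 edges -> 4 bins; the last bin includes 20

bins = discretize(X, Xedges); % bin index of each X value

% One cell per bin; iterating over 1:nBins keeps empty bins in place
nBins = numel(Xedges) - 1;
yBinned = arrayfun(@(i) Y(bins == i), (1:nBins)', 'UniformOutput', false);

% Number of points that landed in each bin
counts = cellfun(@numel, yBinned);
```

Here counts shows how unevenly the points can fall into bins, which is why a cell array is used rather than a matrix.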

10 Comments

blues
blues on 30 Dec 2019
Edited: Adam Danz on 30 Dec 2019
I tried to run these lines but ended up with errors:
Error using discretize
Too many output arguments.
Error in data_rebining (line 23)
[bins, Xedges] = discretize(X,N); % Xedges are not used here
Any hints on this?
Adam Danz
Adam Danz on 30 Dec 2019
Edited: Adam Danz on 30 Dec 2019
Could you tell me what the following two outputs are?
ver() % specifically, I'm looking for your matlab version (eg: r2019b)
and
which discretize -all
Furthermore, if you either attach a mat file containing the "data" variable or paste the content of that variable as a comment in such a way that I can easily copy it into my command window, I could apply my idea to your data.
The 2nd output of discretize() was added in r2016b, so I'm guessing your version of Matlab is older than that. The 2nd input will likely need to be changed, too, since prior to r2016b you had to provide the edges. In that case, you can use your Xedges variable like this:
bins = discretize(X,Xedges);
blues
Thank you, Adam. I am using MATLAB Version: 9.0.0.341360 (R2016a).
bins = discretize(X,Xedges);
seems to be working, but it gives the output in a different form. Here is the output in the workspace:
6.13803900000000
[6.01808600000000;5.86342700000000]
[5.72015900000000;5.59193600000000]
[5.44887000000000;5.33123500000000]
[5.18754000000000;5.04445800000000]
[4.93591600000000;4.82840700000000]
[4.70557900000000;4.54286600000000]
[4.42979400000000;4.27833500000000]
[4.19564900000000;4.05971000000000]
.............
The dimension of the yBinned column is right, but I think I may need to alter the following line
yBinned = arrayfun(@(i)Y(bins==i),unique(bins),'UniformOutput',false);
so that I get the mean value of the data in each cell.
How can I modify this line so that I get a mean value (e.g., the mean of [6.01808600000000;5.86342700000000]) for each cell?
Adam Danz
Adam Danz on 30 Dec 2019
Edited: Adam Danz on 30 Dec 2019
"...seems to be working, but it gives the output in a different form."
Looking at the r2016a documentation, the output should be in the same form.
"How can I modify this line so that I get a mean value?"
Instead of modifying that line, it will likely be useful to keep those grouped data. You can compute the mean of each group (each element of the cell array) using
yMean = cellfun(@mean,yBinned);
If your data has NaNs that you'd like to ignore,
yMean = cellfun(@(x)mean(x(~isnan(x))),yBinned);
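A small illustration of the two calls above on invented values. The cell contents are assumptions, chosen so the means are easy to check by hand:

```matlab
% Invented grouped data: three bins holding 2, 2, and 1 values
yBinned = {[6; 4]; [NaN; 3]; 2};

% Plain mean: a NaN inside a bin makes that bin's mean NaN
yMeanPlain = cellfun(@mean, yBinned);

% NaN-ignoring mean: drop NaNs inside each bin before averaging
yMeanNoNan = cellfun(@(x) mean(x(~isnan(x))), yBinned);
```

The second bin is the only one where the two results differ, since it is the only one containing a NaN.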
blues
Now the code is working and I obtained the data values.
Can I ask you one more question? Do you see a problem in the following lines of code?
Y = data(4:214, 2);
binEdge = linspace(min(x),max(x),106);
[n,bin] = histc(x,binEdge);
Ybinned = accumarray(bin,Y,[],@mean);
I got a slightly different result using these lines compared with your suggestion. Just curious which version is more accurate! Thank you for your great help.
Adam Danz
Adam Danz on 30 Dec 2019
Edited: Adam Danz on 31 Dec 2019
"Now the code is working and I obtained the data values."
Great!
"Do you see a problem in the following lines of code?"
Hmmm, it's not immediately obvious to me just by looking at the code. I could probably see the difference fairly quickly if I had your data so I could run each version and look at the outputs.
If you can't attach the data and you'd rather troubleshoot it on your own, I'd start by comparing the binEdge values between the two methods; if they match I'd then compare the bin index values.
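The first check suggested above, comparing the edge vectors, might look like the sketch below. X here is made-up data and the variable names XedgesCentered and binEdgeLinspace are hypothetical; only the two edge formulas are taken from the code posted in this thread:

```matlab
% Made-up X (assumption: 211 evenly spaced samples, not the asker's data)
X = (0:210)';
N = 106;

% Edges as built in the question: half a bin beyond each end
dx = (max(X) - min(X)) / (N - 1);
XedgesCentered = (min(X) - dx/2 : dx : max(X) + dx/2)';

% Edges as built in the histc version: from min to max
binEdgeLinspace = linspace(min(X), max(X), N)';

% Compare the number of edges and where each vector starts
nEdges = [numel(XedgesCentered), numel(binEdgeLinspace)];
firstEdges = [XedgesCentered(1), binEdgeLinspace(1)];
```

If nEdges or firstEdges differ on the real data as well, the two methods are binning with different edges, and their bin means need not agree.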
blues
blues on 30 Dec 2019
Edited: blues on 30 Dec 2019
Attached is the data that I am working on. Could you please take a quick look?
Adam Danz
Adam Danz on 30 Dec 2019
Edited: Adam Danz on 30 Dec 2019
There isn't enough information available in your attachment for me to compare the two versions.
For example, I don't know what your Xedges values are. Also, there are only 211 rows of data in your attachment, but your code references row 214.
If you attach 1) a mat file containing the data needed for me to run the code and 2) the two sections of code you're using, I can look into the differences.
blues
You can get Xedges from the code that I attached before; here it is again:
X = data(1:211, 1);
Xmax = max(X); Xmin = min(X);
N = 106; % no. of bins we want
dy = (Xmax - Xmin)/ (N-1); % N-1 will be clear in the next line
Xedges = Xmin - dy/2 : dy : Xmax + dy/2;
Xedges = Xedges';
Previously I read the data from the .xls file, excluding the header info. So, after removing the headers, the range is data(1:211, 1), i.e., the same data as in the code.
I don't have a mat file.
Adam Danz
"You can get Xedges from the code that I attached before"
Actually, I couldn't, because it extended to row 214 but your data only had 211 rows. I didn't know anything about headers, etc., until your previous comment.
"I don't have a mat file."
It would take you less than 1 minute to create one using the save() function. By sharing a mat file, we know we're using the exact same data. But when you give the csv file, it requires me to read in the data, and there are multiple ways to do that, so we could end up with slightly different values. Also, it takes more time for us volunteers to read in data. The idea is to make it easy for us to help you and to make sure we're all looking at the exact same thing.
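For reference, creating and re-loading a mat file could look like the sketch below. The values in data and the file name rebin_data.mat are placeholders chosen for illustration:

```matlab
% Placeholder matrix standing in for the real "data" variable (assumption)
data = [0.0 6.1; 0.1 5.9; 0.2 5.7];

% Hypothetical file name; tempdir just keeps the example self-contained
matFile = fullfile(tempdir, 'rebin_data.mat');

% Write the variable to disk, then read it back into a struct
save(matFile, 'data');
loaded = load(matFile);

% The round trip preserves the values exactly
sameData = isequal(loaded.data, data);
```

Anyone who loads the same file gets bit-for-bit the same variable, which is the point made above about csv files.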
I copied your data from the csv file into the command window and named the variable "data". Then I ran the following two sections of code. The first section produces the variable yMean and the second section produces the variable yMean2. Then I compare the values.
X = data(1:211, 1);
Xmax = max(X); Xmin = min(X);
N = 106; % no. of bins we want
dy = (Xmax - Xmin)/ (N-1); % N-1 will be clear in the next line
Xedges = Xmin - dy/2 : dy : Xmax + dy/2;
Xedges = Xedges';
% VERSION 1
bins = discretize(X,Xedges);
Y = data(1:211, 2);
yBinned = arrayfun(@(i)Y(bins==i),unique(bins),'UniformOutput',false);
yMean = cellfun(@(x)mean(x(~isnan(x))),yBinned);
% VERSION 2
[n,bins2] = histc(X,Xedges);
yMean2 = accumarray(bins2,Y,[],@mean);
% COMPARE VERSIONS
isequal(bins,bins2) % = TRUE; so they are the same
isequal(yMean,yMean2) % = TRUE; so they are the same
As you can see, the two sections produce the same outputs. If you are getting different values, it could be due to any of the following reasons:
  1. The inputs in your code are different between the two versions.
  2. Your two code sections don't match my two code sections.
  3. I'm using r2019b and you're using r2016a. I doubt this is the problem.
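When isequal returns false, a quick way to see how far apart two results are is sketched below. The vectors and the tolerance are invented for illustration and are not outputs from this thread:

```matlab
% Invented results standing in for yMean and yMean2 (assumption)
yMean  = [5; 3; 2];
yMean2 = [5; 3 + 1e-12; 2];

% isequal demands exact equality, so a tiny difference makes it false
exactlyEqual = isequal(yMean, yMean2);

% Check the sizes first, then the largest element-wise difference
sameSize = isequal(size(yMean), size(yMean2));
maxAbsDiff = max(abs(yMean - yMean2));

% Hypothetical tolerance; pick one suited to the scale of the data
withinTol = sameSize && maxAbsDiff < 1e-9;
```

A size mismatch points to different bin edges or empty bins, while a tiny maxAbsDiff points to floating-point rounding rather than a real disagreement.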


Asked: 30 Dec 2019
Edited: 31 Dec 2019
