How can I plot a relative frequency histogram in Matlab?
I have developed a script to calculate the power fluctuation of a PV power plant. Basically, I'm calculating the magnitude of the power fluctuation dP, at an instant t, for a sampling period dt, as the difference between two power outputs, normalized to the nominal power Pnom of the plant, as follows:
dP( t ) = [ P( t + dt ) - P( t ) ] / Pnom.
I'm exporting 300 seconds of data from Simulink to the workspace, and I want to plot relative frequency histograms for different sampling periods dt. A longer sampling period dt leaves fewer samples within these 300 seconds: for example, dt = 1 second gives 299 samples, and dt = 20 seconds gives 280 samples. I have already calculated the power fluctuation dP( t ), which is shown in the figure below:
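For reference, the calculation described above can be sketched as follows. This is only a minimal sketch: it assumes the exported power is a vector P sampled once per second, and the names P, Ts, and Pnom and their values are placeholders, not the original Simulink data.

```matlab
% Sketch only: P, Ts, and Pnom are placeholder values, not the original data
Ts   = 1;                      % assumed sample time of the exported signal, in seconds
Pnom = 1000;                   % assumed nominal plant power (same units as P)
P    = Pnom * rand(300, 1);    % stand-in for the 300 exported power samples

dt = 20;                       % sampling period of interest, in seconds
k  = round(dt / Ts);           % number of samples spanned by dt

% dP(t) = [P(t + dt) - P(t)] / Pnom, evaluated wherever both samples exist
dP = (P(1+k:end) - P(1:end-k)) / Pnom;

fprintf('dt = %g s gives %d fluctuation samples\n', dt, numel(dP));
```

With 300 samples at 1 second spacing, dt = 20 yields 280 values and dt = 1 yields 299, which matches the counts quoted above.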

The vertical axis shows the power fluctuation in percent, and these values can occasionally be large. I want to plot a figure that shows the relative frequency of these power fluctuations. The figure below (taken from a paper) shows the kind of relative frequency plot I want to produce from my data.

I appreciate any help!
Accepted Answer
More Answers (1)
William Rose
on 28 Jun 2021
1 vote
The overlapping bars in the plot above are potentially confusing to the viewer. Therefore I added some code to make a second plot, shown below. The data are the same, still presented as probabilities, but the plot is easier to understand, in my opinion. It is easy to modify this to show probabilities for more than two data sets. See attached code. The probabilities are different in this example because the random numbers are different on each run.
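The attachment is not reproduced here. As a rough illustration of the idea (not the attached code itself), one way to draw side-by-side probability bars is to count both data sets in the same bins with histcounts and pass the results to bar. The data, the variable names, and the choice of 20 bins below are all assumptions made for the example.

```matlab
% Placeholder data standing in for two sets of fluctuations, in percent
Y1 = 5 * randn(299, 1);        % e.g. dt = 1 s
Y2 = 10 * randn(280, 1);       % e.g. dt = 20 s

% Shared bin edges so that both data sets are counted in identical bins
lo = floor(min([Y1; Y2]));
hi = ceil(max([Y1; Y2]));
edges   = linspace(lo, hi, 21);                  % 20 bins, an arbitrary choice
centers = (edges(1:end-1) + edges(2:end)) / 2;

% 'probability' normalization: each value is count / numel(data),
% so the bars of each data set sum to 1
p1 = histcounts(Y1, edges, 'Normalization', 'probability');
p2 = histcounts(Y2, edges, 'Normalization', 'probability');

% One group of bars per bin, one bar per data set
bar(centers, [p1; p2].', 'grouped');
xlabel('Power fluctuation (%)');
ylabel('Relative frequency');
legend('dt = 1 s', 'dt = 20 s');
```

To show more than two data sets, add further rows to the matrix passed to bar, each computed with the same edges.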

6 Comments
Eric Bernard Dilger
on 28 Jun 2021
Edited: Eric Bernard Dilger
on 28 Jun 2021
William Rose
on 29 Jun 2021
Your plots look very nice! I'm glad you are not taking the absolute value. The only reason I did so is that the figures in the paper you provided, in your original posting, had only positive fluctuations. I think it is better to separate the positive and negative fluctuations, as you have done in your plots.
You asked if there is an optimal number of bins. I suppose that depends on the definition of optimality.
If you want to do a statistical test for whether two datasets are from the same distribution, I recommend the two-sided Kolmogorov-Smirnov test. The test compares the two vectors of percent fluctuations (Y1 and Y2 in the case of my example code). These vectors have lengths 300 and 50 in your case. The different lengths are not a problem. You do not bin the data. See the MATLAB help for kstest2(), which returns the test statistic as its third output, ks2stat.
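A minimal sketch of such a test, using kstest2 from the Statistics and Machine Learning Toolbox. The two vectors here are random placeholders with the lengths mentioned above, not the real fluctuation data.

```matlab
% Placeholder vectors of percent fluctuations with different lengths
Y1 = 5 * randn(300, 1);
Y2 = 5 * randn(50, 1);

% Two-sample K-S test on the raw (unbinned) values.
% h = 1 means the null hypothesis of a common distribution is rejected
% at the default 5% significance level; p is the p-value and ks2stat
% is the largest distance between the two empirical CDFs.
[h, p, ks2stat] = kstest2(Y1, Y2);

fprintf('h = %d, p = %.3f, K-S statistic = %.3f\n', h, p, ks2stat);
```

The result changes from run to run because the placeholder data are random.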
One reasonable definition of "optimal bin width" is the width that minimizes the integral of the squared difference between the histogram and the theoretical probability distribution. In this case, the optimal bin width is given, roughly, by the Freedman-Diaconis rule: bin width = 2 * IQR(x) / n^(1/3), where IQR(x) is the interquartile range of the data and n is the number of samples.
I don't know if this is the rule used by Matlab's histogram(), when you don't specify the width or number of bins.
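As an illustration, the Freedman-Diaconis width, 2 * IQR(x) / n^(1/3), can be computed by hand and passed to histogram() through 'BinWidth'. histogram() also accepts 'BinMethod','fd' to apply the rule itself, though it may adjust the edges to round numbers. The data below are placeholders, and the interquartile range is estimated by simple interpolation of the sorted data, which may differ slightly from the toolbox function iqr().

```matlab
x = 5 * randn(299, 1);                 % placeholder fluctuation data, in percent
n = numel(x);

% Quartiles by linear interpolation between the sorted values
xs = sort(x);
q  = interp1((1:n)', xs, [0.25 0.75]' * (n - 1) + 1);

width = 2 * (q(2) - q(1)) / n^(1/3);   % Freedman-Diaconis bin width
nbins = ceil((max(x) - min(x)) / width);

% Pass the width explicitly ...
histogram(x, 'BinWidth', width, 'Normalization', 'probability');
% ... or let histogram apply its own version of the rule:
% histogram(x, 'BinMethod', 'fd', 'Normalization', 'probability');
xlabel('Power fluctuation (%)');
ylabel('Relative frequency');
```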
Another approach to optimal binning is this one by Scargle et al. I guess this is the same Scargle as the Lomb-Scargle periodogram, about which I recently learned. Scargle et al. say: "This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it—an improved and generalized version of Bayesian Blocks—that finds the optimal segmentation of the data in the observation interval." When they say optimal segmentation of the data, they mean optimal binning.
Another optimal binning approach is given by Shimazaki and Shinomoto. Here is a web page implementing S. & S.'s method: https://www.neuralengine.org//res/histogram.html .
Also, see this discussion of optimal bin widths: https://stats.stackexchange.com/questions/798/calculating-optimal-number-of-bins-in-a-histogram .
Eric Bernard Dilger
on 29 Jun 2021
Edited: Eric Bernard Dilger
on 29 Jun 2021
William Rose
on 29 Jun 2021
You are doing a great project, and you are helping do something about global warming. I have driven past the Ivanpah solar generating plant in California a few times. At 2.5 km from the tower, it is very impressive, but also kind of scary. The towers containing the boilers glow white-hot. I would not want to be very close. I realize that photovoltaic generation is not the same as solar thermal electric generation.
I would use the 2-sample K-S test to answer the question "Do the model and the real data have the same distribution of fluctuations?" Note that this question does not address differences in the time structure of the fluctuations. For example, if you took all your model fluctuations and shuffled them in time, or shuffled the real data, or shuffled both (but kept them separate), the result of the 2-sided K-S test would be unaffected. If time structure does matter to you, then you need to do some other kind of testing. For example, you might compute the autocorrelation of the real data and the autocorrelation of the model data, and compare them.
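A rough sketch of that autocorrelation comparison, written without toolbox functions. The two series are synthetic placeholders (a slow sine wave with different noise levels), and the names model, meas, and acf are made up for the example. The last lines illustrate the shuffling point: a shuffled series contains exactly the same values, so any test based only on the distribution of values cannot tell the two apart.

```matlab
% Placeholder series: a smooth "model" signal and a noisier "measured" one
t     = (1:300)';
model = sin(2*pi*t/60) + 0.1 * randn(300, 1);
meas  = sin(2*pi*t/60) + 0.5 * randn(300, 1);

% Normalized sample autocorrelation of a zero-mean series at lag k
% (equals 1 at k = 0)
acfAt = @(x, k) sum(x(1:end-k) .* x(1+k:end)) / sum(x.^2);
acf   = @(x, lags) arrayfun(@(k) acfAt(x - mean(x), k), lags);

lags = 0:60;
plot(lags, acf(model, lags), lags, acf(meas, lags));
xlabel('Lag (samples)');
ylabel('Autocorrelation');
legend('model', 'measured');

% Shuffling destroys the time structure but keeps the same set of values,
% so the empirical distribution is unchanged
shuffled = model(randperm(numel(model)));
```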
You write like a native speaker of English, or better than most native speakers, including me. Are you in Pato Branco, Brasil?
Eric Bernard Dilger
on 2 Jul 2021
Edited: Eric Bernard Dilger
on 2 Jul 2021
William Rose
on 2 Jul 2021
Please email me at rosewc@udel.edu so I can reply by email.


