How can I plot a relative frequency histogram in Matlab?

I have developed a script to calculate the power fluctuation of a PV power plant. Basically, I'm calculating the magnitude of the power fluctuation dP, at an instant t, for a sampling period dt, as the difference between two power outputs, normalized to the nominal power Pnom of the plant, as follows:
dP( t ) = [ P( t + dt ) - P( t ) ] / Pnom.
I'm exporting 300 seconds of data from Simulink to the workspace and I want to plot relative frequency histograms for different sampling periods dt. A longer sampling period dt gives fewer samples within these 300 seconds. For example, for dt = 1 second I have 299 samples, and for dt = 20 seconds I have 280 samples. I have already calculated the power fluctuation dP( t ) and it can be observed in the figure below:
The vertical axis shows the percentage power fluctuation, and these values can sometimes be large. I want to plot a figure that describes the relative frequency of these power fluctuations. The figure below (which I took from a paper) shows the kind of relative frequency plot I want to produce from my data.
I appreciate any help!

 Accepted Answer

Matlab's histogram() command is nice. The attached script shows how you could use it. It produces the plot below.
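The attached script is not reproduced here, so the following is only a minimal sketch of the idea and not the original attachment. It assumes a power vector sampled once per second. Synthetic data stands in for the Simulink export, and the names P, Pnom, dt and the 0.5% bin width are illustrative assumptions.

```matlab
% Synthetic stand-in for 300 s of PV power sampled at 1 Hz (assumption).
rng(0);                                   % reproducible random numbers
Pnom = 1000;                              % nominal plant power (assumed value)
P = Pnom*(0.7 + 0.05*cumsum(randn(300,1))/sqrt(300));

dt = 20;                                  % sampling period in samples (= seconds at 1 Hz)
dP = 100*(P(1+dt:end) - P(1:end-dt))/Pnom;   % fluctuation in percent of Pnom

% 'probability' normalization makes the bar heights sum to 1,
% so each bar is the relative frequency of its bin.
h = histogram(dP, 'Normalization', 'probability', 'BinWidth', 0.5);
xlabel('Power fluctuation dP (% of P_{nom})');
ylabel('Relative frequency');
title(sprintf('dt = %d s, %d samples', dt, numel(dP)));
```

With dt = 20 this yields 280 samples, matching the count in the question. If a probability density is wanted instead of relative frequencies, 'Normalization','pdf' scales the bars so that the total area is 1.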

4 Comments

Hello @William Rose. Thanks for your help. Actually, I was trying to normalize my histogram to a probability density function. Thanks to you, I finally got it. Thank you very much!!!
@Eric Bernard Dilger, You are welcome. Good luck with your work.
The above concept really helps to find fluctuation. Can you share reference material or a published paper on fluctuation with me?
I am glad that the above concept and code are helpful for understanding and estimating fluctuation. The analysis of fluctuations above is not based on any particular references or journal articles. It is based on defining fluctuation as the absolute value of the fractional deviation from the mean value.
One could define fluctuation differently, depending on the situation and goals. For example, you might choose not to take the absolute value. Or you might choose not to divide by the mean value of x. And so on. Those choices depend on what is most useful to you.
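As a sketch of that definition and its variants (the vector x is made-up example data, not taken from the attached script):

```matlab
x = [9 10 11 12 8];                       % made-up example data, mean is 10

fluct     = abs((x - mean(x))/mean(x));   % absolute fractional deviation from the mean
signedFl  = (x - mean(x))/mean(x);        % variant: keep the sign
unscaled  = abs(x - mean(x));             % variant: do not divide by the mean
fluctPct  = 100*fluct;                    % the first definition, in percent
```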


More Answers (1)

The overlapping bars in the plot above are potentially confusing to the viewer. Therefore I added some code to make a second plot, shown below. The data is the same, still presented as probabilities, but it is easier to understand, in my opinion. It is easy to modify this to show probabilities for more than two data sets. See attached code. The probabilities are different in this example because the random numbers are different on each run.
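Since the attachment is not shown here, this is only a rough sketch of one way to avoid overlapping bars, not the attached code itself. It bins both data sets on shared edges with histcounts() and plots the probabilities as marked lines. Y1 and Y2 are random stand-in data, and the 1% bin width is an assumption.

```matlab
rng(1);                                   % reproducible random numbers
Y1 = 2*randn(300,1);                      % stand-in data set 1, percent fluctuation (assumed)
Y2 = 3*randn(50,1);                       % stand-in data set 2 (assumed)

% Shared edges that cover both data sets, so the curves are directly comparable.
lo = floor(min([Y1; Y2]));
hi = ceil(max([Y1; Y2]));
edges   = lo:1:hi;
centers = edges(1:end-1) + diff(edges)/2; % bin centers for plotting

p1 = histcounts(Y1, edges, 'Normalization', 'probability');
p2 = histcounts(Y2, edges, 'Normalization', 'probability');

plot(centers, p1, 'o-', centers, p2, 's-');
xlabel('Fluctuation (%)');
ylabel('Probability');
legend('Y1 (300 samples)', 'Y2 (50 samples)');
```

Adding a third data set only needs one more histcounts() call on the same edges and one more pair of arguments to plot().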

6 Comments

Thank you again @William Rose! I've just applied your suggestion and it worked well. Here I'm comparing two methods of calculating the power fluctuation. The blue line on the left side of the figure below is not as long as the orange line because the orange one has 6 times more samples. The right side of the figure shows the data points I got by applying the script you suggested. I wonder if there is any rule for choosing the histogram's BinWidth. I noticed that a smaller BinWidth gives a larger number of points, so a very small BinWidth produces many points equal to zero, and the result looks like a discrete signal. On the other hand, a very large BinWidth gives something like the last plot on the right side, which looks like a signal sampled at a low sampling frequency. Since I'm working with a known array and can always determine the number of samples, is there an approach for finding the optimal BinWidth?
Your plots look very nice! I'm glad you are not taking the absolute value. The only reason I did so is that the figures in the paper you provided, in your original posting, had only positive fluctuations. I think it is better to separate the positive and negative as you have done in your plots.
You asked if there is an optimal number of bins. I suppose that depends on the definition of optimality.
If you want to do a statistical test for whether two datasets are from the same distribution, I recommend the two-sample Kolmogorov-Smirnov test. The test compares the two vectors of percent fluctuations (Y1 and Y2 in the case of my example code). These vectors have lengths 300 and 50 in your case. The different lengths are not a problem. You do not bin the data. See the Matlab help for kstest2().
One reasonable definition of "optimal bin width" is the width that minimizes the integral of the squared difference between the histogram and the theoretical probability distribution. In this case, the optimal bin width is given, roughly, by the Freedman-Diaconis rule:
bin width = 2 * IQR( x ) / n^(1/3),
where IQR( x ) is the interquartile range of the data and n is the number of samples.
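A minimal sketch of such a test, assuming the Statistics and Machine Learning Toolbox is installed. Y1 and Y2 are random stand-in vectors with the lengths mentioned above, not the real fluctuation data.

```matlab
% Requires the Statistics and Machine Learning Toolbox (kstest2).
rng(2);                                   % reproducible random numbers
Y1 = 2*randn(300,1);                      % stand-in for the first set of fluctuations (assumed)
Y2 = 2*randn(50,1);                       % stand-in for the second set (assumed)

% Default settings: two-sided alternative, 5% significance level.
% h = 1 rejects the hypothesis that both samples come from the same distribution.
[h, p, ksstat] = kstest2(Y1, Y2);
fprintf('h = %d, p = %.3f, KS statistic = %.3f\n', h, p, ksstat);
```

The raw vectors go in directly, so the result does not depend on any choice of bin width.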
I don't know if this is the rule used by Matlab's histogram(), when you don't specify the width or number of bins.
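Here is a rough sketch of applying the Freedman-Diaconis rule by hand. The data x is a random stand-in, and the quartiles are computed with simple linear interpolation between sorted values, which is one of several common quantile conventions (an assumption on my part).

```matlab
rng(3);                                   % reproducible random numbers
x = 2*randn(280,1);                       % stand-in fluctuation data (assumed)
n = numel(x);

% Quartiles by linear interpolation between the sorted values.
xs = sort(x);
q  = interp1((1:n)', xs, [0.25 0.75]*(n-1) + 1);
iqrX = q(2) - q(1);                       % interquartile range

binW = 2*iqrX/n^(1/3);                    % Freedman-Diaconis bin width

histogram(x, 'BinWidth', binW, 'Normalization', 'probability');
xlabel('Fluctuation (%)');
ylabel('Relative frequency');
```

histogram() also accepts 'BinMethod','fd', which asks Matlab to apply the Freedman-Diaconis rule itself. The resulting width may not match binW exactly, since Matlab may adjust it to convenient values.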
Another approach to optimal binning is this one by Scargle et al. I guess this is the same Scargle as the Lomb-Scargle periodogram, about which I recently learned. Scargle et al. say: "This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it—an improved and generalized version of Bayesian Blocks—that finds the optimal segmentation of the data in the observation interval." When they say optimal segmentation of the data, they mean optimal binning.
Another optimal binning approach is given by Shimazaki and Shinomoto. Here is a web page implementing S. & S.'s method: https://www.neuralengine.org//res/histogram.html .
Thanks very much for your support! The references you cited are going to be very useful for my research. Actually, I'm trying to compare simulated data with real data, since I developed an algorithm that computes recursive fractals to generate the ground shadow caused by cloud movement. These shadows are the main reason for the irradiance fluctuation and, accordingly, also the main reason for the power fluctuation in a PV power plant. Hence, I'm trying to compare the data generated by simulation with the available real data, so I can tune my algorithm until the generated data become as realistic as possible.
Thank you for helping me with my research. I'm really grateful for all the information and the references you cited. It will be really useful!
You are doing a great project, and you are helping do something about global warming. I have driven past the Ivanpah solar generating plant in California a few times. At 2.5 km from the tower, it is very impressive, but also kind of scary. The towers containing the boilers glow white-hot. I would not want to be very close. I realize that photovoltaic generation is not the same as solar thermal electric generation.
I would use the 2-sample K-S test to answer the question "Do the model and the real data have the same distribution of fluctuations?" Note that this question does not address differences in the time structure of the fluctuations. For example, if you took all your model fluctuations and shuffled them in time, or shuffled the real data, or shuffled both (but kept them separate), the result of the 2-sample K-S test would be unaffected. If time structure does matter to you, then you need to do some other kind of testing. For example, you might compute the autocorrelation of the real data and the autocorrelation of the model data, and compare them.
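To illustrate the point about time structure, here is a small sketch with made-up data. A correlated series and a shuffled copy of it contain exactly the same values, so any histogram or K-S comparison sees no difference, but their autocorrelations differ. The AR(1) stand-in signal, the coefficient 0.8, and the maximum lag of 20 are all assumptions for illustration.

```matlab
rng(4);                                   % reproducible random numbers
n = 300;
realDP   = filter(1, [1 -0.8], randn(n,1));   % correlated stand-in for "real" data (assumed)
shuffled = realDP(randperm(n));               % same values, time order destroyed

% Normalized sample autocorrelation for lags 0..maxLag (no toolbox needed).
maxLag = 20;
a = realDP - mean(realDP);
b = shuffled - mean(shuffled);
acReal = zeros(maxLag+1, 1);
acShuf = zeros(maxLag+1, 1);
for k = 0:maxLag
    acReal(k+1) = sum(a(1:n-k).*a(1+k:n))/sum(a.^2);
    acShuf(k+1) = sum(b(1:n-k).*b(1+k:n))/sum(b.^2);
end

plot(0:maxLag, acReal, 'o-', 0:maxLag, acShuf, 's-');
xlabel('Lag (samples)');
ylabel('Autocorrelation');
legend('original order', 'shuffled');
```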
You write like a native speaker of English, or better than most native speakers, including me. Are you in Pato Branco, Brasil?
Thanks for the compliment. Actually, sometimes I need to look up some words on Google, but I'm practicing a lot while reading so many papers. Unfortunately, due to the covid pandemic, our in-person classes have not resumed yet and we are studying from home (since our vaccination is not going so fast). So I'm writing from a city very near to Pato Branco, Brasil, until our classes come back. I'm studying for my master's degree at a very good institution named Federal University of Technology - Paraná. Renewable energy is one of our knowledge areas and there is a lot of good work being produced here. By the way, California is always innovating on the electric power system. CAISO is one of the references I'm always looking at, due to the good stuff coming from there. I'd really like to visit California in the future, and I'd also like to visit Florida to watch some rocket launches.
I'll take your suggestion to better analyze the simulated and the real data. I'm not sure if I'm correct when comparing generated histograms with this kind of kurtosis below. The time structure doesn't matter to me. I'm only interested in the power fluctuation and the relative frequency with which each fluctuation occurs, so I can adjust the cloud thickness and the wind velocity, tuning my algorithm to generate data that are as realistic as possible. Another problem is that I'm not able to generate a full year of data, as represented in the figure below. You know, over the year we have different seasons, irradiance values, wind velocities and shaded areas. So I'm interested in tuning my algorithm to the worst case of irradiance, which may occur at midday on a summer day.
please email me at rosewc@udel.edu so i can reply by email

