paretotails function and frequency vector

Hi MATLAB community,
I have a series of measurements that follow a normal distribution, and I know that they must be greater than Xmin and lower than Xmax.
My problem is that I have 10^9 measurements, and I can’t store all of them in one array because of memory limits.
My solution is to discretize the range of values, knowing that my measurements are accurate to nd decimals:
step=10^-nd;
X=floor(Xmin*10^nd)/10^nd : step : ceil(Xmax*10^nd)/10^nd;
Therefore, for each measurement, I find the index of the closest value in X and add 1 to the frequency vector CNT:
CNT=zeros(size(X));
For each measurement m do:
[~,i]=min(abs(X-m)); CNT(i)=CNT(i)+1;
The resulting distribution can be displayed with:
Edges=[X-step/2 X(end)+step/2];
figure; histogram('BinEdges',Edges,'BinCounts',CNT);
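The counting steps above can be sketched for data that arrives in chunks, so that the full set of measurements never has to sit in memory. This is only a sketch: the values of nd, Xmin, Xmax, nChunks and chunkSize are illustrative, and the randn call is a placeholder for however the measurements are actually read. Because the grid is uniform, the bin index can be computed directly instead of searching with min(abs(X-m)).

```matlab
% Sketch: accumulate the frequency vector chunk by chunk (illustrative values).
nd = 2; step = 10^-nd;                 % assumed measurement precision
Xmin = -1; Xmax = 1;                   % assumed bounds
X = floor(Xmin*10^nd)/10^nd : step : ceil(Xmax*10^nd)/10^nd;
CNT = zeros(size(X));
nChunks = 5; chunkSize = 1e5;          % hypothetical chunking
for c = 1:nChunks
    % Placeholder: replace with the code that reads the next chunk of measurements
    m = min(max(0.3*randn(chunkSize,1), Xmin), Xmax);
    idx = round((m - X(1))/step) + 1;  % index of the nearest grid point, no search needed
    idx = min(max(idx, 1), numel(X));  % guard against rounding at the two ends
    % Count how many measurements of this chunk fall on each grid point
    CNT = CNT + accumarray(idx, 1, [numel(X) 1]).';
end
```

After the loop, sum(CNT) equals the total number of measurements processed, which is a quick sanity check before plotting.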
My goal is to estimate the probability of measuring a value greater than a threshold Xth after 10^14 or more measurements.
For that, I would like to apply the “paretotails” function to my problem.
Unfortunately, the function does not offer a way to use a frequency vector.
So I’m asking for your help, in case anyone has a solution to my issue.
Thank you all in advance,
Rémy

Accepted Answer

Aditya Patil on 20 Nov 2020
I have brought this issue to the attention of the relevant people.
If the data is time-based, you can work around the problem by downsampling it. If not, you can recreate the data from the histogram while reducing the number of samples in every bin by the same factor. You should then be able to use paretotails.
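A minimal sketch of the histogram workaround, assuming the Statistics and Machine Learning Toolbox is available. The X, CNT and Xth below are synthetic stand-ins for the ones in the question, and the reduction factor k and the 0.95 tail cutoff are arbitrary illustrative choices. Note that rounding the scaled counts drops every bin whose count is below k/2, which removes exactly the rare tail values, so k should be kept as small as memory allows.

```matlab
% Synthetic stand-ins for X, CNT and Xth from the question (illustrative only)
nd = 1; step = 10^-nd;
X = -5:step:5;
CNT = round(1e9 * step * exp(-X.^2/2) / sqrt(2*pi));
Xth = 4;

k = 1e4;                         % hypothetical reduction factor
cntSmall = round(CNT / k);       % same factor for every bin; rare bins round to zero
sample = repelem(X, cntSmall);   % each X(i) repeated cntSmall(i) times

% Fit a Pareto upper tail above the 95% quantile; pl = 0 requests no lower tail
pd = paretotails(sample(:), 0, 0.95);
pExceed = 1 - cdf(pd, Xth);      % estimated probability that one measurement exceeds Xth
```

The fitted object only sees the reduced, discretized sample, so the estimate is best treated as a rough extrapolation and checked against different values of k and of the tail cutoff.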
