Function ecdf breaks down for large datasets

Martin on 24 Feb 2011
Hi,
I have a very large vector x (around 130 million elements). When I try to compute the empirical cumulative distribution function of its values with MATLAB's "ecdf(x)" command, the function breaks down: the plot shows the ECDF only for the smaller values of x, and nothing is drawn for the larger values. When I run ecdf on only part of the vector (say, 10 million elements), the results seem OK. Does anyone know what could be going wrong with ecdf for very large datasets?
Thank you very much for your help.
Martin
  1 Comment
Martin on 8 Mar 2011
Is there anyone who can help me with this issue? Thanks.


Answers (1)

Mathieu Boutin on 8 Sep 2011
Hi Martin. You could try my new homemade function and see if it works fine:
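%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [v_f,v_x] = homemade_ecdf(v_data)
% Empirical CDF of v_data: v_f(k) is the fraction of samples <= v_x(k).
v_data = v_data(:).'; % force a row vector so the concatenations below work
nb_data = numel(v_data);
v_sorted_data = sort(v_data);
v_unique_data = unique(v_data); % distinct values, in ascending order
nb_unique_data = numel(v_unique_data);
v_data_ecdf = zeros(1,nb_unique_data);
for index = 1:nb_unique_data
    current_data = v_unique_data(index);
    % Fraction of samples less than or equal to the current value
    v_data_ecdf(index) = sum(v_sorted_data <= current_data)/nb_data;
end
% Prepend a point at the smallest value with F = 0 so a step plot starts at zero
v_x = [v_unique_data(1) v_unique_data];
v_f = [0 v_data_ecdf];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%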
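
One caveat on homemade_ecdf: its loop rescans the whole sorted vector once per distinct value, so with 130 million elements and many distinct values it may take a very long time. Below is a rough sketch of a variant that sorts once and reads the cumulative counts directly from the positions where the sorted values change. The name fast_ecdf is only illustrative, and dropping NaN values and requiring at least one non-NaN sample are my assumptions, not something checked against ecdf itself.

```matlab
function [v_f, v_x] = fast_ecdf(v_data)
% Hypothetical single-pass variant of homemade_ecdf (illustrative name).
% Assumes v_data is numeric and contains at least one non-NaN value.
v_sorted = sort(v_data(:).');            % row vector, ascending order
v_sorted = v_sorted(~isnan(v_sorted));   % assumption: NaNs are ignored
nb_data = numel(v_sorted);
% Mark the last occurrence of each distinct value in the sorted data.
is_last = [diff(v_sorted) > 0, true];
% The position of that last occurrence equals the number of samples
% less than or equal to the value.
v_count = find(is_last);
v_unique = v_sorted(is_last);
% Same output convention as homemade_ecdf: start at F = 0.
v_x = [v_unique(1), v_unique];
v_f = [0, v_count / nb_data];
end
```

Called as [f, xs] = fast_ecdf(x); stairs(xs, f); it should give the same step plot as homemade_ecdf for data without NaNs, at the cost of one sort plus a few vectors the size of x.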
