Example C: Tail Probabilities of Multivariate Normal Distribution.
Examples A and B make it clear that if we are trying to view uniform data over the hypercube most (spherical) neighborhoods will be empty! Let us examine what happens if the data follow the standard d-dimensional Normal distribution:
Clearly, the origin (mode) is the most IikeIy point and the equiprobable contours are spheres. Consider the spherical contour, , where the density value is only 1% of the value at the mode. Now
therefore, the probability that a point is within the 1% spherical contour may be computed by
The last equation gives the probability a random point will not fall in the “tails” or, in other words, will fall in the medium- to high-density region. Around 5 or 6 dimensions, the probability mass of a multivariate Normal begins a rapid migration into the extreme tails. In fact, more than half of the probability mass is in a very low density region for 10-dimensional data! Silverman (1986) has dramatized this in 10 dimensions by noting that . In very high dimensions, virtually the entire sample will be in the tails!
% Workspace Initialization. clc; clear; close all;
NumOfRuns = 1e4; % Number of Monte Carlo runs. Dimensions = 1:20; % Problem dimensions. LD = length(Dimensions); counter_tails = zeros(1,LD); counter_body = zeros(1,LD); PDF_Values = zeros(NumOfRuns,LD); for i=1:LD dim = Dimensions(i); % Multivariable normal numbers generation: M = zeros(dim,1); % Mean vector. SIGMA = eye(dim); % Covariance Matrix. NormalSamples = mvnrnd(M,SIGMA,NumOfRuns); % Generate random points following the n-dimensional % standard normal distribution. max_valuePDF = (2*pi)^(-dim/2); for k=1:NumOfRuns PDF_Values(k,i) = mvnpdf(NormalSamples(k,:).',M,SIGMA); % Every point x with pdf(x) < 0.01*max(pdf) is considered to belong to % the tail of the multivariate Normal distribution. if PDF_Values(k,i)<0.01*max_valuePDF counter_tails(i) = counter_tails(i) + 1; else counter_body(i) = counter_body(i) + 1; end end end
Plot the fraction of datapoints in the tails found by Monte Carlo simulation:
figure('Position',[100 100 900 400]); subplot(1,2,1); stem(Dimensions,counter_tails/NumOfRuns,'r*'); hold on; stem(Dimensions,1-cdf('Chisquare',-2*log(0.01),Dimensions)); xlabel('Dimension'); ylabel('Fraction of Total Generated Samples in Tails'); grid on; xlim([Dimensions(1) Dimensions(end)]); title('Analytically Calculated (blue circles) and Monte Carlo Estimated (red stars)'); % Plot pdf values of randomly generated points: subplot(1,2,2); [xx, yy] = meshgrid(Dimensions,1:NumOfRuns); surf(xx,yy,PDF_Values); shading interp; view(30,15); title('PDF Value at Randomly Generated Points'); xlabel('Dimension'); zlabel('Normal N-Dim PDF Value'); ylim([1 NumOfRuns]);
% 1) The low multidimensional normal pdf values happen because the following relation holds: % % n % N-dim_NormalPDF(X) = Prod(one-dim.normalpdf(x_i)) % i=1 % % This happens because the normal random variables x_i of this example are independent. % And this is the case because the covariance matrix sigma is diagonal. % 2) You may plot the fraction of datapoints in the body with red stars: % figure(2); % stem(Dimensions,counter_body/NumOfRuns,'r*'); % grid on; % xlabel('Dimension'); % ylabel('Fraction of Total Generated Samples in Body'); % xlim([Dimensions(1) Dimensions(end)]);