Plot a Probability Density Function in 3D
I have datapoints for a sphere and I want to prove that the data is indeed random. As I cannot share the data file here, let's just say it's
randn(6000,3).
I tried using the standard pdf function for 2D, but I do not believe this is what I need.
What would be the best approach for a probability density function in 3D that would convincingly show that the distribution is random? I appreciate any advice. Thank you.
2 Comments
the cyclist
on 30 Jan 2024
I am confused by what is stored in the array. Taking a tiny piece of it
rng default
data = randn(5,3)
What is represented by these values? What is expected to be random, and what specifically are you testing for?
I don't really understand the fact that these are datapoints "for a sphere". You have the physical locations of the datapoints, and also some value related to those physical locations?
Torsten
on 30 Jan 2024
I guess the OP wants to "prove" that the 3D points are uniformly distributed in a sphere of a certain radius.
Answers (2)
Hassaan
on 30 Jan 2024
Edited: Hassaan
on 30 Jan 2024
@Victor Carneiro da Cunha Martorelli As per my understanding:
Visual Inspection with 3D Scatter Plot: Visualize the data points in a 3D scatter plot to get a preliminary sense of their distribution. Ideally, for a 3D Gaussian, you should see the highest density of points around the mean and fewer as you move away.
3D Histogram or Heatmap: Create a 3D histogram (or a 2D heatmap for each pair of variables) to visualize the frequency of data points within specific volume elements (voxels).
Check for Isotropic Dispersion: For a truly random 3D Gaussian distribution, the dispersion should be isotropic (uniform in all directions from the mean). You could check this by computing the covariance matrix and ensuring it is diagonal (if the data is centered).
% Clear workspace, command window, and close all figures
clear
clc
close all
% Generate random data points from a 3D Gaussian distribution
data = randn(6000,3);
% 3D Scatter Plot
figure;
scatter3(data(:,1), data(:,2), data(:,3), '.');
title('3D Scatter Plot');
xlabel('X-axis');
ylabel('Y-axis');
zlabel('Z-axis');
grid on;
% Histograms for Each Dimension
figure;
subplot(3,1,1);
histogram(data(:,1));
title('Histogram along X-axis');
xlabel('X-axis');
ylabel('Frequency');
subplot(3,1,2);
histogram(data(:,2));
title('Histogram along Y-axis');
xlabel('Y-axis');
ylabel('Frequency');
subplot(3,1,3);
histogram(data(:,3));
title('Histogram along Z-axis');
xlabel('Z-axis');
ylabel('Frequency');
% 3D Kernel Density Estimation (requires Statistics and Machine Learning Toolbox)
% Define the grid for estimation
gridX = linspace(min(data(:,1)), max(data(:,1)), 30);
gridY = linspace(min(data(:,2)), max(data(:,2)), 30);
gridZ = linspace(min(data(:,3)), max(data(:,3)), 30);
[A, B, C] = ndgrid(gridX, gridY, gridZ);
gridPoints = [A(:), B(:), C(:)];
% Evaluate the density using Silverman's rule-of-thumb bandwidth
[n, d] = size(data);
bw = std(data) * (4/((d+2)*n))^(1/(d+4));
f = mvksdensity(data, gridPoints, 'Bandwidth', bw);
% Visualize a slice of the estimated density near z = 0
F = reshape(f, size(A));
figure; imagesc(gridX, gridY, F(:,:,15)'); axis xy; colorbar;
title('3D KDE, slice near z = 0');
% Covariance Matrix
cov_matrix = cov(data);
disp('Covariance Matrix:');
disp(cov_matrix);
% Note: For multivariate normality tests and goodness-of-fit tests in 3D,
% MATLAB does not have built-in functions, and you would need to find a suitable toolbox or write custom code.
Multivariate Normality Test: Apply statistical tests for multivariate normality such as Mardia’s test, Henze-Zirkler’s test, or the Doornik-Hansen test.
Q-Q Plots for Each Dimension: Generate Q-Q plots for each of the three dimensions against the standard normal distribution.
Goodness-of-Fit Test: Conduct a Chi-Squared goodness-of-fit test in 3D. You can bin the data into a 3D histogram and compare the observed frequencies with the expected frequencies under a multivariate normal distribution with the same mean and covariance as your data.
Compare Marginal Distributions: Look at the marginal distributions along each axis. For a true 3D Gaussian, each axis should independently follow a normal distribution.
Ellipsoid Fitting: Fit an ellipsoid to the data points and see if it conforms to what would be expected for a Gaussian distribution (in terms of the relationship between the axes of the ellipsoid and the standard deviations of the Gaussian).
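The Q-Q plot step above can be sketched as follows. This is a minimal sketch: qqplot is in the Statistics and Machine Learning Toolbox, and randn(6000,3) again stands in for the real data.

```matlab
% Q-Q plots of each coordinate against the standard normal distribution
data = randn(6000,3);    % stand-in for the actual datapoints
labels = {'X','Y','Z'};
figure;
for k = 1:3
    subplot(1,3,k);
    qqplot(data(:,k));   % compares sample quantiles to normal quantiles
    title(['Q-Q plot, ' labels{k} '-axis']);
end
```

Points lying close to the reference line in all three panels support normal marginals; systematic curvature away from the line indicates heavy tails or skew.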
0 Comments
William Rose
on 30 Jan 2024
Edited: William Rose
on 30 Jan 2024
For a one-dimensional random variable, you could do the Kolmogorov-Smirnov test, the Anderson-Darling test, or the chi-squared goodness-of-fit test.
But you have a joint distribution involving three dimensions. The extension of the K-S test to two dimensions is considered here and also here. Maybe you can extend it to three dimensions.
Depending on your audience and goals, you can use the one dimensional tests mentioned above to support or refute claims about the distribution of the data.
For example, you can test the x, y, and z data independently, to see if they are consistent with a particular distribution. You can also do a chi-squared test for independence of the variables.
rng(1); % seed random number generator for reproducibility
N=6000; points=randn(N,3);
x=points(:,1); y=points(:,2); z=points(:,3);
[~,px]=adtest(x,Distribution='norm')
The Anderson-Darling test fails to reject the hypothesis that the x data comes from a normal distribution (p > 0.4); in other words, the x data is consistent with normality. You can do likewise for y and z.
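The same check for all three coordinates can be written as a short loop; this sketch reuses the points array from the snippet above.

```matlab
% Anderson-Darling normality test on each coordinate of points
labels = 'xyz';
for k = 1:3
    [h, p] = adtest(points(:,k), Distribution='norm');
    fprintf('%c: h = %d, p = %.3f\n', labels(k), h, p);
end
```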
You can test for linear correlation among variables as follows.
R=corrcoef(x,y); rho=R(1,2); % rho=Pearson correlation between x, y
tstat=rho*sqrt((N-2)/(1-rho^2)); % will be ~t(N-2) if x,y independent
p=tcdf(tstat,N-2) % p shouldn't be too small or too large if x,y independent
if p<.025 || p>.975
fprintf('Reject the null hypothesis that x,y are independent.\n')
else
fprintf('Do not reject the null hypothesis of independence.\n');
end
You can do likewise for y,z and for z,x. The test above is only for linear correlation of x,y. The variables could be correlated in a nonlinear way that would not be detected by the test above.
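For completeness, the pairwise check can be run over all three pairs in one loop; this sketch assumes the points array and N from the snippet above.

```matlab
% Pearson-correlation t-test for each pair of coordinates
pairs = [1 2; 2 3; 3 1];
for k = 1:size(pairs,1)
    u = points(:,pairs(k,1));
    v = points(:,pairs(k,2));
    R = corrcoef(u,v); rho = R(1,2);
    tstat = rho*sqrt((N-2)/(1-rho^2));  % ~t(N-2) under independence
    p = tcdf(tstat,N-2);
    fprintf('columns %d,%d: p = %.3f\n', pairs(k,1), pairs(k,2), p);
end
```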
If the data are spherical, then the variances in the three directions should be equal. You can test for the equality of the variances of the columns of an array with vartestn().
vartestn(points)
A p-value above .05 means we do not reject the hypothesis that the variances in the x, y, and z directions are equal, i.e. the data is consistent with equal variances.
0 Comments