Pearson Distribution

The Pearson distribution is a four-parameter distribution that has an arbitrary mean, standard deviation, skewness, and kurtosis. This distribution is often used to model asymmetric data that is prone to outliers.

Statistics and Machine Learning Toolbox™ offers two ways to work with the Pearson distribution:

Use distribution-specific functions (pearspdf, pearscdf, pearsrnd) with specified distribution parameters. The distribution-specific functions can accept parameters of multiple Pearson distributions.
Use generic distribution functions (cdf, pdf, random) with the distribution name "Pearson" and specified distribution parameters.
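
As a minimal sketch of the two approaches, the following code evaluates the same pdf with pearspdf and with the generic pdf function. It assumes that pdf accepts the four Pearson parameters in the same order as pearspdf (mean, standard deviation, skewness, kurtosis). The parameter values are illustrative and do not come from the text above.

```matlab
% Illustrative parameter values (assumptions chosen for this sketch)
mu = 0; sigma = 1; skew = 0.5; kurt = 4;
x = linspace(-3,3,7);

% Distribution-specific function
pSpecific = pearspdf(x,mu,sigma,skew,kurt);

% Generic function with the distribution name "Pearson"
% (argument order assumed to match pearspdf)
pGeneric = pdf("Pearson",x,mu,sigma,skew,kurt);

% Largest absolute difference between the two results
maxDiff = max(abs(pSpecific - pGeneric))
```

If both calls describe the same distribution, maxDiff should be zero or within floating-point round-off.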

Types

The Pearson distribution has eight types, most of which correspond to other known distributions.

Pearson Distribution Type	Description
`0`	Normal
`1`	4-parameter beta
`2`	Symmetric 4-parameter beta
`3`	3-parameter gamma
`4`	Distribution specific to the Pearson system with pdf proportional to ${(1 + {(\frac{x - μ}{σ})}^{2})}^{- a} \exp (- b \arctan (\frac{x - μ}{σ}))$ , where a and b are quantities related to the differential equation that defines the Pearson distribution
`5`	Inverse 3-parameter gamma
`6`	F location scale
`7`	Student's t location scale

Parameters

The Pearson distribution uses the following parameters.

Parameter	Description
μ	Mean
σ	Standard deviation
γ	Skewness. γ is a measure of the asymmetry of the data around the sample mean. If the skewness is negative, the data spreads out more to the left of the mean than to the right. If the skewness is positive, the data spreads out more to the right. γ² must be less than κ – 1.
κ	Kurtosis. κ is a measure of how prone a distribution is to outliers. The kurtosis of the normal distribution is 3. Distributions that are more prone to outliers than the normal distribution have a kurtosis value greater than 3; distributions that are less prone have a kurtosis value less than 3. κ must be greater than γ² + 1.
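
The constraint that links skewness and kurtosis can be checked before calling any Pearson function. The following sketch uses a hypothetical anonymous function, isValidPearson, which is not part of the toolbox. It only tests the inequality γ² < κ – 1 stated in the table.

```matlab
% Hypothetical helper (not a toolbox function): returns true when the
% skewness-kurtosis pair satisfies skew^2 < kurt - 1
isValidPearson = @(skew,kurt) skew.^2 < kurt - 1;

ok1 = isValidPearson(0,3)    % moments of the normal distribution
ok2 = isValidPearson(1,10)   % skewed, heavy-tailed case
ok3 = isValidPearson(2,4)    % fails because 2^2 = 4 is not less than 4 - 1 = 3
```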

Probability Density Function

The Pearson distribution probability density function (pdf) is the solution to the differential equation

$\frac{p' (x)}{p (x)} = - \frac{a + (x - μ)}{b_{0} + b_{1} (x - μ) + b_{2} {(x - μ)}^{2}},$

where the system is defined by the coefficients $b_{j}$ for 0 ≤ j ≤ 2. For most distribution types, the pdf is a closed-form function. The following table describes the pdf for each distribution type.

Pearson Distribution Type	pdf p(x)
`0`	$\frac{1}{σ \sqrt{2 π}} e^{\frac{- {(x - μ)}^{2}}{2 σ^{2}}}$
`1`	$\frac{{(x - l b)}^{a - 1} {(u b - x)}^{b - 1}}{B (a, b) {(u b - l b)}^{a + b - 1}}$ , where B is the Beta Function, lb and ub are the lower and upper bounds of the distribution (respectively), and a > 0 and b > 0 are shape parameters
`2`	$\frac{{(x + u b)}^{a - 1} {(u b - x)}^{b - 1}}{B (a, b) {(2 u b)}^{a + b - 1}}$
`3`	$\frac{1}{b^{a} Γ (a)} {(x - l b)}^{a - 1} e^{- \frac{x - l b}{b}}$ , where Γ is the Gamma Function
`4`	$\frac{{\| \frac{Γ (m + \frac{ν}{2} i)}{Γ (m)} \|}^{2}}{σ B (m - \frac{1}{2}, \frac{1}{2})} {[1 + u^{2}]}^{- m} \exp [- ν \arctan (u)], u = \frac{x - μ}{σ}$ , where m > 0 and ν > 0 are shape parameters
`5`	$\frac{b^{a} e^{\frac{- b}{u}}}{σ u^{a + 1} Γ (a)}, u = \frac{x - μ}{σ}$
`6`	$\frac{1}{σ} \frac{Γ [\frac{(ν_{1} + ν_{2})}{2}]}{Γ (\frac{ν_{1}}{2}) Γ (\frac{ν_{2}}{2})} {(\frac{ν_{1}}{ν_{2}})}^{\frac{ν_{1}}{2}} \frac{u^{\frac{ν_{1} - 2}{2}}}{{[1 + (\frac{ν_{1}}{ν_{2}}) u]}^{\frac{(ν_{1} + ν_{2})}{2}}}, u = \frac{x - μ}{σ}$ , where ν₁ > 0 and ν₂ > 0 are shape parameters
`7`	$\frac{Γ (\frac{ν + 1}{2})}{σ \sqrt{ν π} Γ (\frac{ν}{2})} {[\frac{ν + u^{2}}{ν}]}^{- (\frac{ν + 1}{2})}, u = \frac{x - μ}{σ}$
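
To see how the differential equation produces one of the closed forms in the table, consider the type 0 case. The derivation below is a sketch that assumes the coefficient values a = 0, b₀ = σ², and b₁ = b₂ = 0. These values are chosen for illustration and are not taken from toolbox output.

```latex
\begin{align*}
\frac{p'(x)}{p(x)} &= -\frac{x - μ}{σ^{2}}
  && \text{(set } a = 0,\ b_{0} = σ^{2},\ b_{1} = b_{2} = 0\text{)} \\
\ln p(x) &= -\frac{(x - μ)^{2}}{2 σ^{2}} + C
  && \text{(integrate both sides)} \\
p(x) &= \frac{1}{σ \sqrt{2 π}} e^{-\frac{(x - μ)^{2}}{2 σ^{2}}}
  && \text{(choose } C \text{ so that } p \text{ integrates to 1)}
\end{align*}
```

The result matches the type 0 entry in the table.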

Cumulative Distribution Function

The Pearson distribution cumulative distribution function (cdf) is the integral of the pdf. The following table describes the cdf for each distribution type.

Pearson Distribution Type	cdf c(x)
`0`	$\frac{1}{σ \sqrt{2 π}} \int_{- \infty}^{x} e^{\frac{- {(t - μ)}^{2}}{2 σ^{2}}} d t$
`1`	$\frac{1}{B (a, b) {(u b - l b)}^{a + b - 1}} \int_{l b}^{x} {(t - l b)}^{a - 1} {(u b - t)}^{b - 1} d t$ , where B is the Beta Function, lb and ub are the lower and upper bounds of the distribution (respectively), and a > 0 and b > 0 are shape parameters
`2`	$\frac{1}{B (a, b) {(2 u b)}^{a + b - 1}} \int_{- u b}^{x} {(t + u b)}^{a - 1} {(u b - t)}^{b - 1} d t$
`3`	$\frac{1}{b^{a} Γ (a)} \int_{l b}^{x} {(t - l b)}^{a - 1} e^{- \frac{t - l b}{b}} d t$
`4`	A type 4 Pearson distribution does not have a closed-form cdf. You can evaluate the type 4 Pearson distribution cdf at a point x by numerically integrating the pdf from –∞ to x.
`5`	$Q (a, \frac{b}{u}), u = \frac{x - μ}{σ}$ , where Q is the regularized upper Incomplete Gamma Function
`6`	$I_{ν_{1} u / (ν_{1} u + ν_{2})} (\frac{ν_{1}}{2}, \frac{ν_{2}}{2}), u = \frac{x - μ}{σ}$ , where I is the regularized incomplete beta function, and ν₁ > 0 and ν₂ > 0 are shape parameters
`7`	$\int_{- \infty}^{x} \frac{Γ (\frac{ν + 1}{2})}{Γ (\frac{ν}{2})} \frac{1}{σ \sqrt{ν π}} \frac{1}{{(1 + \frac{{u (t)}^{2}}{ν})}^{\frac{ν + 1}{2}}} d t, u (t) = \frac{t - μ}{σ}$ , where ν > 0 is a shape parameter
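
The numerical integration described for type 4 can be sketched with the integral function. The parameter values below are the type 4 values used in the Examples section, and the evaluation point x = 6 is an arbitrary choice for illustration. The comparison with pearscdf is only a consistency check and makes no claim about how pearscdf computes its result internally.

```matlab
% Type 4 parameters (same values as in the Examples section)
mu4 = 5; sigma4 = 1; skew4 = 1; kurt4 = 10;
x = 6;  % illustrative evaluation point

% Numerically integrate the pdf from -Inf to x
pdf4 = @(t) pearspdf(t,mu4,sigma4,skew4,kurt4);
cNumeric = integral(pdf4,-Inf,x);

% Compare with the value returned by pearscdf
cDirect = pearscdf(x,mu4,sigma4,skew4,kurt4);
absDiff = abs(cNumeric - cDirect)
```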

Support

For some Pearson distribution types, support for the pdf and cdf is given by the coefficients $b_{j}$ in the differential equation that defines the pdf. The following table shows the support for the Pearson distribution pdf and cdf when μ = 0 and σ = 1. The variables a1 and a2 are solutions to the equation $b_{0} + b_{1} (x - μ) + b_{2} {(x - μ)}^{2} = 0$ , and a1 < a2.

Pearson Distribution Type	Support
`0`	`(-Inf,Inf)`
`1`	`(a1,a2)`
`2`	`(-a2,a2)`
`3`	`(a1,Inf)` when `a>0` and `(-Inf,a1)` when `a<0`
`4`	`(-Inf,Inf)`
`5`	`(-C1,Inf)` when `(b1-C1)/b2 < 0`, and `(-Inf,-C1)` otherwise, where `C1 = b1/(2*b2)`
`6`	`(a2,Inf)` when `a1` and `a2` are negative, and `(-Inf,a1)` when `a1` and `a2` are positive
`7`	`(-Inf,Inf)`

For distributions with μ ≠ 0 or σ ≠ 1, the bounds of the support are scaled by σ and shifted by μ relative to the bounds given in the preceding table. In this case, you can calculate the lower and upper bounds lb and ub as follows:

$lb = σ \cdot lb^{*} + μ$
$ub = σ \cdot ub^{*} + μ$

where $lb^{*}$ and $ub^{*}$ are the lower and upper bounds given in the preceding table for the same distribution type.

Examples

Compare Pearson Distributions


Create the variables mu0, sigma0, skew0, and kurt0, which contain the parameters for a Pearson distribution of type 0.

mu0 = 0;
sigma0 = 1;
skew0 = 0;
kurt0 = 3;

Use the pearspdf and pearscdf functions to evaluate the pdf and cdf, respectively, for the type 0 Pearson distribution between –5 and 5. You can create a vector of points between –5 and 5 by using the linspace function. Confirm that mu0, sigma0, skew0, and kurt0 define a Pearson distribution of type 0.

x0 = linspace(-5,5,100);
[p0,type0] = pearspdf(x0,mu0,sigma0,skew0,kurt0);
c0 = pearscdf(x0,mu0,sigma0,skew0,kurt0);
type0

type0 = 
0

The output shows that the parameters define a Pearson distribution of type 0. Because mu0 is 0 and sigma0 is 1, p0 contains the pdf of the standard normal distribution.
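
As an optional check, you can compare the type 0 pdf with the normal pdf returned by normpdf. The following sketch recreates x0 and p0 so that it runs on its own. The variable names pNormal and maxDiff are introduced here for illustration.

```matlab
% Recreate the type 0 pdf from the preceding steps
x0 = linspace(-5,5,100);
p0 = pearspdf(x0,0,1,0,3);

% Standard normal pdf evaluated at the same points
pNormal = normpdf(x0,0,1);

% Largest absolute difference between the two pdfs
maxDiff = max(abs(p0 - pNormal))
```

A value of maxDiff at or near zero is consistent with the type 0 Pearson distribution being the normal distribution.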

Draw a random sample of points from the distribution by using the pearsrnd function.

rng(0,"twister") % For reproducibility
r0 = pearsrnd(mu0,sigma0,skew0,kurt0,[100,1]);
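
One way to sanity-check the sampler is to compare sample moments with the requested parameters. The sketch below draws a larger sample than r0 so that the estimates are more stable. The sample size of 1e5 and the variable names rLarge and sampleMoments are assumptions made for illustration.

```matlab
rng(0,"twister") % For reproducibility

% Larger type 0 sample than r0, for more stable moment estimates
rLarge = pearsrnd(0,1,0,3,[1e5,1]);

% Sample mean, standard deviation, skewness, and kurtosis
sampleMoments = [mean(rLarge) std(rLarge) skewness(rLarge) kurtosis(rLarge)]
```

The four estimates should be close to 0, 1, 0, and 3, respectively, with differences that reflect sampling variability.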

Repeat the process for a Pearson distribution of type 4. Define the variables mu4, sigma4, skew4, and kurt4. Evaluate the pdf and cdf between –5 and 15, and draw a random sample from the distribution.

mu4 = 5;
sigma4 = 1;
skew4 = 1;
kurt4 = 10;
x4 = linspace(-5,15,100);
[p4,type4] = pearspdf(x4,mu4,sigma4,skew4,kurt4);
c4 = pearscdf(x4,mu4,sigma4,skew4,kurt4);
r4 = pearsrnd(mu4,sigma4,skew4,kurt4,[100,1]);

Confirm that mu4, sigma4, skew4, and kurt4 define a Pearson distribution of type 4.

type4

type4 = 
4

Repeat the process for a Pearson distribution of type 6, evaluating the pdf and cdf between –10 and 10.

mu6 = 0;
sigma6 = 5;
skew6 = 3;
kurt6 = 20;
x6 = linspace(-10,10,100);
[p6,type6] = pearspdf(x6,mu6,sigma6,skew6,kurt6);
c6 = pearscdf(x6,mu6,sigma6,skew6,kurt6);
r6 = pearsrnd(mu6,sigma6,skew6,kurt6,[100,1]);

Confirm that mu6, sigma6, skew6, and kurt6 define a Pearson distribution of type 6.

type6

type6 = 
6

Use the tiledlayout and nexttile functions to display box plots of the random samples, pdfs, and cdfs for the Pearson distributions of types 0, 4, and 6. Create box plots of the random samples using the boxchart function.

tiledlayout(3,3)
nexttile
boxchart(r0)
title("Random Sample")
ylabel("Type 0",FontWeight="bold")
nexttile
plot(x0,p0)
title("PDF")
nexttile
plot(x0,c0)
title("CDF")
nexttile
boxchart(r4)
ylabel("Type 4",FontWeight="bold")
nexttile
plot(x4,p4)
nexttile
plot(x4,c4)
nexttile
boxchart(r6)
ylabel("Type 6",FontWeight="bold")
nexttile
plot(x6,p6)
nexttile
plot(x6,c6)

The rows of the figure correspond to the three Pearson distribution types. The first column contains a box plot of the random samples for each distribution. The type 6 Pearson distribution has the largest number of outliers, which is consistent with it having the largest kurtosis of the three distributions. The second column contains a plot of the pdf for each distribution. The pdfs for the type 0 and type 4 Pearson distributions have unbounded support, and the support of the type 6 Pearson distribution pdf has a lower bound. The third column shows a plot of the cdf for each distribution. The type 0 and type 4 Pearson distribution cdfs are similarly S-shaped because their pdfs have similar shapes. The type 6 Pearson distribution cdf is concave for values greater than the lower bound.

To calculate the type 6 Pearson distribution lower bound, return the coefficients of the polynomial in the denominator of the ordinary differential equation that defines the Pearson distribution pdf. For more information, see Probability Density Function and Support.

[~,~,coefs6] = pearspdf([],mu6,sigma6,skew6,kurt6)

coefs6 = 1×3

    0.7162    0.9324    0.0946

From left to right, the coefficients correspond to terms of increasing order.

Find the roots of the polynomial function by using the roots function. Use the fliplr function to format coefs6 so that, from left to right, the coefficients correspond to terms of decreasing order.

coefs6 = fliplr(coefs6);
roots6 = roots(coefs6)

roots6 = 2×1

   -9.0175
   -0.8396

The roots of the polynomial are negative, indicating that the type 6 Pearson pdf has a lower bound.

To calculate the lower bound, multiply the largest root by sigma6 and add the result to mu6.

lb6 = sigma6*max(roots6) + mu6

lb6 = 
-4.1982

The lower bound for the support of the type 6 Pearson distribution pdf is near –4, which is consistent with the plot of the pdf.
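
As a further check on the computed bound, you can evaluate the pdf and cdf on either side of it. The sketch below hard-codes the rounded value of lb6 so that it runs on its own. The offset of 0.5 is an arbitrary choice, and the expectation that pearspdf and pearscdf return zero outside the support is an assumption rather than documented behavior quoted from this page.

```matlab
% Type 6 parameters from the example
mu6 = 0; sigma6 = 5; skew6 = 3; kurt6 = 20;
lb6 = -4.1982;  % lower bound computed above (rounded)

% Evaluate the pdf and cdf just below and just above the bound
pBelow = pearspdf(lb6 - 0.5,mu6,sigma6,skew6,kurt6)
pAbove = pearspdf(lb6 + 0.5,mu6,sigma6,skew6,kurt6)
cBelow = pearscdf(lb6 - 0.5,mu6,sigma6,skew6,kurt6)
```

If lb6 is the lower bound of the support, pBelow and cBelow should be zero and pAbove should be positive.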

References

[1] Johnson, Norman Lloyd, et al. "Continuous Univariate Distributions." 2nd ed., vol. 1, Wiley, 1994.

[2] Willink, R. "A Closed-Form Expression for the Pearson Type IV Distribution Function." Australian & New Zealand Journal of Statistics, vol. 50, no. 2, June 2008, pp. 199–205. https://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.2008.00508.x