# paretotails

Piecewise distribution with Pareto tails

## Description

A `paretotails` object is a piecewise distribution with generalized Pareto distributions (GPDs) in the tails.

A `paretotails` object consists of one or two GPDs in the tails and another distribution in the center. You can specify the distribution type for the center by using the `cdffun` argument of `paretotails` when you create an object. Valid values are `'ecdf'`, `'kernel'`, and a function handle.

`paretotails` fits a distribution of type `cdffun` to the observations (`x`) and finds the quantiles corresponding to the lower and upper tail cumulative probabilities (`pl` and `pu`, respectively). Then, `paretotails` fits two GPDs to the lower `100*pl` percent of the observations and the upper `100*(1-pu)` percent of the observations, respectively. If `x` does not have at least two distinct observations in a tail, then `paretotails` does not create the corresponding tail segment.

Use the object functions `boundary`, `segment`, `nsegments`, `upperparams`, and `lowerparams` to find distribution characteristics. `lowerparams` and `upperparams` return the parameters of the GPDs in the tails. `boundary` returns the boundary points between piecewise distribution segments, `segment` returns the segment of a piecewise distribution containing input values, and `nsegments` returns the number of segments in an object.
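As a minimal sketch of these object functions, the following code fits a `paretotails` object to t-distributed sample data (the same recipe used in the Examples section) and queries its characteristics. The variable names are illustrative, and the code assumes that Statistics and Machine Learning Toolbox is installed.

```matlab
rng('default');                  % for reproducibility
t = trnd(3,100,1);               % illustrative sample data
pd = paretotails(t,0.1,0.9);     % GPD tails below 10% and above 90%

[p,q] = boundary(pd);            % cumulative probabilities and quantiles at the boundaries
n = nsegments(pd);               % number of segments in the object
s = segment(pd,[-5 0 5]);        % index of the segment containing each value
lp = lowerparams(pd);            % [shape scale] of the lower tail GPD
up = upperparams(pd);            % [shape scale] of the upper tail GPD
```

Here `p` holds the tail probabilities passed to `paretotails`, and `q` holds the corresponding quantiles of the fitted center distribution.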

Use the object functions `cdf`, `icdf`, `pdf`, and `random` to evaluate the distribution. These functions are well suited to copula and other Monte Carlo simulations. `pdf` returns the GPD density in the tails and the slope of the cumulative distribution function (cdf) in the center. These probability density function (pdf) values in the center are generally not good estimates of the underlying density of the original data.
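The following sketch shows one way these evaluation functions might be combined, for example to map data to the unit interval and back as is commonly done in copula simulations. The sample data and variable names are assumptions chosen for illustration.

```matlab
rng('default');                  % for reproducibility
t = trnd(3,100,1);               % illustrative sample data
pd = paretotails(t,0.1,0.9);

u = cdf(pd,t);                   % transform the data to cumulative probabilities
tBack = icdf(pd,u);              % invert the transform to recover the data
r = random(pd,1000,1);           % draw 1000 random numbers from the fitted object
f = pdf(pd,[-3 0 3]);            % GPD density in the tails, cdf slope in the center

roundTripError = max(abs(tBack - t));
```

Because the center values of `pdf` are slopes of the interpolated cdf, treat `f(2)` with caution if you need a smooth density estimate.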

## Creation

Create a piecewise distribution object using `paretotails`.

### Syntax

``pd = paretotails(x,pl,pu)``
``pd = paretotails(x,pl,pu,cdffun)``

### Description

`pd = paretotails(x,pl,pu)` returns the piecewise distribution object `pd`, which consists of the empirical distribution in the center and generalized Pareto distributions in the tails. Specify the boundaries of the tails using the lower and upper tail cumulative probabilities `pl` and `pu`, respectively.

`pd = paretotails(x,pl,pu,cdffun)` specifies the type of center distribution segment using `cdffun`.

### Input Arguments

`x`: Input data, specified as a numeric vector.

Data Types: `double`

`pl`: Lower tail cumulative probability, specified as a numeric scalar in the range `[0,1]`. The quantile of `pl` is the boundary of the lower tail observations.

If `pl` is `0` or `x` does not have at least two distinct observations in the lower tail, then `paretotails` divides the input data in `x` into two groups, center and upper tail. In this case, the fitted piecewise distribution object `pd` consists of two segments: the empirical distribution in the center and GPD in the upper tail.

Example: `0.1`

Data Types: `single` | `double`

`pu`: Upper tail cumulative probability, specified as a numeric scalar in the range `[0,1]`. The quantile of `pu` is the boundary of the upper tail observations.

If `pu` is `1` or `x` does not have at least two distinct observations in the upper tail, then `paretotails` divides the input data in `x` into two groups, center and lower tail. In this case, the fitted piecewise distribution object `pd` consists of two segments: the empirical distribution in the center and GPD in the lower tail.

Example: `0.9`

Data Types: `single` | `double`
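The one-tailed cases described for `pl` and `pu` can be sketched as follows. The sample data and variable names are illustrative assumptions; the segment counts follow from the descriptions above.

```matlab
rng('default');                      % for reproducibility
t = trnd(3,100,1);                   % illustrative sample data

pdUpper = paretotails(t,0,0.9);      % pl = 0: center plus upper tail only
pdLower = paretotails(t,0.1,1);      % pu = 1: lower tail plus center only

nUpper = nsegments(pdUpper);         % number of segments without a lower tail
nLower = nsegments(pdLower);         % number of segments without an upper tail
```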

`cdffun`: Type of center distribution segment, specified as `'ecdf'`, `'kernel'`, or a function handle.

| Value | Description |
| --- | --- |
| `'ecdf'` | Interpolated empirical cdf. `paretotails` uses values in `x` as the midpoints in the vertical steps of the empirical cdf, and computes the estimates for the points between the values in `x` by linear interpolation. For details about how to find the interpolated empirical cdf, see A Piecewise Linear Nonparametric CDF Estimate. |
| `'kernel'` | Interpolated kernel smoothing estimate of the cdf. `paretotails` uses the `ksdensity` function to find cdf estimates for 100 points in the range of `x`, and uses linear interpolation to compute the estimates for the points between the 100 points. `'kernel'` is equivalent to specifying a function handle `fun = @(x)ksdensity(x,'function','cdf');`. |
| function handle | Interpolated estimates using a specified function. `paretotails` uses a handle to a function of the form `[p,xi] = fun(x)` that accepts the input data vector `x` and returns a vector `p` of cdf values and a vector `xi` of evaluation points. Values in `xi` must be sorted and distinct but do not have to equal the values in `x`. The `paretotails` function computes the cdf estimates for the points between the values in `xi` by linear interpolation. |

`paretotails` uses `cdffun` to compute the quantiles corresponding to `pl` and `pu`.

Example: `'kernel'`
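Because `paretotails` uses `cdffun` to compute the boundary quantiles, the choice of center distribution can shift the tail boundaries. The following sketch, which uses illustrative sample data and variable names, fits the same data with the default empirical center and with the `'kernel'` center and collects the boundary quantiles side by side.

```matlab
rng('default');                          % for reproducibility
t = trnd(3,100,1);                       % illustrative sample data

pdE = paretotails(t,0.1,0.9);            % default center: interpolated empirical cdf
pdK = paretotails(t,0.1,0.9,'kernel');   % center: interpolated kernel smoothing estimate

[~,qE] = boundary(pdE);                  % boundary quantiles from the empirical cdf
[~,qK] = boundary(pdK);                  % boundary quantiles from the kernel estimate
boundaryTable = [qE(:) qK(:)];           % one row per boundary; columns may differ
disp(boundaryTable)
```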

## Properties

`NumSegments`: Number of segments, including the center segment and tail segments in a `paretotails` object, specified as a scalar. `NumSegments` is 3, 2, or 1 if the number of tail segments in the object is 2, 1, or 0, respectively.

Data Types: `double`

`LowerParameters`: Lower tail GPD parameters, fit to the lower extreme observations in `x`, specified as a numeric vector. The first value is the shape parameter and the second value is the scale parameter of the GPD.

The location parameter of the lower tail GPD is equal to the quantile of `pl`. Use the `boundary` function to return the location parameter. For example, run `[p,q] = boundary(pd)`, where `pd` is a `paretotails` object. `q(1)` is the location parameter.

Data Types: `single` | `double`

`UpperParameters`: Upper tail GPD parameters, fit to the upper extreme observations in `x`, specified as a numeric vector. The first value is the shape parameter and the second value is the scale parameter of the GPD.

The location parameter of the upper tail GPD is equal to the quantile of `pu`. Use the `boundary` function to return the location parameter. For example, run `[p,q] = boundary(pd)`, where `pd` is a `paretotails` object. `q(2)` is the location parameter.

Data Types: `single` | `double`
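To illustrate how the shape, scale, and location parameters fit together, the following sketch rebuilds upper tail cdf values with `gpcdf` and compares them with `cdf`. It rests on an assumption that this page does not state explicitly: that the upper tail cdf equals `pu` at the boundary plus the remaining `1-pu` probability distributed according to the GPD cdf. Treat it as a consistency check to run, not as a description of the internal implementation. The sample data and variable names are illustrative.

```matlab
rng('default');                  % for reproducibility
t = trnd(3,100,1);               % illustrative sample data
pd = paretotails(t,0.1,0.9);

[p,q] = boundary(pd);            % p(2) is pu; q(2) is the upper tail location parameter
up = upperparams(pd);            % up(1) is the shape; up(2) is the scale

xTail = q(2) + [0.5 1 2 4];      % evaluation points above the upper boundary

% Assumed relationship: F(x) = pu + (1-pu)*G(x), where G is the GPD cdf
% with shape up(1), scale up(2), and location q(2).
pManual = p(2) + (1 - p(2)) * gpcdf(xTail,up(1),up(2),q(2));

maxDiff = max(abs(pManual - cdf(pd,xTail)));   % near zero if the assumption holds
```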

## Object Functions

| Function | Description |
| --- | --- |
| `boundary` | Piecewise distribution boundaries |
| `cdf` | Cumulative distribution function |
| `icdf` | Inverse cumulative distribution function |
| `lowerparams` | Lower Pareto tail parameters |
| `nsegments` | Number of segments in piecewise distribution |
| `pdf` | Probability density function |
| `random` | Random numbers |
| `segment` | Piecewise distribution segments containing input values |
| `upperparams` | Upper Pareto tail parameters |

## Examples

Generate a sample data set and fit a piecewise distribution with Pareto tails to the data. Specify an empirical distribution for the center by using `paretotails` with its default settings.

Generate a sample data set containing 100 random numbers from a t distribution with 3 degrees of freedom.

```matlab
rng('default'); % For reproducibility
t = trnd(3,100,1);
```

Create a `paretotails` object by fitting a piecewise distribution to `t`. Specify the boundaries of the tails using the lower and upper tail cumulative probabilities so that a fitted object consists of the empirical distribution for the middle 80% of the data set and GPDs for the lower and upper 10% of the data set.

```matlab
pd = paretotails(t,0.1,0.9)
```

```
pd = 
Piecewise distribution with 3 segments
      -Inf < x < -1.84875    (0 < p < 0.1): lower tail, GPD(0.183032,1.00347)
  -1.84875 < x < 2.07662   (0.1 < p < 0.9): interpolated empirical cdf
   2.07662 < x < Inf         (0.9 < p < 1): upper tail, GPD(0.333239,1.19705)
```

Each line of the object display summarizes one segment: its boundary values, given as both quantiles and cumulative probabilities, and, for the tail segments, the GPD shape and scale parameters. Use the object functions `boundary`, `lowerparams`, and `upperparams` to return these values.

You can use the `nsegments` function to return the number of segments and the `segment` function to return the segment that contains input values.

You can also use the distribution functions `cdf`, `icdf`, `pdf`, and `random` to evaluate the distribution and generate random samples.

Plot the cdf of the t distribution and the cdf of the `paretotails` object on the same figure.

```matlab
x = linspace(-5,5);
plot(x,tcdf(x,3),'r--')
hold on
plot(x,cdf(pd,x),'b-')
```

Find the boundary points between the segments of the `paretotails` object by using `boundary`, and mark the points on the figure.

```matlab
[p,q] = boundary(pd);
plot(q,p,'bo')
legend('t Distribution','Pareto Tails Object','Boundary Points','Location','best')
hold off
```

Generate a sample data set and fit a piecewise distribution with Pareto tails to the data. Fit a center segment by using `paretotails` with a function handle.

Generate a sample data set containing 20% outliers.

```matlab
rng('default'); % For reproducibility
left_tail = -exprnd(1,100,1);
right_tail = exprnd(5,100,1);
center = randn(800,1);
x = [left_tail;center;right_tail];
```

Define a function handle using `ksdensity` to specify a nondefault value of the bandwidth.

```matlab
myfun1 = @(x)ksdensity(x,'Bandwidth',.1,'Function','cdf');
```

Create a `paretotails` object by fitting a piecewise distribution with the specified kernel smoothing estimator to `x`. Specify the boundaries of the tails using the lower and upper tail cumulative probabilities so that a fitted object consists of the kernel estimator for the middle 80% of the data set and GPDs for the lower and upper 10% of the data set.

```matlab
pd1 = paretotails(x,0.1,0.9,myfun1)
```

```
pd1 = 
Piecewise distribution with 3 segments
      -Inf < x < -1.35204    (0 < p < 0.1): lower tail, GPD(0.0104112,0.54947)
  -1.35204 < x < 1.80824   (0.1 < p < 0.9): function: @(x)ksdensity(x,'Bandwidth',.1,'Function','cdf')
   1.80824 < x < Inf         (0.9 < p < 1): upper tail, GPD(0.227542,3.10586)
```

You can also use a parametric distribution for the center segment. Define a function that fits a normal distribution to data and returns the cdf values, and pass the function handle when you create a `paretotails` object.

```matlab
pd2 = paretotails(x,0.1,0.9,@myfun2)
```

```
pd2 = 
Piecewise distribution with 3 segments
      -Inf < x < -2.70875    (0 < p < 0.1): lower tail, GPD(-0.358104,0.831855)
  -2.70875 < x < 3.52195   (0.1 < p < 0.9): function: myfun2
   3.52195 < x < Inf         (0.9 < p < 1): upper tail, GPD(-0.0661815,5.04694)
```

```matlab
function [p,xi] = myfun2(x)
% Fit a normal distribution to x and return its cdf values p
% at the sorted, distinct evaluation points xi.
pd = fitdist(x,'Normal');
xi = linspace(min(x),max(x),length(x)*2); % evaluation grid over the data range
p = cdf(pd,xi);
end
```