mvksdensity
Kernel smoothing function estimate for multivariate data
Description
computes a probability density estimate of the sample data in the
nbyd matrix f
= mvksdensity(x
,pts
,'Bandwidth',bw
)x
,
evaluated at the points in pts
using the required namevalue
pair argument value bw
for the bandwidth value. The estimation
is based on a product Gaussian kernel function.
For univariate or bivariate data, use ksdensity
instead.
returns any of the previous output arguments, using additional options specified by
one or more f
= mvksdensity(x
,pts
,'Bandwidth',bw
,Name,Value
)Name,Value
pair arguments. For example, you can
define the function type that mvksdensity
evaluates, such as
probability density, cumulative probability, or survivor function. You can also
assign weights to the input values.
Examples
Estimate Multivariate Kernel Density
Load the Hald cement data.
load hald
The data measures the heat of hardening for 13 different cement compositions. The predictor matrix ingredients
contains the percent composition for each of four cement ingredients. The response matrix heat
contains the heat of hardening (in cal\g) after 180 days.
Estimate the kernel density for the first three observations in ingredients
.
xi = ingredients(1:3,:);
f = mvksdensity(ingredients,xi,'Bandwidth',0.8);
Estimate Multivariate Kernel Density Using Grids
Load the Hald cement data.
load hald
The data measures the heat of hardening for 13 different cement compositions. The predictor matrix ingredients
contains the percent composition for each of four cement ingredients. The response matrix heat
contains the heat of hardening (in cal/g) after 180 days.
Create an array of points at which to estimate the density. First, define the range and spacing for each variable, using a similar number of points in each dimension.
gridx1 = 0:2:22; gridx2 = 20:5:80; gridx3 = 0:2:24; gridx4 = 5:5:65;
Next, use ndgrid
to generate a full grid of points using the defined range and spacing.
[x1,x2,x3,x4] = ndgrid(gridx1,gridx2,gridx3,gridx4);
Finally, transform and concatenate to create an array that contains the points at which to estimate the density. This array has one column for each variable.
x1 = x1(:,:)'; x2 = x2(:,:)'; x3 = x3(:,:)'; x4 = x4(:,:)'; xi = [x1(:) x2(:) x3(:) x4(:)];
Estimate the density.
f = mvksdensity(ingredients,xi,... 'Bandwidth',[4.0579 10.7345 4.4185 11.5466],... 'Kernel','normpdf');
View the size of xi
and f
to confirm that mvksdensity
calculates the density at each point in xi
.
size_xi = size(xi)
size_xi = 1×2
26364 4
size_f = size(f)
size_f = 1×2
26364 1
Input Arguments
x
— Sample data
numeric matrix
Sample data for which mvksdensity
returns the probability density
estimate, specified as an nbyd
matrix of numeric values. n is the number of data points
(rows) in x
, and d is the number of
dimensions (columns).
Data Types: single
 double
bw
— Value for the bandwidth of the kernel smoothing window
scalar value  delement vector
Value for the bandwidth of the kernelsmoothing window, specified as a scalar value or
delement vector. d is the number
of dimensions (columns) in the sample data x
. If
bw
is a scalar value, it applies to all
dimensions.
If you specify 'BoundaryCorrection'
as
'log'
(default) and 'Support'
as
either 'positive'
or a tworow matrix,
mvksdensity
converts bounded data to be unbounded
by using log transformation. The value of bw
is on the
scale of the transformed values.
Silverman's rule of thumb for the bandwidth is
$${b}_{i}={\sigma}_{i}{\left\{\frac{4}{\left(d+2\right)n}\right\}}^{\raisebox{1ex}{$1$}\!\left/ \!\raisebox{1ex}{$\left(d+4\right)$}\right.},\text{\hspace{1em}}i=1,2,\mathrm{...},d,$$
where d is the number of dimensions, n is the number of observations, and $${\sigma}_{i}$$ is the standard deviation of the i^{th} variate [4].
Example: 'Bandwidth',0.8
Data Types: single
 double
NameValue Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Namevalue arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name
in quotes.
Example: 'Kernel','triangle','Function,'cdf'
specifies
that mvksdensity
estimates the cdf of the sample
data using the triangle kernel function.
BoundaryCorrection
— Boundary correction method
'log' (default)  'reflection'
Boundary correction method, specified as the commaseparated pair
consisting of 'BoundaryCorrection'
and either
'log'
or 'reflection'
.
Value  Description 

'log' 
The value of 
'reflection' 

mvksdensity
applies boundary correction only when
you specify 'Support'
as a value other than
'unbounded'
.
Example: 'BoundaryCorrection','reflection'
Function
— Function to estimate
'pdf'
(default)  'cdf'
 'survivor'
Function to estimate, specified as the commaseparated pair
consisting of 'Function'
and one of the following.
Value  Description 

'pdf'  Probability density function 
'cdf'  Cumulative distribution function 
'survivor'  Survivor function 
Example: 'Function'
,'cdf'
Kernel
— Type of kernel smoother
'normal'
(default)  'box'
 'triangle'
 'epanechnikov'
 function handle  character vector  string scalar
Type of kernel smoother, specified as the commaseparated pair
consisting of 'Kernel'
and one of the following.
Value  Description 

'normal'  Normal (Gaussian) kernel 
'box'  Box kernel 
'triangle'  Triangular kernel 
'epanechnikov'  Epanechnikov kernel 
You can also specify a kernel function that is a custom or builtin function. Specify the
function as a function handle (for example,
@myfunction
or @normpdf
) or as
a character vector or string scalar (for example,
'myfunction'
or 'normpdf'
).
The software calls the specified function with one argument that is an
array of distances between data values and locations where the density
is evaluated, normalized by the bandwidth in that dimension. The
function must return an array of the same size containing the
corresponding values of the kernel function.
mvksdensity
applies the same kernel to
each dimension.
Example: 'Kernel','box'
Support
— Support for the density
'unbounded'
(default)  'positive'
 2byd matrix
Support for the density, specified as the commaseparated pair
consisting of 'support'
and one of the following.
Value  Description 

'unbounded'  Allow the density to extend over the whole real line 
'positive'  Restrict the density to positive values 
2byd matrix  Specify the finite lower and upper bounds for the support of
the density. The first row contains the lower limits and the second
row contains the upper limits. Each column contains the limits for
one dimension of x . 
'Support'
can also be a combination of positive, unbounded, and bounded
variables specified as [0 Inf L; Inf Inf U]
.
Example: 'Support','positive'
Data Types: single
 double
 char
 string
Weights
— Weights for sample data
vector
Weights for sample data, specified as the commaseparated pair consisting of
'Weights'
and a vector of length size(x,1)
,
where x
is the sample data.
Example: 'Weights',xw
Data Types: single
 double
Output Arguments
f
— Estimated function values
vector
Estimated function values, returned as a vector. f
and
pts
have the same number of rows.
More About
Multivariate Kernel Distribution
A multivariate kernel distribution is a nonparametric representation of the probability density function (pdf) of a random vector. You can use a kernel distribution when a parametric distribution cannot properly describe the data, or when you want to avoid making assumptions about the distribution of the data. A multivariate kernel distribution is defined by a smoothing function and a bandwidth matrix, which control the smoothness of the resulting density curve.
The multivariate kernel density estimator is the estimated pdf of a random vector. Let x = (x_{1}, x_{2}, …, x_{d})' be a ddimensional random vector with a density function f and let y_{i} = (y_{i1}, y_{i2}, …, y_{id})' be a random sample drawn from f for i = 1, 2, …, n, where n is the number of random samples. For any real vectors of x, the multivariate kernel density estimator is given by
$${\widehat{f}}_{H}\left(x\right)=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{K}_{H}\left(x{y}_{i}\right)},$$
where $${K}_{H}\left(x\right)={\leftH\right}^{1/2}K({H}^{1/2}x)$$, $$K(\xb7)$$ is the kernel smoothing function, and H is the dbyd bandwidth matrix.
mvksdensity
uses a diagonal bandwidth matrix and a product
kernel. That is, H^{1/2} is a square
diagonal matrix with the elements of vector (h_{1},
h_{2}, …,
h_{d}) on the main diagonal. K(x) takes the product
form K(x) = k(x_{1})k(x_{2})
⋯k(x_{d}), where $$k(\xb7)$$ is a onedimensional kernel smoothing function. Then, the
multivariate kernel density estimator becomes
$${\widehat{f}}_{H}\left(x\right)=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{K}_{H}\left(x{y}_{i}\right)}=\frac{1}{n{h}_{1}{h}_{2}\cdots {h}_{d}}{\displaystyle \sum _{i=1}^{n}K\left(\frac{{x}_{1}{y}_{i1}}{{h}_{1}},\frac{{x}_{2}{y}_{i2}}{{h}_{2}},\cdots ,\frac{{x}_{d}{y}_{id}}{{h}_{d}}\right)}=\frac{1}{n{h}_{1}{h}_{2}\cdots {h}_{d}}{\displaystyle \sum _{i=1}^{n}{\displaystyle \prod _{j=1}^{d}k\left(\frac{{x}_{j}{y}_{ij}}{{h}_{j}}\right)}}.$$
The kernel estimator for the cumulative distribution function (cdf), for any real vectors of x, is given by
$${\widehat{F}}_{H}\left(x\right)={\displaystyle {\int}_{\infty}^{{x}_{1}}{\displaystyle {\int}_{\infty}^{{x}_{2}}\cdots {\displaystyle {\int}_{\infty}^{{x}_{d}}{\widehat{f}}_{H}(t)d{t}_{d}\cdots}d{t}_{2}}d{t}_{1}}=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{\displaystyle \prod _{j=1}^{d}G\left(\frac{{x}_{j}{y}_{ij}}{{h}_{j}}\right)}}\text{\hspace{0.17em}},$$
where $$G({x}_{j})={\displaystyle {\int}_{\infty}^{{x}_{j}}k({t}_{j})d{t}_{j}}$$.
Reflection Method
The reflection method is a boundary correction method that
accurately finds kernel density estimators when a random variable has bounded
support. If you specify 'BoundaryCorrection','reflection'
,
mvksdensity
uses the reflection method.
If you additionally specify 'Support'
as a tworow matrix
consisting of the lower and upper limits for each dimension, then
mvksdensity
finds the kernel estimator as follows.
If
'Function'
is'pdf'
, then the kernel density estimator is$${\widehat{f}}_{H}\left(x\right)=\frac{1}{n{h}_{1}{h}_{2}\cdots {h}_{d}}{\displaystyle \sum _{i=1}^{n}{\displaystyle \prod _{j=1}^{d}\left[k\left(\frac{{x}_{j}{y}_{ij}^{}}{{h}_{j}}\right)+k\left(\frac{{x}_{j}{y}_{ij}}{{h}_{j}}\right)+k\left(\frac{{x}_{j}{y}_{ij}^{+}}{{h}_{j}}\right)\right]}}$$ for L_{j} ≤ x_{j} ≤ U_{j},
where $${y}_{ij}^{}=2{L}_{j}{y}_{ij}$$, $${y}_{ij}^{+}=2{U}_{j}{y}_{ij}$$, and y_{ij} is the
j
th element of thei
th sample data corresponding tox(i,j)
of the input argumentx
. L_{j} and U_{j} are the lower and upper limits of thej
th dimension, respectively.If
'Function'
is'cdf'
, then the kernel estimator for cdf is$${\widehat{F}}_{H}(x)=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{\displaystyle \prod _{j=1}^{d}\left[G\left(\frac{{x}_{j}{y}_{ij}^{}}{{h}_{j}}\right)+G\left(\frac{{x}_{j}{y}_{ij}}{{h}_{j}}\right)+G\left(\frac{{x}_{j}{y}_{ij}^{+}}{{h}_{j}}\right)G\left(\frac{{L}_{j}{y}_{ij}^{}}{{h}_{j}}\right)G\left(\frac{{L}_{j}{y}_{ij}}{{h}_{j}}\right)G\left(\frac{{L}_{j}{y}_{ij}^{+}}{{h}_{j}}\right)\right]}}$$ for L_{j} ≤ x_{j} ≤ U_{j}.
To obtain a kernel estimator for a survivor function (when
'Function'
is'survivor'
),mvksdensity
uses both $${\widehat{f}}_{H}(x)$$ and $${\widehat{F}}_{H}(x)$$.
If you additionally specify 'Support'
as
'positive'
or a matrix including [0 inf]
,
then mvksdensity
finds the kernel density estimator by
replacing [L_{j}
U_{j}]
with [0
inf]
in the above equations.
References
[1] Bowman, A. W., and A. Azzalini. Applied Smoothing Techniques for Data Analysis. New York: Oxford University Press Inc., 1997.
[2] Hill, P. D. “Kernel estimation of a distribution function.” Communications in Statistics – Theory and Methods. Vol. 14, Issue 3, 1985, pp. 605620.
[3] Jones, M. C. “Simple boundary correction for kernel density estimation.” Statistics and Computing. Vol. 3, Issue 3, 1993, pp. 135146.
[4] Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.
[5] Scott, D. W. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, 2015.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
Names in namevalue pair arguments, including
'Bandwidth'
, must be compiletime constants.Values in the following namevalue pair arguments must also be compiletime constants:
'BoundaryCorrection'
,'Function'
, and'Kernel'
. For example, to use the'Function','cdf'
namevalue pair argument in the generated code, include{coder.Constant('Function'),coder.Constant('cdf')}
in theargs
value ofcodegen
.The value of the
'Kernel'
namevalue pair argument cannot be a custom function handle. To specify a custom kernel function, use a character vector or string scalar.For the value of the
'Support'
namevalue pair argument, the compiletime data type must match the runtime data type.
For more information on code generation, see Introduction to Code Generation and General Code Generation Workflow.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2016a
Open Example
You have a modified version of this example. Do you want to open this example with your edits?
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list:
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
 América Latina (Español)
 Canada (English)
 United States (English)
Europe
 Belgium (English)
 Denmark (English)
 Deutschland (Deutsch)
 España (Español)
 Finland (English)
 France (Français)
 Ireland (English)
 Italia (Italiano)
 Luxembourg (English)
 Netherlands (English)
 Norway (English)
 Österreich (Deutsch)
 Portugal (English)
 Sweden (English)
 Switzerland
 United Kingdom (English)