Train naive Bayes classifier

fitNaiveBayes will be removed in a future release. Use fitcnb instead.

NBModel = fitNaiveBayes(X,Y,Name,Value) returns a naive Bayes classifier with additional options specified by one or more Name,Value pair arguments.
For example, you can specify a distribution to model the data, prior probabilities for the classes, or the kernel smoothing window bandwidth.
Load Fisher's iris data set.
load fisheriris
X = meas(:,3:4);
Y = species;
tabulate(Y)
       Value    Count   Percent
      setosa       50     33.33%
  versicolor       50     33.33%
   virginica       50     33.33%
The software can classify data with more than two classes using naive Bayes methods.
Train a naive Bayes classifier.
NBModel = fitNaiveBayes(X,Y)
NBModel = 

Naive Bayes classifier with 3 classes for 2 dimensions.
Feature Distribution(s):normal
Classes:setosa, versicolor, virginica
NBModel is a trained NaiveBayes classifier.
By default, the software models the predictor distribution within each class using a Gaussian distribution having some mean and standard deviation. Use dot notation to display the parameters of a particular Gaussian fit, e.g., display the fit for the first feature within setosa.
setosaIndex = strcmp(NBModel.ClassLevels,'setosa');
estimates = NBModel.Params{setosaIndex,1}
estimates =

    1.4620
    0.1737
The mean is 1.4620 and the standard deviation is 0.1737.
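These two numbers fully determine the Gaussian the classifier uses for this feature within this class. As a language-neutral illustration of the math (a Python sketch, not MATLAB code and not the NaiveBayes API), the per-class density evaluation looks like this:

```python
import math

def normal_pdf(x, mu, sigma):
    # Univariate Gaussian density: the per-feature, per-class model
    # that fitNaiveBayes fits by default
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Fitted parameters for the first feature (petal length) within setosa, from above
mu, sigma = 1.4620, 0.1737

# The density peaks at the class mean; observations far from it score much lower
assert normal_pdf(mu, mu, sigma) > normal_pdf(mu + 3 * sigma, mu, sigma)
```

At prediction time, naive Bayes multiplies one such density per feature (the conditional-independence assumption) with the class prior, and assigns the class with the largest product.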
Plot the Gaussian contours.
figure
gscatter(X(:,1),X(:,2),Y);
h = gca;
xylim = [h.XLim h.YLim];
hold on
Params = cell2mat(NBModel.Params);
Mu = Params(2*(1:3)-1,1:2); % Extract the means
Sigma = zeros(2,2,3);
for j = 1:3
    Sigma(:,:,j) = diag(Params(2*j,:)); % Extract the standard deviations
    ezcontour(@(x1,x2)mvnpdf([x1,x2],Mu(j,:),Sigma(:,:,j)),...
        xylim+0.5*[-1,1,-1,1]) % Draw contours for the multivariate normal distributions
end
title('Naive Bayes Classifier -- Fisher''s Iris Data')
xlabel('Petal Length (cm)')
ylabel('Petal Width (cm)')
hold off
You can change the default distribution using the name-value pair argument 'Distribution'. For example, if some predictors are count based, then you can specify that they are multinomial random variables using 'Distribution','mn'.
Load Fisher's iris data set.
load fisheriris
X = meas;
Y = species;
Train a naive Bayes classifier using every predictor.
NBModel1 = fitNaiveBayes(X,Y);
NBModel1.ClassLevels % Display the class order
NBModel1.Params
NBModel1.Params{1,2}
ans = 

    'setosa'    'versicolor'    'virginica'

ans = 

    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]
    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]
    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]

ans =

    3.4280
    0.3791
By default, the software models the predictor distribution within each class as a Gaussian with some mean and standard deviation. There are four predictors and three class levels. Each cell in NBModel1.Params corresponds to a numeric vector containing the mean and standard deviation of each distribution, e.g., the mean and standard deviation for setosa iris sepal widths are 3.4280 and 0.3791, respectively.
Estimate the confusion matrix for NBModel1.
predictLabels1 = predict(NBModel1,X);
[ConfusionMat1,labels] = confusionmat(Y,predictLabels1)
ConfusionMat1 =

    50     0     0
     0    47     3
     0     3    47

labels = 

    'setosa'
    'versicolor'
    'virginica'
Element (j,k) of ConfusionMat1 represents the number of observations that the software classifies as class k, but that the data show as being in class j.
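The tally that confusionmat performs can be sketched in a few lines of Python (illustrative only; the labels and predictions below are a made-up toy input, not the iris results):

```python
def confusion_matrix(y_true, y_pred, labels):
    # Element (j,k) counts observations whose true class is j
    # but that were classified as k
    idx = {lab: i for i, lab in enumerate(labels)}
    mat = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        mat[idx[t]][idx[p]] += 1
    return mat

labels = ['setosa', 'versicolor', 'virginica']
y_true = ['setosa', 'versicolor', 'versicolor', 'virginica']
y_pred = ['setosa', 'versicolor', 'virginica', 'virginica']
# One versicolor observation lands off the diagonal: predicted virginica
print(confusion_matrix(y_true, y_pred, labels))
```

A perfect classifier produces a diagonal matrix; every off-diagonal entry is a misclassification.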
Retrain the classifier using the Gaussian distribution for predictors 1 and 2 (the sepal lengths and widths), and the default normal kernel density for predictors 3 and 4 (the petal lengths and widths).
NBModel2 = fitNaiveBayes(X,Y,...
    'Distribution',{'normal','kernel','normal','kernel'});
NBModel2.Params{1,2}
ans = 

    KernelDistribution

    Kernel = normal
    Bandwidth = 0.179536
    Support = unbounded
The software does not train parameters for the kernel density. Rather, the software chooses an optimal bandwidth. However, you can specify a bandwidth using the 'KSWidth' name-value pair argument.
Estimate the confusion matrix for NBModel2.
predictLabels2 = predict(NBModel2,X);
ConfusionMat2 = confusionmat(Y,predictLabels2)
ConfusionMat2 =

    50     0     0
     0    47     3
     0     3    47
Based on the confusion matrices, the two classifiers perform similarly in the training sample.
Some spam filters classify an incoming email as spam based on how many times a word or punctuation mark (called a token) occurs in an email. The predictors are the frequencies of particular words or punctuation marks in an email. Therefore, the predictors compose multinomial random variables.
This example illustrates classification using naive Bayes and multinomial predictors.
Suppose you observed 1000 emails and classified them as spam or not spam. Do this by randomly assigning -1 or 1 to y for each email.
n = 1000;                       % Sample size
rng(1);                         % For reproducibility
y = randsample([-1 1],n,true);  % Random labels
To build the predictor data, suppose that there are five tokens in the vocabulary, and 20 observed tokens per email. Generate predictor data from the five tokens by drawing multinomial deviates. The relative frequencies for tokens corresponding to spam emails should differ from emails that are not spam.
tokenProbs = [0.2 0.3 0.1 0.15 0.25;...
              0.4 0.1 0.3 0.05 0.15]; % Token relative frequencies
tokensPerEmail = 20;
X = zeros(n,5);
X(y == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),sum(y == 1));
X(y == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),sum(y == -1));
Train a naive Bayes classifier. Specify that the predictors are multinomial.
NBModel = fitNaiveBayes(X,y,'Distribution','mn');
NBModel is a trained NaiveBayes classifier.
Assess the in-sample performance of NBModel by estimating the misclassification rate.
predSpam = predict(NBModel,X);
misclass = sum(y'~=predSpam)/n
misclass = 0.0200
The in-sample misclassification rate is 2%.
Randomly generate deviates that represent a new batch of emails.
nOut = 500;
yOut = randsample([-1 1],nOut,true);
XOut = zeros(nOut,5);
XOut(yOut == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),...
    sum(yOut == 1));
XOut(yOut == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),...
    sum(yOut == -1));
Classify the new emails using the trained naive Bayes classifier NBModel, and determine whether the algorithm generalizes.
predSpamOut = predict(NBModel,XOut);
genRate = sum(yOut'~=predSpamOut)/nOut
genRate = 0.0260
The out-of-sample misclassification rate is 2.6%, indicating that the classifier generalizes fairly well.
X — Predictor data
matrix of numeric values

Predictor data to which the naive Bayes classifier is trained, specified as a matrix of numeric values. Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).
The length of Y and the number of rows of X must be equal.
Data Types: double
Y — Class labels
categorical array | character array | logical vector | vector of numeric values | cell array of strings

Class labels to which the naive Bayes classifier is trained, specified as a categorical or character array, a logical or numeric vector, or a cell array of strings. Each element of Y defines the class membership of the corresponding row of X. Y supports K class levels.
If Y is a character array, then each row must correspond to one class label.
The length of Y and the number of rows of X must be equal.
Data Types: cell | char | double | logical
Note: The software treats NaN, empty string (''), and <undefined> elements as missing values. The software removes rows of X and corresponding elements of Y that contain at least one missing value, which decreases the effective training sample size.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Distribution','mn','Prior','uniform','KSWidth',0.5 specifies the following: the data distribution is multinomial, the prior probabilities for all classes are equal, and the kernel smoothing window bandwidth for all classes is 0.5 units.

'Distribution' — Data distributions
'normal' (default) | 'kernel' | 'mn' | 'mvmn' | cell array of strings

Data distributions fitNaiveBayes uses to model the data, specified as the comma-separated pair consisting of 'Distribution' and a string or cell array of strings.
This table summarizes the available distributions.
Value       Description
'kernel'    Kernel smoothing density estimate.
'mn'        Multinomial distribution. If you specify 'mn', then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a cell array of strings. For details, see Algorithms.
'mvmn'      Multivariate multinomial distribution. For details, see Algorithms.
'normal'    Normal (Gaussian) distribution.
If you specify a string, then the software models all the features using that distribution. If you specify a 1-by-D cell array of strings, then the software models feature j using the distribution in element j of the cell array.
Example: 'Distribution',{'kernel','normal'}
Data Types: cell | char
'KSSupport' — Kernel smoothing density support
'unbounded' (default) | 'positive' | cell array | numeric row vector

Kernel smoothing density support, specified as the comma-separated pair consisting of 'KSSupport' and a numeric row vector, a string, or a cell array. The software applies the kernel smoothing density to this region.
If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.
This table summarizes the available options for setting the kernel smoothing density region.
Value                      Description
1-by-2 numeric row vector  For example, [L,U], where L and U are the finite lower and upper bounds, respectively, for the density support.
'positive'                 The density support is all positive real values.
'unbounded'                The density support is all real values.
If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel support in cell j for feature j in X.
Example: 'KSSupport',{[10,20],'unbounded'}
Data Types: cell | char | double
'KSType' — Kernel smoother type
'normal' (default) | 'box' | 'epanechnikov' | 'triangle' | cell array of strings

Kernel smoother type, specified as the comma-separated pair consisting of 'KSType' and a string or cell array of strings.
If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.
This table summarizes the available options for setting the kernel smoother type. Let I{u} denote the indicator function.
Value            Kernel          Formula
'box'            Box (uniform)   $$f(x)=0.5I\left\{|x|\le 1\right\}$$
'epanechnikov'   Epanechnikov    $$f(x)=0.75\left(1-{x}^{2}\right)I\left\{|x|\le 1\right\}$$
'normal'         Gaussian        $$f(x)=\frac{1}{\sqrt{2\pi}}\mathrm{exp}\left(-0.5{x}^{2}\right)$$
'triangle'       Triangular      $$f(x)=\left(1-|x|\right)I\left\{|x|\le 1\right\}$$
If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel smoother type in cell j for feature j in X.
Example: 'KSType',{'epanechnikov','normal'}
Data Types: cell | char
'KSWidth' — Kernel smoothing window bandwidth
matrix of numeric values (default) | numeric column vector | numeric row vector | scalar | structure array

Kernel smoothing window bandwidth, specified as the comma-separated pair consisting of 'KSWidth' and a matrix of numeric values, numeric row vector, numeric column vector, scalar, or structure array.
If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.
Suppose there are K class levels and D predictors. This table summarizes the available options for setting the kernel smoothing window bandwidth.
Value                            Description
K-by-D matrix of numeric values  Element (k,d) specifies the bandwidth for predictor d in class k.
K-by-1 numeric column vector     Element k specifies the bandwidth for all predictors in class k.
1-by-D numeric row vector        Element d specifies the bandwidth in all class levels for predictor d.
scalar                           Specifies the bandwidth for all features in all classes.
structure array                  A structure array S containing class levels and their bandwidths. S must have two fields: S.width, a numeric array of bandwidths, and S.group, the corresponding class levels.
By default, the software selects a default bandwidth automatically for each combination of feature and class by using a value that is optimal for a Gaussian distribution.
Example: 'KSWidth',struct('width',[0.5,0.25],'group',{{'b';'g'}})
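The Gaussian-optimal default described above is, in the kernel density literature, the normal reference (Silverman) rule. The exact constant fitNaiveBayes uses is not documented here, so treat this Python sketch as an approximation of the idea rather than the implementation:

```python
def normal_reference_bandwidth(sigma, n):
    # Normal reference (Silverman) rule: the bandwidth that minimizes
    # asymptotic mean integrated squared error when the data are Gaussian:
    #   h = sigma * (4 / (3 * n)) ** (1/5)  ~  1.06 * sigma * n ** (-1/5)
    return sigma * (4.0 / (3.0 * n)) ** 0.2

# Wider spread or fewer observations give a wider smoothing window
assert normal_reference_bandwidth(2.0, 100) > normal_reference_bandwidth(1.0, 100)
assert normal_reference_bandwidth(1.0, 100) > normal_reference_bandwidth(1.0, 10000)
```

This is why the default bandwidth differs per feature and per class: each combination has its own standard deviation and sample size.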
Data Types: double | struct
'Prior' — Class prior probabilities
'empirical' (default) | 'uniform' | numeric vector | structure array

Class prior probabilities, specified as the comma-separated pair consisting of 'Prior' and a numeric vector, structure array, or string.
This table summarizes the available options for setting prior probabilities.
Value            Description
'empirical'      The software uses the class relative frequencies for the prior probabilities.
numeric vector   A numeric vector of length K specifying the prior probabilities for each class. The order of the elements must correspond to the order of the class levels. The software normalizes the prior probabilities to sum to 1.
structure array  A structure array S containing class levels and their prior probabilities. S must have two fields: S.prob, a numeric array of prior probabilities, and S.group, the corresponding class levels.
'uniform'        The prior probabilities are equal for all classes.
Example: 'Prior',struct('prob',[1,2],'group',{{'b';'g'}})
Data Types: char | double | struct
NBModel — Trained naive Bayes classifier
NaiveBayes classifier

Trained naive Bayes classifier, returned as a NaiveBayes classifier.
In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in this observation. The number of categories (bins) in this multinomial model is the number of distinct tokens, that is, the number of predictors.
For classifying count-based data, such as the bag-of-tokens model, use the multinomial distribution (e.g., set 'Distribution','mn').
This list defines the order of the classes. It is useful when you specify prior probabilities by setting 'Prior',prior, where prior is a numeric vector.
If Y is a categorical array, then the order of the class levels matches the output of categories(Y).
If Y is a numeric or logical vector, then the order of the class levels matches the output of sort(unique(Y)).
For cell arrays of strings and character arrays, the order of the class labels is the order in which each label appears in Y.
If you specify 'Distribution','mn', then the software considers each observation as multiple trials of a multinomial distribution, and considers each occurrence of a token as one trial (see Bag-of-Tokens Model).
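Concretely, under 'mn' the class-conditional likelihood of an observation is multinomial in its token counts. A Python sketch of the comparison the classifier makes (the token probabilities reuse the ones from the spam example above; the email's counts are made up for illustration):

```python
import math

def multinomial_loglik(counts, probs):
    # Log-likelihood of the token counts under one class's token
    # probabilities. The multinomial coefficient depends only on the
    # counts, so it cancels when comparing classes and is dropped here.
    return sum(c * math.log(p) for c, p in zip(counts, probs))

spam_probs    = [0.2, 0.3, 0.1, 0.15, 0.25]
nonspam_probs = [0.4, 0.1, 0.3, 0.05, 0.15]

email = [3, 7, 1, 4, 5]  # 20 observed tokens, heavy on token 2
# With equal priors, classify by the larger log-likelihood
if multinomial_loglik(email, spam_probs) > multinomial_loglik(email, nonspam_probs):
    label = 'spam'
else:
    label = 'not spam'
```

Because token 2 is far more common under the spam token distribution (0.3 versus 0.1), an email dominated by that token scores higher under the spam class.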
If you specify 'Distribution','mvmn', then the software assumes each individual predictor follows a multinomial model within a class. The parameters for a predictor include the probabilities of all possible values that the corresponding feature can take.