Performance Curves by perfcurve
The `perfcurve` function computes a receiver operating characteristic (ROC) curve and other performance curves. You can use this function to evaluate classifier performance on test data after you train a classifier. Alternatively, you can compute performance metrics for a ROC curve and other performance curves by creating a `rocmetrics` object. `rocmetrics` supports both binary and multiclass classification problems, and provides object functions to plot a ROC curve (`plot`), compute an average ROC curve for multiclass problems (`average`), and compute additional metrics after creating an object (`addMetrics`). For more details, see ROC Curve and Performance Metrics.
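As a quick illustration, here is a minimal sketch of both workflows on the Fisher iris data; the choice of `fitcsvm` and of `'virginica'` as the positive class is illustrative only, and `rocmetrics` requires a release that includes it (introduced in R2022a).

```matlab
% Minimal sketch: score a binary problem, then compute a ROC curve two ways.
load fisheriris                           % built-in example data
inds = ~strcmp(species,'setosa');         % keep two classes
X = meas(inds,3:4);
y = species(inds);

mdl = fitcsvm(X,y);                       % any scoring classifier works
[~,scores] = predict(mdl,X);              % column 2 scores 'virginica'

% Option 1: perfcurve
[Xroc,Yroc] = perfcurve(y,scores(:,2),'virginica');
plot(Xroc,Yroc)

% Option 2: rocmetrics (binary or multiclass)
rocObj = rocmetrics(y,scores,mdl.ClassNames);
plot(rocObj)
```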
Input Scores and Labels for perfcurve
You can use `perfcurve` with any classifier or, more broadly, with any function that returns a numeric score for an instance of input data. By the convention adopted here:

- A high score returned by a classifier for any given instance signifies that the instance is likely from the positive class.
- A low score signifies that the instance is likely from the negative classes.

For some classifiers, you can interpret the score as the posterior probability of observing an instance of the positive class at point `X`. An example of such a score is the fraction of positive observations in a leaf of a decision tree. In this case, scores fall into the range from 0 to 1, and scores from the positive and negative classes add up to unity. Other methods can return scores ranging between minus and plus infinity, without any obvious mapping from the score to the posterior class probability.
`perfcurve` does not impose any requirements on the input score range. Because of this lack of normalization, you can use `perfcurve` to process scores returned by any classification, regression, or fit method. `perfcurve` does not make any assumptions about the nature of the input scores or the relationships between the scores for different classes. As an example, consider a problem with three classes, `A`, `B`, and `C`, and assume that the scores returned by some classifier for two instances are as follows:

|  | A | B | C |
|---|---|---|---|
| Instance 1 | 0.4 | 0.5 | 0.1 |
| Instance 2 | 0.4 | 0.1 | 0.5 |
If you want to compute a performance curve for separation of classes `A` and `B`, with `C` ignored, you need to address the ambiguity in selecting `A` over `B`. You could opt to use the score ratio, `s(A)/s(B)`, or the score difference, `s(A)-s(B)`; this choice could depend on the nature of these scores and their normalization. One possible reduction is sketched below.
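For instance, a minimal sketch (assuming a hypothetical N-by-3 matrix `scores` with one column per class, ordered A, B, C, and an N-by-1 cell array `labels`) that reduces the three-class scores to a single per-instance score via the difference `s(A)-s(B)`:

```matlab
% Ignore class C and score the A-versus-B separation.
keep  = ~strcmp(labels,'C');               % drop class C entirely
sAB   = scores(keep,1) - scores(keep,2);   % one score per instance: s(A)-s(B)
[X,Y] = perfcurve(labels(keep),sAB,'A');   % treat A as the positive class
plot(X,Y)
```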
`perfcurve` always takes one score per instance. If you supply only the scores for class `A`, `perfcurve` does not distinguish between instances 1 and 2, because both have `s(A) = 0.4`. The performance curve in this case may not be optimal.
`perfcurve` is intended for use with classifiers that return scores, not with those that return only predicted classes. As a counterexample, consider a decision tree that returns only hard classification labels, 0 or 1, for data with two classes. In this case, the performance curve reduces to a single point because the classified instances can be split into positive and negative categories in only one way.
For input, `perfcurve` takes true class labels for some data and the scores assigned by a classifier to these data. By default, this utility computes a ROC curve and returns values of 1 - specificity, or false positive rate, for `X` and sensitivity, or true positive rate, for `Y`. You can choose other criteria for `X` and `Y` by selecting one of several provided criteria or by specifying an arbitrary criterion through an anonymous function. You can display the computed performance curve using `plot(X,Y)`.
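For example, assuming hypothetical `labels`, `scores`, and positive class name `posclass` from a trained binary classifier, you could put recall on the x-axis and precision on the y-axis instead of the default ROC criteria (`'tpr'` and `'ppv'` are two of the named criteria `perfcurve` accepts):

```matlab
% Precision-recall style curve instead of the default ROC axes:
% recall (true positive rate) on X, precision (PPV) on Y.
[X,Y] = perfcurve(labels,scores,posclass, ...
    'XCrit','tpr','YCrit','ppv');
plot(X,Y)
xlabel('Recall (TPR)')
ylabel('Precision (PPV)')
```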
Computation of Performance Metrics
`perfcurve` can compute values for various criteria to plot on either the x-axis or the y-axis. All such criteria are described by a 2-by-2 confusion matrix, a 2-by-2 cost matrix, and a 2-by-1 vector of scales applied to the class counts.
Confusion Matrix
The confusion matrix, `C`, is defined as

$$
C = \begin{pmatrix} TP & FN \\ FP & TN \end{pmatrix}
$$

where

- P stands for "positive".
- N stands for "negative".
- T stands for "true".
- F stands for "false".
For example, the first row of the confusion matrix defines how the classifier identifies instances of the positive class: `C(1,1)` is the count of correctly identified positive instances, and `C(1,2)` is the count of positive instances misidentified as negative.
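As a concrete illustration, here is a hand-rolled sketch of the four counts at one score threshold (not the internal `perfcurve` implementation); `isPos`, `s`, and `thr` are hypothetical inputs:

```matlab
% isPos: logical vector of true class membership; s: score vector;
% thr: one score threshold.
predPos = s >= thr;                       % classify as positive above threshold
TP = sum( isPos &  predPos);              % C(1,1)
FN = sum( isPos & ~predPos);              % C(1,2)
FP = sum(~isPos &  predPos);              % C(2,1)
TN = sum(~isPos & ~predPos);              % C(2,2)
C  = [TP FN; FP TN];                      % 2-by-2 confusion matrix
```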
Misclassification Cost Matrix
The cost matrix defines the cost of misclassification for each category:

$$
Cost = \begin{pmatrix} Cost(P|P) & Cost(N|P) \\ Cost(P|N) & Cost(N|N) \end{pmatrix}
$$

where `Cost(I|J)` is the cost of assigning an instance of class `J` to class `I`. Usually `Cost(I|J)=0` for `I=J`. For flexibility, `perfcurve` allows you to specify nonzero costs for correct classification as well.
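You pass the cost matrix through the `'Cost'` name-value argument. In this sketch (with hypothetical `labels`, `scores`, and `posclass`), the cost values are made up for illustration, and the `'ecost'` (expected cost) criterion is chosen because it actually depends on them:

```matlab
% Weight misclassifying a positive as negative (Cost(N|P) = 2) twice as
% heavily as the reverse (Cost(P|N) = 1); correct classifications cost 0.
costMat = [0 2; 1 0];
[X,Y] = perfcurve(labels,scores,posclass, ...
    'Cost',costMat,'YCrit','ecost');      % expected cost on the y-axis
```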
Scale Vector
The two scales include prior information about class probabilities. `perfcurve` computes these scales by taking `scale(P)=prior(P)*N` and `scale(N)=prior(N)*P` and normalizing the sum `scale(P)+scale(N)` to 1. `P=TP+FN` and `N=TN+FP` are the total instance counts in the positive and negative class, respectively. The function then applies the scales as multiplicative factors to the counts from the corresponding class: `perfcurve` multiplies counts from the positive class by `scale(P)` and counts from the negative class by `scale(N)`. Consider, for example, the computation of the positive predictive value, `PPV = TP/(TP+FP)`. `TP` counts come from the positive class and `FP` counts come from the negative class. Therefore, you need to scale `TP` by `scale(P)` and `FP` by `scale(N)`, and the modified formula for `PPV` with prior probabilities taken into account is

$$
PPV = \frac{scale(P) \cdot TP}{scale(P) \cdot TP + scale(N) \cdot FP}
$$
If all scores in the data are above a certain threshold, `perfcurve` classifies all instances as `'positive'`. This means that `TP` is the total number of instances in the positive class, `P`, and `FP` is the total number of instances in the negative class, `N`. In this case, `PPV` is simply given by the prior:

$$
PPV = \frac{scale(P) \cdot P}{scale(P) \cdot P + scale(N) \cdot N} = \frac{prior(P)}{prior(P) + prior(N)}
$$
The `perfcurve` function returns two vectors, `X` and `Y`, of performance measures. Each measure is some function of `confusion`, `cost`, and `scale` values. You can request specific measures by name or provide a function handle to compute a custom measure. The function you provide should take `confusion`, `cost`, and `scale` as its three inputs and return a vector of output values.
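A minimal sketch of a custom criterion, assuming hypothetical `labels`, `scores`, and `posclass` as before. The anonymous function below uses only the confusion matrix, so it is insensitive to the exact ordering of the remaining inputs; consult the `perfcurve` reference page for the precise signature:

```matlab
% Custom y-axis criterion: false discovery rate FP/(TP+FP), computed
% from the 2-by-2 confusion matrix C = [TP FN; FP TN]. The remaining
% inputs (cost and scale information) are accepted but unused here.
fdr = @(C,varargin) C(2,1) / max(C(1,1) + C(2,1), eps);

[X,Y] = perfcurve(labels,scores,posclass,'YCrit',fdr);
plot(X,Y)
```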
Thresholds
The criterion for `X` must be a monotone function of the positive classification count or, equivalently, of the threshold for the supplied scores. If `perfcurve` cannot perform a one-to-one mapping between values of the `X` criterion and score thresholds, it exits with an error message.
By default, `perfcurve` computes values of the `X` and `Y` criteria for all possible score thresholds. Alternatively, it can compute a reduced number of specific `X` values supplied as an input argument. In either case, for `M` requested values, `perfcurve` computes `M+1` values for `X` and `Y`. The first of these `M+1` values is special: `perfcurve` computes it by setting the `TP` instance count to zero and setting `TN` to the total count in the negative class. This value corresponds to the `'reject all'` threshold. On a standard ROC curve, this translates into an extra point placed at `(0,0)`.
NaN Score Values
If there are `NaN` values among the input scores, `perfcurve` can process them in either of two ways:

- It can discard rows with `NaN` scores.
- It can add them to false classification counts in the respective class.
That is, for any threshold, instances with `NaN` scores from the positive class are counted as false negatives (`FN`), and instances with `NaN` scores from the negative class are counted as false positives (`FP`). In this case, the first value of `X` or `Y` is computed by setting `TP` to zero and setting `TN` to the total count minus the `NaN` count in the negative class. For illustration, consider an example with two rows in the positive class and two rows in the negative class, with one `NaN` score in each class:

| Class | Score |
|---|---|
| Negative | 0.2 |
| Negative | NaN |
| Positive | 0.7 |
| Positive | NaN |
If you discard rows with `NaN` scores, then as the score cutoff varies, `perfcurve` computes performance measures as in the following table. For example, a cutoff of 0.5 corresponds to the middle row, where rows 1 and 3 are classified correctly and rows 2 and 4 are omitted.

| TP | FN | FP | TN |
|---|---|---|---|
| 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 |
If you add rows with `NaN` scores to the false category in their respective classes, `perfcurve` computes performance measures as in the following table. For example, a cutoff of 0.5 corresponds to the middle row, where rows 2 and 4 are now counted as incorrectly classified. Notice that only the `FN` and `FP` columns differ between these two tables.

| TP | FN | FP | TN |
|---|---|---|---|
| 0 | 2 | 1 | 1 |
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 0 |
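In `perfcurve`, this choice is controlled by the `'ProcessNaN'` name-value argument. A minimal sketch reproducing the four-row example above:

```matlab
% Four observations matching the table above; one NaN score per class.
labels = {'neg';'neg';'pos';'pos'};
scores = [0.2; NaN; 0.7; NaN];

% Default behavior: rows with NaN scores are discarded.
[X1,Y1] = perfcurve(labels,scores,'pos','ProcessNaN','ignore');

% Alternative: NaN rows count as FN (positive class) or FP (negative class).
[X2,Y2] = perfcurve(labels,scores,'pos','ProcessNaN','addtofalse');
```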
Multiclass Classification Problems
For data with three or more classes, `perfcurve` takes one positive class and a list of negative classes as input. The function computes the `X` and `Y` values using counts in the positive class to estimate `TP` and `FN`, and using counts in all negative classes to estimate `TN` and `FP`. `perfcurve` can optionally compute `Y` values for each negative class separately and, in addition to `Y`, return a matrix of size `M`-by-`C`, where `M` is the number of elements in `X` or `Y` and `C` is the number of negative classes. You can use this functionality to monitor the contribution of each negative class. For example, you can plot `TP` counts on the x-axis and `FP` counts on the y-axis. In this case, the returned matrix shows how the `FP` component is split across the negative classes.
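A sketch of this per-negative-class breakdown on the three-class Fisher iris data (the classifier choice is illustrative only):

```matlab
% Three-class example: treat 'setosa' as positive, the rest as negative.
load fisheriris
mdl = fitctree(meas,species);             % any scoring classifier works
[~,scores] = predict(mdl,meas);
posScores = scores(:,strcmp(mdl.ClassNames,'setosa'));

% Request the per-negative-class breakdown of the y-axis criterion.
[X,Y,~,~,~,SUBY,SUBYNAMES] = perfcurve(species,posScores,'setosa');
% SUBY has one column per negative class; SUBYNAMES lists their names.
```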
Confidence Intervals
You can also use `perfcurve` to estimate confidence intervals. `perfcurve` computes confidence bounds using either cross-validation or bootstrap. If you supply cell arrays for `labels` and `scores`, `perfcurve` uses cross-validation and treats the elements in the cell arrays as cross-validation folds. If you set the input parameter `NBoot` to a positive integer, `perfcurve` generates `NBoot` bootstrap replicas to compute pointwise confidence bounds.
`perfcurve` estimates the confidence bounds using one of two methods (see the sketch after this list):

- Vertical averaging (VA) — estimate confidence bounds on `Y` and `T` at fixed values of `X`. Use the `XVals` input parameter to compute confidence bounds with this method.
- Threshold averaging (TA) — estimate confidence bounds for `X` and `Y` at fixed thresholds for the positive class score. Use the `TVals` input parameter to compute confidence bounds with this method.
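A minimal sketch of both resampling routes, assuming `labels`, `scores`, and `posclass` as before; `labelsCV` and `scoresCV` are hypothetical cell arrays holding one cross-validation fold per cell:

```matlab
% Bootstrap: 1000 replicas, pointwise bounds at the default confidence level.
[X,Y,T] = perfcurve(labels,scores,posclass,'NBoot',1000);
% With bounds computed, outputs gain lower/upper bound columns alongside
% the pointwise estimates.

% Cross-validation: supply labels and scores as cell arrays of folds.
[Xcv,Ycv] = perfcurve(labelsCV,scoresCV,posclass);
```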
Observation Weights
To use observation weights instead of observation counts, use the `'Weights'` parameter in your call to `perfcurve`. When you use this parameter, `perfcurve` uses your supplied observation weights instead of observation counts to compute `X`, `Y`, and `T`, and to compute confidence bounds by cross-validation. To compute confidence bounds by bootstrap, `perfcurve` samples N out of N observations with replacement, using your weights as multinomial sampling probabilities.
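A sketch, assuming `labels`, `scores`, and `posclass` as before; the two-to-one weighting below is an arbitrary illustration:

```matlab
% Upweight observations from the positive class two-to-one.
w = ones(numel(labels),1);
w(strcmp(labels,posclass)) = 2;
[X,Y] = perfcurve(labels,scores,posclass,'Weights',w);
```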
See Also
`rocmetrics` | `addMetrics` | `average` | `plot` | `perfcurve` | `confusionchart`