# rankfeatures

Rank key features by class separability criteria

## Syntax

`[IDX, Z] = rankfeatures(X, Group)`

`[IDX, Z] = rankfeatures(X, Group, ...'Criterion', CriterionValue, ...)`

`[IDX, Z] = rankfeatures(X, Group, ...'CCWeighting', ALPHA, ...)`

`[IDX, Z] = rankfeatures(X, Group, ...'NWeighting', BETA, ...)`

`[IDX, Z] = rankfeatures(X, Group, ...'NumberOfIndices', N, ...)`

`[IDX, Z] = rankfeatures(X, Group, ...'CrossNorm', CN, ...)`

## Description

`[IDX, Z] = rankfeatures(X, Group)` ranks the features in `X` using an independent evaluation criterion for binary classification. `X` is a matrix where every column is an observed vector and the number of rows corresponds to the original number of features. `Group` contains the class labels.

`IDX` is the list of indices to the rows in `X` with the most significant features. `Z` is the absolute value of the criterion used (see below).

`Group` can be a numeric vector, a cell array of character vectors, or a string vector. `numel(Group)` is the same as the number of columns in `X`, and `Group` must have only two unique values. If `Group` contains any `NaN` values, the function ignores the corresponding observation vector in `X`.
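The input layout described above (features in rows, observations in columns, one label per column) can be sketched with synthetic data. This is only an illustration: the data, the variable names, and the planted rows are assumptions made for the example, and running it assumes the toolbox that provides `rankfeatures` is installed.

```matlab
% Synthetic data: 50 features (rows) by 40 observations (columns).
rng(0);                                   % reproducible random numbers
numFeatures = 50;
numObs = 40;
X = randn(numFeatures, numObs);

% One numeric label per column of X, with exactly two unique values.
Group = [zeros(1, numObs/2), ones(1, numObs/2)];

% Plant three informative features (hypothetical rows chosen for this sketch)
% by shifting their values in the second class.
informative = [5 17 33];
X(informative, Group == 1) = X(informative, Group == 1) + 5;

% Rank the features with the default criterion.
[IDX, Z] = rankfeatures(X, Group);

% With a shift this large, the first three indices are expected to be
% the planted rows.
top3 = sort(IDX(1:3))';
disp(top3)
```

Because `numel(Group)` must match the number of columns in `X`, data stored with observations in rows needs to be transposed before the call.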

`[IDX, Z] = rankfeatures(X, Group, ...'PropertyName', PropertyValue, ...)` calls `rankfeatures` with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each `PropertyName` must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

`[IDX, Z] = rankfeatures(X, Group, ...'Criterion', CriterionValue, ...)` sets the criterion used to assess the significance of every feature for separating two labeled groups. Choices are:

- `'ttest'` (default): Absolute value of the two-sample t-test statistic with pooled variance estimate.
- `'entropy'`: Relative entropy, also known as Kullback-Leibler distance or divergence.
- `'bhattacharyya'`: Minimum attainable classification error or Chernoff bound.
- `'roc'`: Area between the empirical receiver operating characteristic (ROC) curve and the random classifier slope.
- `'wilcoxon'`: Absolute value of the standardized u-statistic of a two-sample unpaired Wilcoxon test, also known as Mann-Whitney.

**Note**

`'ttest'`, `'entropy'`, and `'bhattacharyya'` assume normally distributed classes, while `'roc'` and `'wilcoxon'` are nonparametric tests. All tests are feature independent.
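To see how the criteria listed above are selected, the following sketch loops over every `CriterionValue` on the same synthetic data and records the top-ranked feature for each. The data and the planted row are assumptions made for illustration, not part of the function's documentation.

```matlab
% Synthetic data: 30 features by 40 observations, two classes.
rng(1);
X = randn(30, 40);
Group = [zeros(1, 20), ones(1, 20)];

% Plant one strongly informative feature (hypothetical row 7).
X(7, Group == 1) = X(7, Group == 1) + 5;

% Try each criterion and keep the index of the top-ranked feature.
criteria = {'ttest', 'entropy', 'bhattacharyya', 'roc', 'wilcoxon'};
best = zeros(1, numel(criteria));
for k = 1:numel(criteria)
    IDX = rankfeatures(X, Group, 'Criterion', criteria{k});
    best(k) = IDX(1);
end

% With a shift this large, every criterion is expected to rank row 7 first.
disp(best)
```

On real data the criteria can disagree, particularly when the classes are far from normally distributed, which is where the nonparametric choices may be preferable.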

`[IDX, Z] = rankfeatures(X, Group, ...'CCWeighting', ALPHA, ...)` uses correlation information to outweigh the `Z` value of potential features using `Z * (1 - ALPHA*RHO)`, where `RHO` is the average of the absolute values of the cross-correlation coefficient between the candidate feature and all previously selected features. `ALPHA` sets the weighting factor. It is a scalar value between `0` and `1`. When `ALPHA` is `0` (default), potential features are not weighted. A large value of `RHO` (close to `1`) outweighs the significance statistic; this means that features that are highly correlated with the features already picked are less likely to be included in the output list.

`[IDX, Z] = rankfeatures(X, Group, ...'NWeighting', BETA, ...)` uses regional information to outweigh the `Z` value of potential features using `Z * (1 - exp(-(DIST/BETA).^2))`, where `DIST` is the distance (in rows) between the candidate feature and previously selected features. `BETA` sets the weighting factor. It is greater than or equal to `0`. When `BETA` is `0` (default), potential features are not weighted. A small `DIST` (close to `0`) outweighs the significance statistics of only close features. This means that features that are close to already picked features are less likely to be included in the output list. This option is useful for extracting features from time series with temporal correlation.

`BETA` can also be a function of the feature location, specified using `@` or an anonymous function. In both cases `rankfeatures` passes the row position of the feature to `BETA()` and expects back a value greater than or equal to `0`.

**Note**

You can use `'CCWeighting'` and `'NWeighting'` together.
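The two weighting formulas can be checked by hand before calling the function. The sketch below first evaluates them for made-up values of `Z`, `RHO`, and `DIST`, then calls `rankfeatures` with each option on synthetic data in which three neighboring rows carry the same signal. All data, the chosen `ALPHA` and `BETA` values, and the anonymous function are assumptions for illustration only.

```matlab
% Hand evaluation of the weighting formulas with made-up numbers.
Z0 = 10;                                   % unweighted criterion value
zCC   = Z0 * (1 - 0.5 * 0.8);              % ALPHA = 0.5, RHO = 0.8, gives 6
zNear = Z0 * (1 - exp(-(1 / 5)^2));        % BETA = 5, DIST = 1, about 0.39
zFar  = Z0 * (1 - exp(-(20 / 5)^2));       % BETA = 5, DIST = 20, close to 10

% Synthetic data: rows 10 to 12 share one class-dependent signal, so they
% are strongly correlated neighbors.
rng(2);
numObs = 40;
Group = [zeros(1, numObs/2), ones(1, numObs/2)];
X = randn(100, numObs);
signal = 4 * Group + 0.1 * randn(1, numObs);
X(10:12, :) = repmat(signal, 3, 1) + 0.05 * randn(3, numObs);

% Penalize candidates correlated with features already selected.
idxCC = rankfeatures(X, Group, 'CCWeighting', 0.9, 'NumberOfIndices', 5);

% Penalize candidates located near features already selected.
idxNW = rankfeatures(X, Group, 'NWeighting', 5, 'NumberOfIndices', 5);

% BETA as a function of the row position (hypothetical choice of function).
betaFcn = @(row) row/10 + 5;
idxFn = rankfeatures(X, Group, 'NWeighting', betaFcn, 'NumberOfIndices', 5);

% Both weightings in one call.
idxBoth = rankfeatures(X, Group, 'CCWeighting', 0.9, 'NWeighting', 5, ...
    'NumberOfIndices', 5);
```

Comparing `idxCC` and `idxNW` with an unweighted call on the same data is a simple way to see which of the redundant rows 10 to 12 each weighting pushes down the list.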

`[IDX, Z] = rankfeatures(X, Group, ...'NumberOfIndices', N, ...)` sets the number of output indices in `IDX`. Default is the same as the number of features when `ALPHA` and `BETA` are `0`, or `20` otherwise.

`[IDX, Z] = rankfeatures(X, Group, ...'CrossNorm', CN, ...)` applies independent normalization across the observations for every feature. Cross-normalization ensures comparability among different features, although it is not always necessary because the selected criterion might already account for this. Choices are:

- `'none'` (default): Intensities are not cross-normalized.
- `'meanvar'`: `x_new = (x - mean(x))/std(x)`
- `'softmax'`: `x_new = (1+exp((mean(x)-x)/std(x)))^-1`
- `'minmax'`: `x_new = (x - min(x))/(max(x)-min(x))`
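The three normalization formulas can be reproduced in plain MATLAB for a single feature. This sketch applies them to one made-up row of values. It uses the default behavior of `std` (normalization by n-1); whether `rankfeatures` uses the same normalization internally is an assumption here.

```matlab
% One feature (row) observed across eight samples; values are made up.
x = [2 4 4 4 5 5 7 9];

% 'meanvar': zero mean and unit standard deviation.
xMeanVar = (x - mean(x)) ./ std(x);

% 'softmax': squashes values into the open interval (0, 1);
% a value equal to the mean maps to 0.5.
xSoftmax = 1 ./ (1 + exp((mean(x) - x) ./ std(x)));

% 'minmax': rescales values to the closed interval [0, 1].
xMinMax = (x - min(x)) ./ (max(x) - min(x));

disp([xMeanVar; xSoftmax; xMinMax])
```

In a call to the function, the same choice is made by name, for example `rankfeatures(X, Group, 'CrossNorm', 'meanvar')`, and the normalization is applied to each row of `X` independently.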

## Examples
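A minimal end-to-end sketch on synthetic data, assuming the toolbox that provides `rankfeatures` is available. The labels, the planted rows, and the sizes are hypothetical and chosen only to show text labels for `Group` together with the `'NumberOfIndices'` property.

```matlab
% Class labels given as a cell array of character vectors.
rng(3);
numObs = 30;
labels = [repmat({'control'}, 1, numObs/2), repmat({'treated'}, 1, numObs/2)];
isTreated = strcmp(labels, 'treated');

% 200 features by 30 observations, with three planted informative rows.
X = randn(200, numObs);
planted = [20 90 150];
X(planted, isTreated) = X(planted, isTreated) + 5;

% Keep only the ten highest-ranked features.
IDX = rankfeatures(X, labels, 'NumberOfIndices', 10);

% The planted rows are expected to be among the selected indices.
foundAll = all(ismember(planted, IDX));
disp(foundAll)

% Reduced data set containing only the selected features.
Xreduced = X(IDX, :);
```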

## References

[1] Theodoridis, S., and Koutroumbas, K. (1999). Pattern Recognition, Academic Press, 341-342.

[2] Liu, H., Motoda, H. (1998). Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers.

[3] Ross, D.T. et al. (2000). Systematic Variation in Gene Expression Patterns in Human Cancer Cell Lines. Nature Genetics. 24 (3), 227-235.

## See Also

`classperf` | `crossvalind` | `randfeatures` | `classify` | `sequentialfs`

**Introduced before R2006a**