## Documentation

Validate clusters in phylogenetic tree

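`LeafClusters = cluster(Tree, Threshold)`

`[LeafClusters, NodeClusters] = cluster(Tree, Threshold)`

`[LeafClusters, NodeClusters, Branches] = cluster(Tree, Threshold)`

`cluster(..., 'Criterion', CriterionValue, ...)`

`cluster(..., 'MaxClust', MaxClustValue, ...)`

`cluster(..., 'Distances', DistancesValue, ...)`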

| Argument | Description |
| --- | --- |
| `Tree` | Phylogenetic tree object, such as one created with `seqneighjoin`. |
| `Threshold` | Scalar specifying a threshold value. |
| `CriterionValue` | String specifying the criterion used to determine the number of clusters as a function of the species pairwise distances. The choices are listed after this table. |
| `MaxClustValue` | Positive integer specifying the maximum number of possible clusters for the tested partitions. Default is the number of leaves in the tree. |
| `DistancesValue` | Matrix of pairwise distances, such as one returned by `seqpdist`. |

Choices for `CriterionValue`:

* `'maximum'` (default): Maximum within-cluster pairwise distance ($W_{max}$). Cluster splitting stops when $W_{max} \le$ `Threshold`.
* `'median'`: Median within-cluster pairwise distance ($W_{med}$). Cluster splitting stops when $W_{med} \le$ `Threshold`.
* `'average'`: Average within-cluster pairwise distance ($W_{avg}$). Cluster splitting stops when $W_{avg} \le$ `Threshold`.
* `'ratio'`: Between/within cluster pairwise distance ratio, defined as $BW_{rat} = (\mathrm{trace}(B)/(k - 1)) \,/\, (\mathrm{trace}(W)/(n - k))$, where $B$ and $W$ are the between- and within-scatter matrices, respectively, $k$ is the number of clusters, and $n$ is the number of species in the tree. Cluster splitting stops when $BW_{rat} \ge$ `Threshold`.
* `'gain'`: Within-cluster pairwise distance gain, defined as $W_{gain} = (\mathrm{trace}(W_{old}) \,/\, (\mathrm{trace}(W) - 1) \cdot (n - k - 1))$, where $W$ and $W_{old}$ are the within-scatter matrices for $k$ and $k - 1$, respectively, $k$ is the number of clusters, and $n$ is the number of species in the tree. Cluster splitting stops when $W_{gain} \le$ `Threshold`.
* `'silhouette'`: Average silhouette width ($SW_{avg}$). $SW_{avg}$ ranges from `-1` to `+1`. Cluster splitting stops when $SW_{avg} \ge$ `Threshold`. For more information, see `silhouette`.

| Argument | Description |
| --- | --- |
| `LeafClusters` | Column vector containing a cluster index for each species (leaf) in `Tree`. |
| `NodeClusters` | Column vector containing the cluster index for each leaf node and branch node in `Tree`. |
| `Branches` | Two-column matrix containing, for each step in the algorithm, the index of the branch being considered and the value of the criterion. Each row corresponds to a step in the algorithm. The first column contains branch indices, and the second column contains criterion values. |

`LeafClusters = cluster(Tree, Threshold)` returns
a column vector containing a cluster index for each species (leaf)
in a phylogenetic tree object. It determines the optimal number of
clusters as follows:

1. Starting with two clusters (*k* = `2`), selects the partition that optimizes the criterion specified by the `'Criterion'` property.
2. Increments *k* by `1` and again selects the optimal partition.
3. Continues incrementing *k* and selecting the optimal partition until a criterion value reaches `Threshold` or *k* equals the maximum number of clusters (that is, the number of leaves).
4. From all tested *k* values, selects the *k* value whose partition optimizes the criterion.
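The search described above can be sketched in plain MATLAB. This is only an illustration of the stopping and selection logic, not the toolbox implementation: the criterion values in `critValues` are made-up numbers standing in for whatever `cluster` computes internally, and a "lower is better" criterion (as with `'maximum'`) is assumed.

```matlab
% Hypothetical criterion values for k = 2..6 (made-up numbers; lower is
% better, as with the 'maximum' criterion).
kValues    = 2:6;
critValues = [0.90 0.62 0.41 0.28 0.25];
threshold  = 0.30;   % stop once the criterion is <= threshold
maxClust   = 6;      % upper bound on the number of clusters

tested = [];         % positions of the k values actually examined
for idx = 1:numel(kValues)
    k = kValues(idx);
    tested(end+1) = idx; %#ok<AGROW>
    % Stop when the threshold is met or the cluster limit is reached
    if critValues(idx) <= threshold || k == maxClust
        break
    end
end

% Among the tested partitions, keep the one with the best (lowest) value
[bestValue, pos] = min(critValues(tested));
bestK = kValues(tested(pos));
fprintf('Selected k = %d (criterion = %.2f)\n', bestK, bestValue);
```

With these assumed values the loop stops at *k* = 5, the first partition whose criterion falls below the threshold, so *k* = 6 is never tested.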

`[LeafClusters, NodeClusters] = cluster(Tree, Threshold)` also returns
a column vector containing the cluster index for each leaf node and
branch node in `Tree`.

`[LeafClusters, NodeClusters, Branches] = cluster(Tree, Threshold)` also returns
a two-column matrix containing, for each step in the algorithm, the
index of the branch being considered and the value of the criterion.
Each row corresponds to a step in the algorithm. The first column
contains branch indices, and the second column contains criterion
values.
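As a rough illustration of how these outputs might be inspected, the sketch below uses hand-written stand-ins for `LeafClusters` and `Branches` rather than real results from `cluster`. The numbers are hypothetical, and treating the lowest criterion value as the best assumes a criterion that is minimized.

```matlab
% Hypothetical outputs for a small tree (values are made up)
LeafClusters = [1; 1; 2; 2; 2];            % cluster index per leaf
Branches     = [4 0.80; 3 0.45; 2 0.30];   % [branch index, criterion value]

% Number of leaves assigned to each cluster
clusterSizes = accumarray(LeafClusters, 1);

% Step with the lowest criterion value, and the branch considered there
[bestValue, bestStep] = min(Branches(:, 2));
bestBranch = Branches(bestStep, 1);
```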

`cluster(..., 'PropertyName', PropertyValue, ...)` calls `cluster`
with optional properties specified as property name/property value pairs.

`cluster(..., 'Criterion', CriterionValue,
...)` specifies the criterion to determine the number of
clusters as a function of the species pairwise distances.
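To make the `'ratio'` criterion concrete, the following sketch evaluates (trace(*B*)/(*k* - 1)) / (trace(*W*)/(*n* - *k*)) for a toy data set. It assumes point coordinates are available and builds the scatter matrices directly from them; `cluster` itself works from pairwise distances in the tree, so this shows the formula only, not the toolbox's computation. The data `X` and `labels` are invented for the example.

```matlab
% Toy data: six 2-D points in two well-separated groups (made-up values)
X      = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
labels = [1; 1; 1; 2; 2; 2];

n = size(X, 1);          % number of observations (species)
k = max(labels);         % number of clusters
grandMean = mean(X, 1);

B = zeros(size(X, 2));   % between-cluster scatter matrix
W = zeros(size(X, 2));   % within-cluster scatter matrix
for c = 1:k
    Xc = X(labels == c, :);
    mc = mean(Xc, 1);
    d  = mc - grandMean;
    B  = B + size(Xc, 1) * (d' * d);
    R  = bsxfun(@minus, Xc, mc);   % residuals about the cluster mean
    W  = W + R' * R;
end

% Between/within ratio; larger values indicate better-separated clusters
BWrat = (trace(B) / (k - 1)) / (trace(W) / (n - k));
```

For this toy partition trace(*B*) is 75 and trace(*W*) is 8/3, giving a ratio of 112.5.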

`cluster(..., 'MaxClust', MaxClustValue,
...)` specifies the maximum number of possible clusters
for the tested partitions. Default is the number of leaves in the
tree.

`cluster(..., 'Distances', DistancesValue, ...)` substitutes the patristic
distances in `Tree` with the pairwise distances in `DistancesValue`.

Validate the clusters in a phylogenetic tree:

    % Read sequences from a multiple alignment file into a MATLAB
    % structure
    gagaa = multialignread('aagag.aln');

    % Build a phylogenetic tree from the sequences
    gag_tree = seqneighjoin(seqpdist(gagaa),'equivar',gagaa);

    % Validate the clusters in the tree and find the best partition
    % using the 'gain' criterion
    [i,j] = cluster(gag_tree,[],'criterion','gain','maxclust',10);

    % Use the returned vector of indices to color the branches of each
    % cluster in a plot of the tree
    h = plot(gag_tree);
    set(h.BranchLines(j==2),'Color','b')
    set(h.BranchLines(j==1),'Color','r')


`cluster` | `phytree` | `phytreeread` | `phytreeviewer` | `plot` | `seqlinkage` | `seqneighjoin` | `seqpdist` | `silhouette` | `view`
