Dendrogram: How to understand the output of linkage(X)?

I have three questions about the dendrogram and linkage functions. I have data for several models, stored in a vector X.
1) I run the following code to draw the dendrogram. What do columns 1 and 2 of the output of linkage(X) correspond to?
X = [0.1576; 0.9706; 0.9572; 0.4854; 0.8003; 0.1419; 0.4218; 0.9157; 0.7922; 0.9595];
tree = linkage(X,'average');
The tree variable is
tree =
3.0000 10.0000 0.0023
5.0000 9.0000 0.0081
2.0000 11.0000 0.0122
1.0000 6.0000 0.0157
8.0000 13.0000 0.0467
4.0000 7.0000 0.0636
12.0000 15.0000 0.1545
14.0000 16.0000 0.3039
17.0000 18.0000 0.5976
I am struggling to understand what the numbers in columns 1 and 2 (for example 14, 12 and 16) correspond to.
2) How are the numbers on the horizontal axis of the dendrogram assigned? I thought they would be the values in columns 1 and 2, but they are not.
3) I would like to replace the numbers on the horizontal axis with the names of the models they correspond to, so that the axis shows the model names in each cluster instead of numbers. The names could be x1, x2, x3, and so on.
Kindly help me sort this out.

Accepted Answer

Pratyush Roy on 18 May 2021
Edited: Pratyush Roy on 18 May 2021
Hi Imran,
1) The output of linkage (called Z in the documentation, tree in your code) is the agglomerative hierarchical cluster tree, returned as a numeric matrix. Z is an (m-1)-by-3 matrix, where m is the number of observations in the original data. Columns 1 and 2 of Z contain cluster indices linked in pairs to form a binary tree. The leaf nodes are numbered from 1 to m; they are the singleton clusters from which all higher clusters are built. Each newly formed cluster, corresponding to row Z(I,:), is assigned the index m + I. The entries Z(I,1) and Z(I,2) contain the indices of the two component clusters that form cluster m + I. The m-1 higher clusters correspond to the interior nodes of the clustering tree. Z(I,3) contains the linkage distance between the two clusters merged in row Z(I,:).
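For example, consider building a tree with 30 initial nodes. Suppose that cluster 5 and cluster 7 are combined at step 12, and that the distance between them at that step is 1.5. Then Z(12,:) is [5 7 1.5]. The newly formed cluster has index 12 + 30 = 42. If cluster 42 appears in a later row, then the function is combining the cluster created at step 12 into a larger cluster.
In your data m = 10, so any index greater than 10 refers to the cluster created in row (index - 10): 12 is the cluster from row 2 (observations 5 and 9), 14 is the cluster from row 4 (observations 1 and 6), and 16 is the cluster from row 6 (observations 4 and 7).
See the MATLAB documentation for linkage for more information.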
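The same bookkeeping can be traced for every row of the tree in the question. The sketch below is plain MATLAB with no toolbox calls (the implicit expansion in `X(...) - X(...).'` needs R2016b or later). It hard-codes X and the printed tree, expands each cluster index into the observations it contains, and recomputes column 3 as the mean of all pairwise distances between the two merged groups, which is what the 'average' method computes for one-dimensional data with the default Euclidean distance. Because X is shown with only four decimals, the recomputed distances may differ from the printed ones in the last digit.

```matlab
% Data and linkage output copied from the question
X = [0.1576; 0.9706; 0.9572; 0.4854; 0.8003; 0.1419; 0.4218; 0.9157; 0.7922; 0.9595];
tree = [ 3 10 0.0023
         5  9 0.0081
         2 11 0.0122
         1  6 0.0157
         8 13 0.0467
         4  7 0.0636
        12 15 0.1545
        14 16 0.3039
        17 18 0.5976];

m = size(tree, 1) + 1;          % number of observations (10 here)
members = cell(2*m - 1, 1);     % members{k} lists the observations in cluster k
for k = 1:m
    members{k} = k;             % clusters 1..m are the single observations
end

avgDist = zeros(m - 1, 1);      % recomputed average-linkage distances
for i = 1:m-1
    a = tree(i, 1);
    b = tree(i, 2);
    members{m + i} = sort([members{a}, members{b}]);   % row i creates cluster m + i
    % Average linkage: mean of all pairwise distances between the two groups
    D = abs(X(members{a}) - X(members{b}).');
    avgDist(i) = mean(D(:));
    fprintf('Row %d: cluster %d = %d + %d, observations [%s], distance %.4f\n', ...
        i, m + i, a, b, num2str(members{m + i}), avgDist(i));
end
```

For example, the row that merges 14 and 16 joins observations [1 6] with [4 7], and the final row joins [2 3 5 8 9 10] with [1 4 6 7].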
2) The numbers on the horizontal axis are the leaf node indices of the tree. If the original data set has 30 or fewer data points, each leaf in the dendrogram corresponds to one data point, so the axis shows the observation numbers 1 to m, arranged in the order that keeps the branches from crossing. They are never the cluster indices m+1 to 2m-1 that appear in columns 1 and 2 of the linkage output. If there are more than 30 data points, dendrogram collapses lower branches so that there are 30 leaf nodes; as a result, some leaves in the plot correspond to more than one data point. See the MATLAB documentation for dendrogram for more information.
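To see which observation sits at each position on the horizontal axis, you can request the third output of dendrogram, outperm, which lists the leaves from left to right. A short sketch, assuming the Statistics and Machine Learning Toolbox (which provides linkage and dendrogram) is installed:

```matlab
X = [0.1576; 0.9706; 0.9572; 0.4854; 0.8003; 0.1419; 0.4218; 0.9157; 0.7922; 0.9595];
tree = linkage(X, 'average');

% P = 0 draws every leaf (by default the plot is limited to 30 leaves)
[~, ~, outperm] = dendrogram(tree, 0);

% Observation numbers in left-to-right order on the horizontal axis;
% only values 1..10 appear here, never the interior cluster indices 11..19
disp(outperm)
```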
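3) You can use the 'Labels' name-value argument to change the labels on the horizontal axis. The labels are given in the order of the original observations (element i labels row i of X), and dendrogram places each one under the matching leaf. Note that dendrogram takes the linkage output (tree), not the raw data X. The code below shows how to use character-vector names as labels:
labels = arrayfun(@(k) sprintf('x%d', k), 1:10, 'UniformOutput', false); % {'x1', 'x2', ..., 'x10'}
dendrogram(tree, 'Labels', labels)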
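If you also want the model names as a list in the same left-to-right order as the plot (for example, to report which models fall in each cluster), you can index the label array with outperm. A sketch under the same toolbox assumption; the names x1 to x10 stand in for your real model names and are hypothetical:

```matlab
X = [0.1576; 0.9706; 0.9572; 0.4854; 0.8003; 0.1419; 0.4218; 0.9157; 0.7922; 0.9595];
tree = linkage(X, 'average');

% Hypothetical model names: element i labels observation (row) i of X
names = arrayfun(@(k) sprintf('x%d', k), 1:size(X, 1), 'UniformOutput', false);

% Draw every leaf with the names on the horizontal axis and keep the leaf order
[~, ~, outperm] = dendrogram(tree, 0, 'Labels', names);
xlabel('Model')
ylabel('Average linkage distance')

% Model names in the order they appear from left to right on the plot
axisOrder = names(outperm);
disp(strjoin(axisOrder, ', '))
```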
Hope this helps!