Outputting the number of clusters found by linkage or dendrogram functions

% for example
X = rand(20000,3);
Z = linkage(X,'ward');
dendrogram(Z);
c = cluster(Z,'Maxclust',numClusters); % numClusters is the value I need to determine
I am trying to find the number of clusters identified by the linkage/dendrogram functions so that I can pass that number to the cluster function and get cluster assignment vectors, which I can then crosstab.

Answers (1)

Steven Lord on 17 Aug 2023
Does either the second or third output argument from the dendrogram function give you the information you're looking for?
X = rand(20000,3);
Z = linkage(X,'ward');
[tree, T, outperm] = dendrogram(Z);
whos T outperm
  Name             Size             Bytes  Class     Attributes

  T            20000x1             160000  double
  outperm          1x30               240  double
T lists which data points are in each leaf and outperm is the vector of leaf node labels. From the documentation page, "If there are P leaves in the dendrogram plot, outperm is a permutation of the vector 1:P."
numberOfLeafNodes = max(outperm)
numberOfLeafNodes = 30
Let's see how many points are contained in each leaf node.
[countsPerLeafNode, edges] = histcounts(T, BinMethod="integers")
countsPerLeafNode = 1×30
888 1091 685 807 484 668 722 598 904 736 559 660 891 566 731 621 772 712 437 339 512 859 1008 494 345 752 469 705 473 512
edges = 1×31
0.5000 1.5000 2.5000 3.5000 4.5000 5.5000 6.5000 7.5000 8.5000 9.5000 10.5000 11.5000 12.5000 13.5000 14.5000 15.5000 16.5000 17.5000 18.5000 19.5000 20.5000 21.5000 22.5000 23.5000 24.5000 25.5000 26.5000 27.5000 28.5000 29.5000
I'll create a table to summarize the results (and just show the first few rows.)
results = table(edges(1:end-1).'+0.5, countsPerLeafNode.', ... % +0.5 to get bin centers
'VariableNames', ["Node number", "Count"]);
head(results)
    Node number    Count
    ___________    _____

         1           888
         2          1091
         3           685
         4           807
         5           484
         6           668
         7           722
         8           598
Or if you want a picture:
histogram(T, BinMethod="integers")
  1 Comment
Jackson Morgan on 17 Aug 2023
Edited: Jackson Morgan on 17 Aug 2023
Apologies, I am not searching for the number of points under each leaf given P>30. Though this might actually help later on. Let me explain my question more clearly.
I need a way to output the number of clusters that are easy to identify visually in a colored dendrogram.
X=rand(20,2);
D=pdist(X);
L=linkage(D,'ward');
figure();
dendrogram(L,"ColorThreshold","default");
For example, this dendrogram clearly has four clusters. Obviously, since the randomly generated matrix X changes every time I run the code, the number of clusters is also subject to change. My question is how do I output the number of clusters.
My initial thought was to use the default ColorThreshold value, which the documentation gives as 0.7 * max(Z(:,3)) (here Z is the linkage matrix, L in my code), but I am not sure how to find the groups of nodes whose linkage distance is less than that threshold.
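One possible way to turn that threshold into a cluster count is to cut the tree at the same height with cluster, using the 'Cutoff' option and the 'distance' criterion. The lines below are only a sketch under a few assumptions: the rng call and the names threshold, numClusters, and numFromHeights are added here for illustration and are not part of the original post, and the count obtained from merge heights relies on the tree being monotonic (true for 'ward' on Euclidean distances). Note also that a single leaf that only joins the tree above the threshold counts as its own cluster here, even though the dendrogram draws it in the default color, so the number can differ from the number of colored groups in the plot.

```matlab
rng(0);                                  % fixed seed for a repeatable run (assumption, not in the original post)
X = rand(20,2);
D = pdist(X);
L = linkage(D,'ward');

% Default ColorThreshold according to the dendrogram documentation
threshold = 0.7*max(L(:,3));

% Cut the tree at that height: merges with height below the threshold are kept
c = cluster(L,'Cutoff',threshold,'Criterion','distance');
numClusters = numel(unique(c));

% Cross-check from the merge heights alone: every merge at or above the
% threshold is undone by the cut and so adds one cluster to the single root
numFromHeights = sum(L(:,3) >= threshold) + 1;

fprintf('Clusters below the threshold: %d\n', numClusters);
```

The vector c already holds the cluster assignment of every point, so it could be passed straight to crosstab without a separate call to cluster with 'Maxclust'.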
