MATLAB Answers

How can I add colors to a k-means gscatter plot based on a certain cluster?

Kimberly Cardillo on 8 Jul 2020
Edited: Adam Danz on 13 Jul 2020 at 11:17
I have an n-by-p matrix called x, where n is the number of observations and p is the number of variables. I want to use kmeans to cluster my observations into three clusters called T1, T2, and Noise (I have a categorical array called f that labels each observation as belonging to one of the clusters). I want to plot these observations using the scores from my PCA code. I want the T1 observations to be green [0 0.8 0], the T2 observations to be red [1 0 0], and the Noise observations to be purple [0.4940 0.1840 0.5560]. My code is shown below.
[coeff,score,latent] = pca(x);
grp= kmeans(x,3,'Distance','sqeuclidean','Replicates',8);
gscatter(score(:,1),score(:,2),grp,'*',6)
I tried making the third line to be
gscatter(score(:,1),score(:,2),grp,[0.4940 0.1840 0.5560;1 0 0;0 0.8 0],'*',6)
and that gave me the colors I needed, but not always in the right spot. The group numbers in grp are not assigned the same way every time I run the code, so I can't assign the color based on the group number. The figure below shows my gscatter plot, but the T1 observations are where the red is and the T2 observations are where the green is.


Answers (1)

Adam Danz on 8 Jul 2020
Edited: Adam Danz on 8 Jul 2020
The second gscatter syntax in your question looks correct,
gscatter(score(:,1),score(:,2),grp,[0.4940 0.1840 0.5560;1 0 0;0 0.8 0],'*',6)
It's not clear what you mean by "[it gives] me the colors I needed but not always in the right spot". The (x,y) scatter coordinates should always be the same.
The first gscatter syntax in your question is incorrect. The 4th input should indicate color, not symbol.
"Every time I run the code, grp doesn't always start with the same observation so the group number changes so I cant assign the color based on the group number"
Perhaps you mean that the colors of the groups differ between runs. If so, is that really a problem? If you're constructing the legend correctly, the colors should always pair correctly with the legend strings.
If you'd like certain colors to be associated with certain groups (i.e., "noise" is always purple), you'll need some way of identifying which group represents "noise". The kmeans algorithm doesn't have any idea what your data mean; it only partitions the data into groups. If you know that one group should always have a center that is below and to the left of the other groups, you could use the 2nd output of kmeans(), which indicates the group center points, and then associate the group values with each center point.
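A minimal sketch of the center-point idea, using made-up data rather than the data from the question. The three blobs, the rule "rank the clusters left to right by the first coordinate of their center", and the names order, newLabel and grpStable are all assumptions for illustration; substitute whatever rule actually separates T1, T2 and Noise in your data.

```matlab
% Synthetic stand-in for the real data (assumption): three well-separated blobs.
rng(1);
x = [randn(50,2); randn(50,2) + 6; randn(50,2) + [12 0]];

% The 2nd output of kmeans holds the cluster centers, one row per cluster.
[grp, C] = kmeans(x, 3, 'Distance', 'sqeuclidean', 'Replicates', 8);

% Assumed rule: rank the clusters left to right by the first coordinate
% of their center, so label 1 is always the leftmost cluster.
[~, order] = sort(C(:,1));   % order(r) = kmeans label of the r-th leftmost center
newLabel = zeros(3,1);
newLabel(order) = 1:3;       % newLabel(k) = rank of kmeans label k
grpStable = newLabel(grp);   % relabeled cluster index for each observation

% Row r of clr now always colors the r-th leftmost cluster.
clr = [0.4940 0.1840 0.5560; 1 0 0; 0 0.8 0];
gscatter(x(:,1), x(:,2), grpStable, clr, '*', 6)
```

Note that the centers returned by kmeans are in the space of x, not in PCA score space. If the rule is easier to state on the plot axes, the per-cluster means of score(:,1) and score(:,2) could be ranked in the same way.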
Alternatively, if you know the approximate center points of the clusters ahead of time, you might be able to pre-define them using the 'Start' name-value argument of kmeans.
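A rough sketch of the 'Start' approach, again on made-up data. The values in startCenters are hypothetical guesses at the cluster centers, and the assignment of rows to Noise, T2 and T1 is an assumption for the example. With a numeric start matrix, cluster k is seeded from row k, so the output labels should follow the row order as long as the seeds are reasonably close to the true centers.

```matlab
% Synthetic stand-in for the real data (assumption): three well-separated blobs.
rng(1);
x = [randn(50,2); randn(50,2) + 6; randn(50,2) + [12 0]];

% Hypothetical approximate centers, one row per cluster, in the order the
% labels should come out: row 1 = Noise, row 2 = T2, row 3 = T1.
startCenters = [0 0; 6 6; 12 0];

% 'Replicates' is left out because a single start matrix defines one run.
grp = kmeans(x, 3, 'Distance', 'sqeuclidean', 'Start', startCenters);

% Colors in the same row order as startCenters: purple, red, green.
clr = [0.4940 0.1840 0.5560; 1 0 0; 0 0.8 0];
gscatter(x(:,1), x(:,2), grp, clr, '*', 6)
```

The start matrix must have one column per variable in x, so for the data in the question the guesses would have to be given in the original p-dimensional space.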

  2 Comments

dpb on 8 Jul 2020
It looks like coloring by the ID the way the OP wants, instead of by the KMEANS grp index, will require matching up the grp variable with the categorical array f to identify which cluster is which category. But, depending on the data, the classification may not be perfect...
I'd have to scratch my head over whether using f as the grouping variable does the right thing or not -- it may be that simple but I didn't feel like trying to make up some test data to try to duplicate something similar.
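One possible way to do that matching is a majority vote: name each kmeans cluster after the most common f label among its members. The sketch below uses synthetic data and a synthetic f, and the names clusterName and label are made up for the example. It assumes each category ends up dominating exactly one cluster, which, as noted above, real data may not guarantee.

```matlab
% Synthetic stand-in for the real data and labels (assumption).
rng(1);
x = [randn(50,2); randn(50,2) + 6; randn(50,2) + [12 0]];
f = categorical([repmat({'Noise'},50,1); repmat({'T2'},50,1); repmat({'T1'},50,1)]);

grp = kmeans(x, 3, 'Distance', 'sqeuclidean', 'Replicates', 8);

% Integer code per observation, plus the category each code stands for.
[id, names] = findgroups(f);

% Majority vote: name each cluster after the most common label in it.
clusterName = names(arrayfun(@(k) mode(id(grp == k)), (1:3)'));
label = clusterName(grp);    % categorical label per observation, taken from its cluster

% Colors listed in category order (Noise, T1, T2), which is the order
% gscatter is expected to use for a categorical grouping variable.
clr = [0.4940 0.1840 0.5560; 0 0.8 0; 1 0 0];
gscatter(x(:,1), x(:,2), label, clr, '*', 6)
```

Comparing label with f afterwards, for example with mean(label == f), gives a quick idea of how far the clustering agrees with the known categories.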
Adam Danz on 9 Jul 2020
I hadn't thought of that interpretation.
Kimberly, you could try something like this.
id = findgroups(f);
gscatter(x,y,id,clr,sym,siz)
or perhaps you could use f as the grouping variable directly. I don't have MATLAB open, so I can't try it right now.
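For reference, here is an untested-on-your-data sketch of using f as the grouping variable directly, with synthetic data standing in for x and f. It colors the points by their known label and does not use the kmeans result at all, so it only helps if that is the plot you are after. The reordercats call is an optional assumption of this example: it puts the categories in the order the colors are listed.

```matlab
% Synthetic stand-in for the real data and labels (assumption).
rng(1);
x = [randn(50,3); randn(50,3) + 6; randn(50,3) + [12 0 0]];
f = categorical([repmat({'Noise'},50,1); repmat({'T2'},50,1); repmat({'T1'},50,1)]);

[coeff, score] = pca(x);

% Put the categories in the order the colors are listed in.
f = reordercats(f, {'T1', 'T2', 'Noise'});
clr = [0 0.8 0; 1 0 0; 0.4940 0.1840 0.5560];   % green, red, purple

% Points are colored by their known label in f, not by a kmeans index.
h = gscatter(score(:,1), score(:,2), f, clr, '*', 6);
```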
