# Understanding Gaussian Mixture Models

4 views (last 30 days)
Faraz on 23 Sep 2014
Commented: Image Analyst on 23 Sep 2014
I'm trying to understand GMM by reading the sources available online. I have achieved clustering using K-Means and was seeing how GMM would compare to K-means.
Here is what I have understood, please let me know if my concept is wrong:
GMM is like KNN, in the sense that clustering is achieved in both cases. But in GMM each cluster has their own independent mean and covariance. Furthermore k-means performs hard assignments of data points to clusters whereas in GMM we get a collection of independant gaussian distributions, and for each data point we have a probability that it belongs to one of the distributions.
To understand it better I have used MatLab to code it and achieve the desired clustering. I have used SIFT features for the purpose of feature extraction. And have used k-means clustering to initialize the values. (This is from the VLFeat documentation)
%images is a 459 x 1 cell array where each cell contains the training image
[locations, all_feats] = vl_dsift(single(images{1}), 'fast', 'step', 50); %all_feats will be 128 x no. of keypoints detected
for i=2:(size(images,1))
[locations, feats] = vl_dsift(single(images{i}), 'fast', 'step', 50);
all_feats = cat(2, all_feats, feats); %cat column wise all features
end
numClusters = 50; %Just a random selection.
% Run KMeans to pre-cluster the data
[initMeans, assignments] = vl_kmeans(single(all_feats), numClusters, ...
'Algorithm','Lloyd', ...
'MaxNumIterations',5);
initMeans = double(initMeans); %GMM needs it to be double
% Find the initial means, covariances and priors
for i=1:numClusters
data_k = all_feats(:,assignments==i);
initPriors(i) = size(data_k,2) / numClusters;
if size(data_k,1) == 0 || size(data_k,2) == 0
initCovariances(:,i) = diag(cov(data'));
else
initCovariances(:,i) = double(diag(cov(double((data_k')))));
end
end
% Run EM starting from the given parameters
[means,covariances,priors,ll,posteriors] = vl_gmm(double(all_feats), numClusters, ...
'initialization','custom', ...
'InitMeans',initMeans, ...
'InitCovariances',initCovariances, ...
'InitPriors',initPriors);
Based on the above I have means, covariances and priors. My main question is, What now? I am kind of lost now.
Also the means, covariances vectors are each of the size 128 x 50. I was expecting them to be 1 x 50 since each column is a cluster, wont each cluster have only one mean and covariance? (I know 128 are the SIFT features but I was expecting means and covariances).
In k-means I used the the MatLab command knnsearch(X,Y) which basically finds the nearest neighbour in X for each point in Y.
So how to achieve this in GMM, I know its a collection of probabilities, and ofcourse the nearest match from that probability will be our winning cluster. And this is where I am confused. All tutorials online have taught how to achieve the means, covariances values, but do not say much in how to actually use them in terms of clustering.
Thank you
##### 2 CommentsShowHide 1 older comment
Image Analyst on 23 Sep 2014