How can I cluster data points according to the local minima they belong to?
1 view (last 30 days)
I'm using the genetic algorithm for hyperparameter optimisation. My loss function is the cross-validated loss, that means I can evaluate my loss function but I don't know how it looks like (the shape). Of course, my loss function has several local minimas.
Let's say I'm using a population size of 20, i.e. I have 20 data points in each population. Let's further assume that I have run the genetic algorithm for a certain amount of generations, so that I have my final 20 data points.
Of course it could be possible that the 20 data points lie around different local minimas.
Now, I would like to apply some sort of clustering to cluster the data points according to the local minima they seem to belong to and I would like to identify the cluster which is most promosing (which seems to contain the global minima). I'm assuming that a data point belongs to a local minima if it is in its neighbourhood.
Does somebody have an idea how this could be done?
Matt J on 14 Jun 2016
Edited: Matt J on 14 Jun 2016
I'm convinced there is no foolproof way to do it, but the following heuristic approach might work okay. The genetic algorithm is supposed to return the globally optimal solution x. For any other member y of the final population, you finely sample the objective function along the line segment [x,y]. See if the objective function is convex or maybe just pseudo-convex along this line segment. A numeric test of this would be to see if the discrete derivatives as given by diff() are all monotonically increasing. If it is convex, you assume (without certainty) that x and y are part of the same capture basin and cluster them together.
If the dimension of the problem is low enough, another approach would be to sample the objective function values on an N-dimensional grid containing the final population so that you have an N-dimensional image of these values. You could then use watershed() to find all the basins using image processing methods.
More Answers (1)
Alan Weiss on 14 Jun 2016
I wrote a guest post on Loren Shure's blog a while back on clumping solutions found by MultiStart. You could use the same idea here. Basically, if you have a list of N-dimensional points, ordered from best to worst according to their associated fitness function values, then use a variant of the clumpthem function to generate clumps according to the granularity that you want in function value and space.
MATLAB mathematical toolbox documentation