# How can I cluster data points according to the local minima they belong to?

3 views (last 30 days)
Sepp on 14 Jun 2016
Edited: Matt J on 15 Jun 2016
Hello
I'm using the genetic algorithm for hyperparameter optimisation. My loss function is the cross-validated loss, that means I can evaluate my loss function but I don't know how it looks like (the shape). Of course, my loss function has several local minimas.
Let's say I'm using a population size of 20, i.e. I have 20 data points in each population. Let's further assume that I have run the genetic algorithm for a certain amount of generations, so that I have my final 20 data points.
Of course it could be possible that the 20 data points lie around different local minimas.
Now, I would like to apply some sort of clustering to cluster the data points according to the local minima they seem to belong to and I would like to identify the cluster which is most promosing (which seems to contain the global minima). I'm assuming that a data point belongs to a local minima if it is in its neighbourhood.
Does somebody have an idea how this could be done?

Matt J on 14 Jun 2016
Edited: Matt J on 14 Jun 2016
I'm convinced there is no foolproof way to do it, but the following heuristic approach might work okay. The genetic algorithm is supposed to return the globally optimal solution x. For any other member y of the final population, you finely sample the objective function along the line segment [x,y]. See if the objective function is convex or maybe just pseudo-convex along this line segment. A numeric test of this would be to see if the discrete derivatives as given by diff() are all monotonically increasing. If it is convex, you assume (without certainty) that x and y are part of the same capture basin and cluster them together.
If the dimension of the problem is low enough, another approach would be to sample the objective function values on an N-dimensional grid containing the final population so that you have an N-dimensional image of these values. You could then use watershed() to find all the basins using image processing methods.
Sepp on 14 Jun 2016
Edited: Sepp on 14 Jun 2016
With your test I'm only getting two clusters, i.e. a cluster with points which seems to belong to the same local minima as x and a cluster with points that do not. But how can I create more clusters independent of x?
So you are proposing some sort of hierarchical clustering? That means first creating two clusters A and B, then dividing B into B and C, then C into C and D etc.
You can use it to calculate finite difference derivatives of the loss function along your line segments.
How can I do that? Let's say I have the two data points x = (x1,x2,x3) and y = (y1,y2,y3), i.e. each is 3D.
I think I have misunderstood your approach. I need to sample points between x and y,right? I thought that I can just use the points x and y without sampling new points. The problem with sampling new points is that it increases the number of function evaluations a lot which I don't want. x and y are both 3D, how can I sample points in between? It is not just a line (1D).
You know the loss function values of all points in the clusters, presumably. It seems reasonable to suppose that the cluster containing the best point is the basin of the global minimum.
Why? It could be possible that another cluster contains the global minimum but was just badly sampled. I only create a few generations of genetic algorithm, so it is possible that the global minima is in a completely different place than the current best point is.
Matt J on 15 Jun 2016
Edited: Matt J on 15 Jun 2016
So you are proposing some sort of hierarchical clustering? That means first creating two clusters A and B, then dividing B into B and C, then C into C and D etc.
Yes.
I think I have misunderstood your approach. I need to sample points between x and y,right? I thought that I can just use the points x and y without sampling new points.
The approach I was proposing was, for x and y and your loss function f(), construct the 1D profile function
g(t) = f(x+t*(y-x))
and sample g(t) at equi-spaced 0<=t<=1. Then look to see if g(t) is convex. If g(t) is convex and sampled into vector
G=[g(0),g(t1),...g(tN) g(1)]
then diff(G) should be monotonically increasing.
The approach was never going to work well if you're averse to sampling your loss function, but I don't think you have much hope of delineating the capture basins if you aren't prepared to do a fair amount of exploratory sampling of the graph of f().
Why? It could be possible that another cluster contains the global minimum but was just badly sampled.
But it's the best estimate of the global min. based on the population that you have. Obviously, the success probability of the test increases with the number of ga() iterations that you do. Any approach you consider will decline in success probability the more thrifty you are with iterations. Suppose in the extreme that you do no iterations at all. Then ga() will have given you no information.
One way that you can refine your estimate of the global min., however, is to look at the global min. of the g(t) profiles while you're calculating them. If you hit a g(t) lower than any point you've seen, you can dynamically revise your estimate of the global min. and its capture basin. For that matter, you can revise your population...

Alan Weiss on 14 Jun 2016
I wrote a guest post on Loren Shure's blog a while back on clumping solutions found by MultiStart. You could use the same idea here. Basically, if you have a list of N-dimensional points, ordered from best to worst according to their associated fitness function values, then use a variant of the clumpthem function to generate clumps according to the granularity that you want in function value and space.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation