Improving Nelder-Mead Optimization by Genetic Algorithms and Simulated Annealing concepts
17 views (last 30 days)
I'm using the Nelder-Mead simplex algorithm for hyperparameter optimization. It works quiet well but now I would like to develop it further. I have also tried genetic algorithms and simulated annealing and I would like to incorporate techniques from these algorithms into Nelder-Mead. My goal is to overcome the problem of local minima and/or to get faster convergence.
Does somebody have some idea how to incorporate ideas from genetic algorithms or simulated annealing into Nelder-Mead? Perhaps the simplex gradient could also be used in some way. I'm grateful for every idea.
John D'Errico on 6 Jun 2016
Edited: John D'Errico on 6 Jun 2016
Long ago (when I first learned about polytope methods), I recall being told by those who were knowledgable about Nelder-Mead that chasing the gradient tended to be a bad idea. The idea is it tends to create simplexes that are long and thin. Long thin simplexes will never be robust. So these methods tend to be better running away from a bad place than they are at running towards a good place. At least they will be more robust, more stable.
As far as the various stochastic algorithms, I don't see how you would plan on combining them with a scheme like Nelder-Mead. At least that is true of genetic algorithms, which have a very different underlying algorithm.
I suppose you might try something vaguely stochastic, where the new point is chosen by some random methodology. The issue is you need to avoid creating degenerate simplexes.
If your goal is faster convergence, I don't have serious hopes. There are several problems.
1. Nelder-Mead suffers from the curse of dimensionality. By this, you need to consider how well a simplex samples a higher dimensional space. Nelder-Mead does reasonably well in low numbers of dimensions, so 2 or 3 dimensions are no problem. But see how well a simplex samples a hyper-cube. Thus we can dissect the square (a 2-d hypercube) into 2 simplexes. A simple dissection of an n-dimensional hypercube uses factorial(n) simplexes. (There are other ways to do it, but they all scale somewhat similarly, so even faster than a pure exponential.) That means an (n+1)-simplex is a poor way to sample R^n for purposes of optimization.
2. Nelder-Mead has no accelerated convergence properties near the solution. So variations on Newton's method tend to be faster than linear when near the solution.
The point of all this is if you want faster convergence, then use a better method. There are better methods to be found, although every method will have its flaws.
Could you come up with some compromise scheme? Again, in reasonably low numbers of dimensions where Nelder-Mead will not be that obscenely poor, I suppose you might try to come up with some hybrid scheme. For example, I recall fzero is a hybrid method, that tries to have fast convergence when things are going well, but is able to avoid problems by switching methods when it sees things go bad.
However, in low numbers of dimensions, a good scheme is just a multi-start method. Generate many samples that roughly cover the search space. A simple lattice or a Monte Carlo sampling is a good start. Then take the best points from those samples, and set off a rapidly convergent scheme (like Newton's method, which will be fast near a solution) from each candidate point. Apply clustering to the solutions if you wish.
The above is a rather standard methodology, although I've not done much reading about Global optimization methods for quite some years. You might be best off just using the global optimization toolbox. This is always a better approach than re-creating the wheel. At the very least, you might want to do some research on global optimization methods to learn the current state of the art.
Walter Roberson on 6 Jun 2016
simulannealbnd allows you to set a HybridFcn to be called periodically, and you can have Nelder-Mead be that hybrid method. Unfortunately, simulannealbnd() does not update its idea of where it should search based upon the results of the hybrid function: it only updates its knowledge of the global minima, so you cannot use the simplex to accelerate the search. I opened a case about this and recommended that there be an option to allow the updating of the current x, so it is in the list "for consideration" some day. In the meantime I made a private modification to my copy of the simulannealbnd routines and it has worked out relatively well for the functions I happened to test it with.
Meanwhile, over in patternsearch, if you turn CompletePoll off and set the polling next step to 'success' then you get some of the effect of a simplex search. (Note that your objective function can only be evaluated in vectorized or parallel mode if you have CompletePoll turned on, so if your function works fairly well in vectorized mode then CompletePoll might be somewhat more efficient.) Alan Weiss has posted about patternsearch a number of times and recommends it as a primary tool, but it is easy to construct surfaces that patternsearch cannot find the minima for unless you ask it to use one of the MAD* point selections algorithms (which are typically very very inefficient.)