Simple and heuristic optimization
By Arthur Charpentier on Saturday, June 30 2012, 00:55 - mathematics
This week, at the Rmetrics conference, there was an interesting discussion about heuristic optimization. The starting point was simple: in complex optimization problems (here we mean problems with a lot of local maxima, for instance), we do not necessarily need extremely advanced algorithms that converge extremely fast if we cannot ensure that they reach the optimum. Converging extremely fast, with great numerical precision, to some point that is not the point we are looking for is useless. Some algorithms might be much slower but are, at least, much more likely to converge to the optimum, wherever we start from.
We experienced that with Mathieu while we were maximizing the likelihood of our MINAR process: genetic algorithms performed extremely well. The idea is extremely simple, and natural. Let us consider as a starting point the following algorithm,
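- Start from some $x_0$
- At step $s$, draw a point $x$ in a neighborhood of $x_{s-1}$,
- either $f(x) > f(x_{s-1})$, then $x_s = x$
- or $f(x) \le f(x_{s-1})$, then $x_s = x_{s-1}$,
or, with a tolerance $k > 0$ (as in the code below),
- either $f(x) + k > f(x_{s-1})$, then $x_s = x$
- or $f(x) + k \le f(x_{s-1})$, then $x_s = x_{s-1}$.
To illustrate the idea, consider the following function $f$ (its definition is not reproduced here).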
With arbitrary values, we obtained the following scenarios:
> x0=15                        # the domain is the square [-x0,x0]^2
> MX=matrix(NA,501,2)          # one row per step
> MX[1,]=runif(2,-x0,x0)       # random starting point
> k=.5                         # tolerance
> for(s in 2:501){
+   bruit=rnorm(2)             # Gaussian noise
+   X=MX[s-1,]+bruit*3         # candidate point
+   if(X[1]>x0){X[1]=x0}       # keep the candidate inside the domain
+   if(X[1]<(-x0)){X[1]=-x0}
+   if(X[2]>x0){X[2]=x0}
+   if(X[2]<(-x0)){X[2]=-x0}
+   if(f(X[1],X[2])+k>f(MX[s-1,1],
+     MX[s-1,2])){MX[s,]=X}
+   if(f(X[1],X[2])+k<=f(MX[s-1,1],
+     MX[s-1,2])){MX[s,]=MX[s-1,]}
+ }
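The loop above can be wrapped into a small function. This is only a minimal sketch: the objective `f_demo` is a hypothetical multimodal function chosen for illustration (it is an assumption, not the function used in this post), and the names `heuristic_search`, `step_sd` and `bound` are ours.

```r
# Hypothetical objective (an assumption, NOT the function of the post):
# global maximum 10 at the origin, surrounded by rings of local maxima.
f_demo <- function(x, y) {
  r <- sqrt(x^2 + y^2)
  ifelse(r < 1e-8, 10, 10 * sin(r) / r)
}

# Tolerant random search: the candidate is accepted
# whenever f(candidate) + k > f(current point).
heuristic_search <- function(f, start, n_steps = 500, k = 0.5,
                             step_sd = 3, bound = 15) {
  path <- matrix(NA_real_, n_steps + 1, 2)
  path[1, ] <- start
  for (s in 2:(n_steps + 1)) {
    current <- path[s - 1, ]
    # Gaussian proposal, clamped to the square [-bound, bound]^2
    candidate <- pmin(pmax(current + rnorm(2, sd = step_sd), -bound), bound)
    if (f(candidate[1], candidate[2]) + k > f(current[1], current[2])) {
      path[s, ] <- candidate
    } else {
      path[s, ] <- current
    }
  }
  path
}

set.seed(1)
path <- heuristic_search(f_demo, start = runif(2, -15, 15))
values <- f_demo(path[, 1], path[, 2])  # objective along the trajectory
```

With `k = 0`, only strict improvements are accepted; a positive `k` lets the path step slightly downhill, which is what gives it a chance to leave a local maximum.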

[Figures: simulated scenarios]
It does not always converge towards the optimum,
and sometimes we just miss it, after being extremely unlucky.
Note that if we run 10,000 scenarios (with different random noises and starting points), in 50% of the scenarios we reach the maximum. Or at least, we are next to it, on top.
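Such a proportion can be estimated by repeating the search from random starting points and counting the runs that end close to the maximum. The sketch below makes several assumptions: `f_demo` is a hypothetical objective (not the function of this post), "on top" is taken to mean within an arbitrary radius of 3 around the maximizer, and only 200 runs are used to keep it fast. The figure it returns therefore says nothing about the 50% reported above.

```r
# Hypothetical objective (assumption): global maximum at the origin.
f_demo <- function(x, y) {
  r <- sqrt(x^2 + y^2)
  ifelse(r < 1e-8, 10, 10 * sin(r) / r)
}

# One run of the tolerant random search; only the final point is returned.
final_point <- function(f, n_steps = 500, k = 0.5, step_sd = 3, bound = 15) {
  x <- runif(2, -bound, bound)           # random starting point
  fx <- f(x[1], x[2])
  for (s in seq_len(n_steps)) {
    cand <- pmin(pmax(x + rnorm(2, sd = step_sd), -bound), bound)
    fc <- f(cand[1], cand[2])
    if (fc + k > fx) {                   # accept if "almost" an improvement
      x <- cand
      fx <- fc
    }
  }
  x
}

set.seed(123)
n_runs <- 200                            # the post uses 10,000 runs
ends <- t(replicate(n_runs, final_point(f_demo)))
near_top <- sqrt(rowSums(ends^2)) < 3    # arbitrary radius around the maximizer
mean(near_top)                           # estimated share of successful runs
```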
What if we compare with a standard optimization routine, like Nelder-Mead or a quasi-Newton method? Since we look for the maximum on a restricted domain, we can use the following function,
> g=function(x) f(x[1],x[2])
> # optim() minimizes by default; fnscale=-1 turns it into a maximization
> optim(X0, g,method="L-BFGS-B",
+   lower=-c(x0,x0),upper=c(x0,x0),
+   control=list(fnscale=-1))$par
In that case, if we run the algorithm from 10,000 random starting points, this is where we end up, below on the right (while the heuristic technique is on the left),
[Figure: end points of the 10,000 runs, heuristic technique (left) and L-BFGS-B (right)]
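The same experiment can be sketched for `optim()` with random starting points. Again, this is only an illustration under assumptions: `f_demo` is a hypothetical objective (not the function of this post), 200 starting points are used instead of 10,000, and the radius of 3 is arbitrary. Since `optim()` minimizes by default, `control = list(fnscale = -1)` is used to maximize.

```r
# Hypothetical objective (assumption): global maximum at the origin.
f_demo <- function(x, y) {
  r <- sqrt(x^2 + y^2)
  ifelse(r < 1e-8, 10, 10 * sin(r) / r)
}
g <- function(x) f_demo(x[1], x[2])      # optim() expects a single vector argument

bound <- 15
set.seed(42)
n_starts <- 200                          # the post uses 10,000 starting points
ends <- t(replicate(n_starts, {
  start <- runif(2, -bound, bound)
  optim(start, g, method = "L-BFGS-B",
        lower = rep(-bound, 2), upper = rep(bound, 2),
        control = list(fnscale = -1))$par  # fnscale = -1: maximize
}))
near_top <- sqrt(rowSums(ends^2)) < 3    # runs ending near the global maximum
mean(near_top)
```

Each run climbs to whichever local maximum is closest to its starting point, so the share of runs reaching the top depends on how large the basin of the global maximum is.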
So here, it looks like a heuristic method works extremely well if we do not need to reach the maximum with great precision. Which is usually the case, actually.
Comments
In this particular case, I think you should compare with simulated annealing: optim(..., method="SANN"), which is really close to your heuristic optimization...
Gradient methods are not suited to this kind of objective function, since a (rough) background hypothesis of convexity is more or less built into the algorithm. (In fact, convergence is guaranteed under this hypothesis.)
Very interesting post. I enjoyed your graphs the most. Could you post the code to replicate them?