Optimization and mixture estimation, with R
Recently, one of my students asked me about optimization routines in R. He told me he that R performed well on the estimation of a time series model with different regimes, while he had trouble with a (simple) GARCH process, and he was wondering if R was good in optimization routines. Actually, I always thought that mixtures (and regimes) was something difficult to estimate, so I was a bit surprised...
Indeed, it reminded me some trouble I experienced once, while I was talking about maximum likelihooh estimation, for non standard distribution, i.e. when optimization had to be done on the log likelihood function. And even when generating nice samples, giving appropriate initial values (actually the true value used in random generation), each time I tried to optimize my log likelihood, it failed. So I decided to play a little bit with standard optimization functions, to see which one performed better when trying to estimate mixture parameter (from a mixture based sample). Here, I generate a mixture of two gaussian distributions, and I would like to see how different the mean should be to have a high probability to estimate properly the parameters of the mixture.
The log likelihood (actually, I add a minus since most of the optimization functions actually minimize functions) is
The code to generate my samples is the following,
Then I use two functions to optimize my log likelihood, with identical intial values,
Actually, since I might have identification problems, I take either or , depending whether or is the smallest parameter.
On the graph above, the x-axis is the difference between means of the mixture (as on the animated grap above). Then, the red point is the median of estimated parameter I have (here ), and I have included something that can be interpreted as a confidence interval, i.e. where I have been in 90% of my scenarios: the black vertical segments. Obviously, when the sample is not enough heterogeneous (i.e. and rather different), I cannot estimate properly my parameters, I might even have a probability that exceed 1 (I did not add any constraint). The blue plain horizontal line is the true value of the parameter, while the blue dotted horizontal line is the initial value of the parameter in the optimization algorithm (I started assuming that the mixture probability was around 0.2).
The graph below is based on the second optimization routine (with identical starting values, and of course on the same generated samples),
(just to be honest, in many cases, it did not converge, so the loop stopped, and I had to run it again... so finally, my study is based on a bit less than 500 samples (times 15 since I considered several values for the mean of my second underlying distribution), with 200 generated observations from a mixture).
The graph below compares the two (empty circles are the first algorithm, while plain circles the second one),
On average, it is not so bad.... but the probability to be far away from the tru value is not small at all... except when the difference between the two means exceeds 3...
If I change starting values for the optimization algorithm (previously, I assumed that the mixture probability was 1/5, here I start from 1/2), we have the following graph
which look like the previous one, except for small differences between the two underlying distributions (just as if initial values had not impact on the optimization, but it might come from the fact that the surface is nice, and we are not trapped in regions of local minimum).
Thus, I am far from being an expert in optimization routines in R (see here for further information), but so far, it looks like R is not doing so bad... and the two algorithm perform similarly (maybe the first one being a bit closer to the true parameter).