Optimization and mixture estimation, with R
By arthur charpentier on Thursday, September 23 2010, 23:05 - Informatique / R - Permalink
Recently, one of my students asked me about optimization routines in R. He told me he that R performed well on the estimation of a time series model with different regimes, while he had trouble with a (simple) GARCH process, and he was wondering if R was good in optimization routines. Actually, I always thought that mixtures (and regimes) was something difficult to estimate, so I was a bit surprised...


proportional to
, and
being a parameter that will change, from 0 to 4.The log likelihood (actually, I add a minus since most of the optimization functions actually minimize functions) is
> logvraineg <- function(param, obs) {
+ p <- param[1]
+ m1 <- param[2]
+ sd1 <- param[3]
+ m2 <- param[4]
+ sd2 <- param[5]
+ -sum(log(p * dnorm(x = obs, mean = m1, sd = sd1) + (1 - p) *
+ dnorm(x = obs, mean = m2, sd = sd2)))
+ }
The code to generate my samples is the following,
>X1 = rnorm(n,0,1)
> X20 = rnorm(n,0,1)
> Z = sample(c(1,2,2),size=n,replace=TRUE)
> X2=m+X20
> X = c(X1[Z==1],X2[Z==2])
Then I use two functions to optimize my log likelihood, with identical intial values,
> O1=nlm(f = logvraineg, p = c(.5, mean(X)-sd(X)/5, sd(X), mean(X)+sd(X)/5, sd(X)), obs = X)
> logvrainegX <- function(param) {logvraineg(param,X)}
> O2=optim( par = c(.5, mean(X)-sd(X)/5, sd(X), mean(X)+sd(X)/5, sd(X)),
+ fn = logvrainegX)
Actually, since I might have identification problems, I take either
or
, depending whether
or
is the smallest parameter.
), and I have included something that can be interpreted as a confidence interval, i.e. where I have been in 90% of my scenarios: the black vertical segments. Obviously, when the sample is not enough heterogeneous (i.e.
and
rather different), I cannot estimate properly my parameters, I might
even have a probability that exceed 1 (I did not add any constraint).
The blue plain horizontal line is the true value of the parameter, while the blue
dotted horizontal line is the initial value of the parameter in the
optimization algorithm (I started assuming that the mixture probability
was around 0.2).The graph below is based on the second optimization routine (with identical starting values, and of course on the same generated samples),

The graph below compares the two (empty circles are the first algorithm, while plain circles the second one),

If I change starting values for the optimization algorithm (previously, I assumed that the mixture probability was 1/5, here I start from 1/2), we have the following graph

Thus, I am far from being an expert in optimization routines in R (see here for further information), but so far, it looks like R is not doing so bad... and the two algorithm perform similarly (maybe the first one being a bit closer to the true parameter).







Comments
Hi Arthur,
you do realise that this is an ill-posed optimisation problem, because the likelihood diverges to +infinity?
Also, the log-likelihood has typically many local modes, so you results depends mostly on your choice of a starting point...
For mixtures, the standard approach for calculating the MLE is to use EM, not a standard optim procedure.
All the best,
Nicolas Chopin
REPONSE: hi Nicolas... I agree, but the point is that I have seen many people using that (in practice).... And so far, it is not that bad actually.... So I guess the best would be to see how an EM algorithm performs one the same kind of samples... ok, I'll start to work on that new post
Merci pour votre billet! C'est très apprécié! En fin de compte, pour l'estimation numérique d'un modèle GARCH en minimisant la fonction de logvraisemblance x -1, il n'y a pas de problème avec R. C'était moi qui avait mal tenu compte des contraintes (donc ce n'était pas R qui était poche, mais plutôt moi!). On peut utiliser "nlminb" ou "constrOptim" qui sont des optimisateurs avec lesquels on peut spécifier des contraintes et ils fonctionnent bien dans le cas GARCH. En général, "nlminb" était plus rapide mais à quelques rares occasions ne trouvait pas le minimum absolu, tandis que "constrOptim" convergeait toujours vers la bonne solution.
Je vais essayer de fitter un modèle RS-GARCH et je vous donnerai des nouvelles si une optimisation avec EM est nécessaire ou non pour l'optimisation.