Freakonometrics

To content | To menu | To search

Tag - challenge

Entries feed - Comments feed

Sunday, January 9 2011

From one extreme (0) to another (1): challenge failed, but who cares...

Just after arriving in Montréal, at the beginning of September, I discussed statistics of my blog, and said that it might be possible - or likely - that by new year's Eve, over a million page would have been viewed on my blog (from Google's counter, here). By the end of October (here) I was very optimistic, but mi-December (here) the challenge was likely to be failed. An indeed, the million page target was hit one week after, on January 8th,

base=read.table("http://freakonometrics.blog.free.fr/public/data/million1.csv",
sep="\t",header=TRUE)
X1=cumsum(base$nombre)
X0=X1
base=read.table("http://freakonometrics.blog.free.fr/public/data/million2.csv",
sep="\t",header=TRUE)
X2=cumsum(base$nombre)
X=X1+X2
 
D0=as.Date("08/11/2008","%d/%m/%Y")
D=D0+1:length(X1)
plot(D,X1,xlim=c(as.Date("08/06/2010","%d/%m/%Y"),as.Date("08/02/2011","%d/%m/%Y")),
ylim=c(800000,1050000))
abline(h=1000000,col="red")
abline(v=as.Date("01/01/2011","%d/%m/%Y"),col="red")
points(D,X,col="blue")

Again, the black points were from the previous blog (http://blogperso.univ-rennes1.fr/arthur.charpentier/) which was transferred to that new one (http://freakonometrics.blog.free.fr) this Autumn. So I just sum up the stats to get the blue points. At each date, I fit an ARIMA, and use it to make forecast the total number of pages viewed on January 1st, and calculate the probability to reach a million page viewed at that date (using a Gaussian ARIMA model). Actually, here, I changed a little bit the challenge, and asked "what would have been the probability to reach a million page viewed on January 1st, and on January 8th" ?

kt=which(D==as.Date("01/06/2010","%d/%m/%Y"))
Xbase=X
X=X1+X2
P1=P2=rep(NA,(length(X)-kt)+7)
for(h in 0:(length(X)-kt+7)){
model <- arima(X[1:(kt+h)],c(7 ,1,7),method="CSS")
forecast <- predict(model,200)
u=max(D[1:kt+h])+1:300
if(min(u)<=as.Date("01/01/2011","%d/%m/%Y")){
k=which(u==as.Date("01/01/2011","%d/%m/%Y"))
(P1[h+1]=1-pnorm(1000000,forecast$pred[k],forecast$se[k]))}
k=which(u==as.Date("08/01/2011","%d/%m/%Y"))
(P2[h+1]=1-pnorm(1000000,forecast$pred[k],forecast$se[k]))
}
The red curve is the probability to reach 1 million viewed on January 1st (as done earlier, using an ARIMA projection). The blue one is the probability to reach 1 million viewed one week after, on January 8th.

 and here is the difference between probabilities,

The flat part at the beginning of November corresponds to the bump that was observed on the initial graph. But then, the slope was too low, and in December, the challenge was failed... Obviously, looking at statistics during a blog migration is not a bright idea...

Wednesday, December 15 2010

I really need to find hot (and sexy) topics

50 days ago (here), I was supposed to be very optimistic about the probability that I could reach a million viewed pages on that blog (over a bit more than two years). Unfortunately, the wind has changed and today, the probability is quite low...

 base=read.table("millionb.csv",sep=";",header=TRUE)
X1=cumsum(base$nombre)
base=read.table("million2b.csv",sep=";",header=TRUE)
X2=cumsum(base$nombre)
X=X1+X2
D=as.Date(as.character(base$date),"%m/%d/%Y") kt=which(D==as.Date("01/06/2010","%d/%m/%Y")) D0=as.Date("08/11/2008","%d/%m/%Y") D=D0+1:length(X1) P=rep(NA,(length(X)-kt)+1) for(h in 0:(length(X)-kt)){ model <- arima(X[1:(kt+h)],c(7 1,7),method="CSS")
forecast <- predict(model,200) u=max(D[1:kt+h])+1:300 k=which(u==as.Date("01/01/2011","%d/%m/%Y")) (P[h+1]=1-pnorm(1000000,forecast$pred[k],forecast$se[k])) } plot( D[length(D)-length(P)]+1:220,c(P,rep(NA,220-length(P))), ylab="Probability to reach 1,000,000",xlab="", type="l",col="red",ylim=c(0,1))
So, I guess my posts on multiple internal rates of return, or Young's inequality will have to wait next year... I really need to find some more sexy post to attract readers.. Challenge accepted !