...it is the end of a World. This blog is now officially dead. See you from now on at my new blog,
Friday, December 21 2012
The Maya prediction was correct...
By arthur charpentier on Friday, December 21 2012, 00:41  Personal
Sunday, December 9 2012
Blog migration
By arthur charpentier on Sunday, December 9 2012, 12:56  personal
The blog is currently migrating, from
http://freakonometrics.blog.free.fr/
to the hypotheses.org platform, at
http://freakonometrics.hypotheses.org/
The IT team is working on it, and while I will still publish on both blogs until the end of this session, from 2013 on there will be no more posts on this blog. Please, update your bookmarks...
Thursday, November 29 2012
Save R objects, and other stuff
By arthur charpentier on Thursday, November 29 2012, 09:08  ACT6420A2012
Yesterday, Christopher asked me how to store an R object, to save some time when working on the project.
First, download the csv file of searches related to some keyword, via http://www.google.com/trends/, for instance "sunglasses". Recall that csv files store tabular data (numbers and text) in plain-text form, with comma-separated values (which is where the name csv comes from). Even if it is not a single, well-defined format, it is a text file, not an Excel file!
Web search: interest in sunglasses
In all countries; from 2004 to the present
Interest over time
Semaine,sunglasses
2004-01-04 - 2004-01-10,48
2004-01-11 - 2004-01-17,47
2004-01-18 - 2004-01-24,51
2004-01-25 - 2004-01-31,50
2004-02-01 - 2004-02-07,52
(etc.) The file can be downloaded from the blog,
> report=read.table(
+ "http://freakonometrics.blog.free.fr/public/data/glasses.csv",
+ skip=4,header=TRUE,sep=",",nrows=464)
Then, we just run a function to transform that data frame. It can be found in a source file
> source("http://freakonometrics.blog.free.fr/public/code/H2M.R")
In this source file, there is a function that transforms a weekly series into a monthly one. The output is either a time series or a numeric vector,
> sunglasses=H2M(report,lang="FR",type="ts")
Here, we asked for a time series,
> sunglasses
          Jan      Feb      Mar      Apr
2004 49.00000 54.27586 66.38710 80.10000
2005 48.45161 58.25000 69.93548 80.06667
2006 49.70968 57.21429 67.41935 82.10000
2007 47.32258 55.92857 70.87097 84.36667
2008 47.19355 54.20690 64.03226 79.36667
2009 45.16129 50.75000 63.58065 76.90000
2010 32.67742 44.35714 58.19355 70.00000
2011 44.38710 49.75000 59.16129 71.60000
2012 43.64516 48.75862 64.06452 70.13333
          May      Jun      Jul      Aug
2004 83.77419 89.10000 84.67742 73.51613
2005 83.06452 91.36667 89.16129 76.32258
2006 86.00000 92.90000 93.00000 72.29032
2007 86.83871 88.63333 84.61290 72.93548
2008 80.70968 80.30000 78.29032 64.58065
2009 77.93548 70.40000 62.22581 51.58065
2010 71.06452 73.66667 76.90323 61.77419
2011 74.00000 79.66667 79.12903 66.29032
2012 79.74194 82.90000 79.96774 69.80645
          Sep      Oct      Nov      Dec
2004 56.20000 46.25806 44.63333 53.96774
2005 56.53333 47.54839 47.60000 54.38710
2006 51.23333 46.70968 45.23333 54.22581
2007 56.33333 46.38710 44.40000 51.12903
2008 51.50000 44.61290 40.93333 47.74194
2009 37.90000 30.38710 28.43333 31.67742
2010 50.16667 46.54839 42.36667 45.90323
2011 52.23333 45.32258 42.60000 47.35484
2012 54.03333 46.09677 43.45833
that we can plot using
> plot(sunglasses)
Now we would like to store this time series. This is done easily using
> save(sunglasses,file="sunglasses.RData")
Next time we open R, we just have to use
> load("/Users/UQAM/sunglasses.RData")
to load the time series into the R memory. So saving objects is not difficult.
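As an aside, for a single object, saveRDS()/readRDS() (base R) do the same job without fixing the object's name on reload; a minimal sketch (the file name "sunglasses.rds" and the toy series are just for illustration):

```r
# Save a single R object to disk and read it back (the file name
# "sunglasses.rds" is just for illustration).
x=ts(rnorm(24),start=2004,frequency=12)  # a toy monthly time series
saveRDS(x,file="sunglasses.rds")
y=readRDS("sunglasses.rds")  # unlike load(), we pick the name on reload
identical(x,y)  # TRUE: the object survives the round trip
```

The difference with save()/load() is that load() restores the object under its original name, while readRDS() returns the object, so we can assign it to any name we like.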
Last but not least, for the part on seasonal models, we will be using some functions from an old package. Unfortunately, on the CRAN website, we see that the package is no longer available,
but nicely, files can still be found on some archive page. On Linux, one can easily install the package using (in R)
> install.packages(
+ "/Users/UQAM/uroot_1.41.tar.gz",
+ type="source")
With a Mac, it is slightly more complicated (see e.g. Jon's blog): one has to open a Terminal and to type
R CMD INSTALL /Users/UQAM/uroot_1.41.tar.gz
On Windows, it is possible to install a package from a zipped file: one has to download the file from the archive page, and then to select it from R.
The package is now installed; we just have to load it to play with it, and use the functions it contains to test for cycles and seasonal behavior,
> library(uroot)
> CH.test
function (wts, frec = NULL, f0 = 1, DetTr = FALSE, ltrunc = NULL)
{
    s <- frequency(wts)
    t0 <- start(wts)
    N <- length(wts)
    if (class(frec) == "NULL")
        frec <- rep(1, s/2)
    if (class(ltrunc) == "NULL")
        ltrunc <- round(s * (N/100)^0.25)
    R1 <- SeasDummy(wts, "trg")
    VFEalg <- SeasDummy(wts, "alg")
(etc)
Wednesday, November 28 2012
D.I.Y. strategy, and why academics should blog!
By arthur charpentier on Wednesday, November 28 2012, 17:54  personal
Last week, I went to the Econometrics seminar of Montréal, at UdM, where Alfred Galichon was giving a great talk on the marriage market. Alfred is a former colleague (from France), a coauthor, an amazing researcher, and above all, a friend of mine. And he has always been supportive of my blogging activities. So while we were having lunch after the seminar, Alfred mentioned my blogging activity to the other researchers. I should say researchers in Econometrics (yes, with a capital E, since it is a Science, as mentioned in an old paper by David Hendry at the end of the 70s). Usually, when I am involved in this kind of meeting, I start with some apologies, explaining that I do like theoretical econometrics (if not, I would not come to the seminar), but I do like my freakonometrics activity too. I do like to use econometrics (or statistical techniques) to figure out (or at least to try to figure out) why some things work the way they do. I try to find data, and then try to briefly analyze them to answer some simple questions. Or sometimes, I just run simulations to answer more theoretical questions (or at least to give clues).
But above all, I like the fact that blogging gives me the opportunity to interact with people I would never meet without this activity. For instance, last May, I was discussing (on Twitter) with @coulmont, @joelgombin and @imparibus about elections in France. Then @coulmont asked me "yes, everyone knows that there should be some ecological fallacies behind my interpretation, but I am not so sure since I have data with a small granularity. As an econometrician, what do you think?" Usually, I hate having a label, like "... I ask you since you're a mathematician", or "as an economist, what do you think of...". Usually, when people ask me economic questions, I just claim to be a mathematician, and vice versa. But here, I even put the word "econometrics" (more or less) on the front of my blog. So here, I could not escape... And the truth is that, as a student, I never heard anything about this "ecological fallacy". Neither did I as a researcher (even though I have read hundreds of econometric articles, theoretical and applied). Neither did I as a professor (even though I have been teaching econometrics for almost ten years, and have read dozens of textbooks to write notes and handouts). How come? How come researchers in sociology and in political science know things in econometrics that I have never heard about?
The main reason, from my understanding, is the following: even if everyone talks about "interdisciplinarity", no one (or perhaps only a few) is really willing to pay the price of working in different (not to say many) areas. I tried, and trust me, I found it difficult. It is difficult to publish a paper in a climate journal when you're not a specialist in climate (and you just want to give your opinion as a statistician). It is difficult to accept that you might waste weeks (not to say months) reading articles in geophysics if you want to know more about earthquake risks, going to seminars, etc. Research is clearly a club story ("club" as defined in Buchanan (1965)).
This week, I planned to go to a journal club in biology and physics at McGill (a colleague there kindly invited me, but we mixed up the time)... this has nothing to do with my teaching, nor with my research activities. But I might learn something! Yes, I do claim that I am paid just to have fun: to read stuff that I find interesting, trying to understand the details of a proof, trying to understand how data were obtained. In most cases, it might (and should) be a complete waste of time, since I will not publish anything (anything serious, in some peer-reviewed journal) on that topic... but should I really care? As I explained earlier (in French), I also claim that I have a moral obligation to give back everything I have seen, heard, or read. And since I am not a big fan of lectures (and I do not think I have the skills for them), I cannot give my opinion, neither on economic facts (as @adelaigue or @obouba might do on their blogs), nor on science results (as @tomroud does). But I think I can help on modeling and computational issues. My point being: never trust what you read (even on my blog); please, try to do it yourself! You read that "90% of French executives think about expatriation" (as mentioned here)? Then try to find some data to check that statement against. And see if you come up with the same conclusion... And since it might be a bit technical sometimes, here are some lines of code, to do it on your own... Academics have legitimacy when they give their opinions on technical issues. At least they can provide a list of references everyone should read to get an overview of the topic. But academics can also help people read graphs, or data. To give them "numeracy" (or a culture of numbers) necessary to understand complex issues.
To conclude, I should mention that I understood what this "ecological fallacy" was from Thomsen (1987); many more documents can be found on Soren Thomsen's page http://www.mit.ps.au.dk/srt/. But I got most of the information I was looking for from a great statistician, who happens to be also an amazing blogger: Andrew Gelman (see http://andrewgelman.com/). I will probably write a post someday about this, since I find the question extremely interesting, and important.
Sunday, November 18 2012
In less than 50 days: 2013, year of statistics
By arthur charpentier on Sunday, November 18 2012, 19:48  Statistics
A couple of days ago, Jeffrey sent me an email, encouraging me to write about the international year of statistics, hoping that I would participate in the event. The goals of Statistics2013 are (as far as I understood) to increase public awareness of the power and impact of statistics on all aspects of society, to nurture statistics as a profession, and to promote creativity and development in statistics.
So I will try to write more posts (yes, I surely can try) on statistical methodology, with sexy applications... So see you in a few days on this blog, to celebrate statistics! I can even invite guests to write here... all contributors are welcome!
Saturday, November 17 2012
Somewhere else, part 21
By arthur charpentier on Saturday, November 17 2012, 22:59  Links
A very interesting article, in Scientific American,
 "Children of scientists may inherit genes that not only confer intellectual talents but also predispose them to autism" http://www.scientificamerican.com/a=aregeekycouplesmorelikelytohavekidswithautism …
 “Communication is a central task of statistics”... http://andrewgelman.com/communicationisacentraltaskofstatisticsandideallyastateoftheartdataanalysiscanhavestateoftheartdisplaystomatch/ … on @StatModeling's (reference) blog
 Some Notes on Schelling's Essay (1972, http://jasss.soc.surrey.ac.uk/Schelling_Th_C_1972._On_Letting_a_Computer_Help_with_the_Work._JASSS.pdf …) "On Letting a Computer Help with the Work" http://jasss.soc.surrey.ac.uk

 "Why are observations of inflation so biased? And biased by gender?" http://marginalrevolution.com/tion/2012/11/whyareobservationsofinflationsobiasedandbiasedbygender.html … via @adelaigue see http://www.clevelandfed.org/resntary/2001/1101.pdf…
 "How To Predict The Future" by @hertling http://www.feld.com/wp/archives/2012/06/howtopredictthefuture.html …
 "Women as Academic Authors, 1665-2010" http://chronicle.com/artasAcademicAuthors/135192/ …
 "Be persuasive. Be brave. Be arrested (if necessary)" http://www.nature.com/news/be… "More scientists must speak out...."
 "Hurricane Sandy’s huge size: freak of nature or climate change?"http://classic.wunderground.com/JeffMasters/comment.html?entrynum=2293 …

 "The Moral of Sandy" by @BjornLomborg http://www.projectsyndicate.org/therightandwrongwaytoreducehurricanedamagebybjrnlomborg …
 "How to give feedback" http://timharford.com/howtogivefeedback/?utm_source=dlvr.it&utm_medium=twitter …
 "What is Twitter, a Social Network or a News Media?" http://product.ubion.co.kr/up142222731/ccres00056/db/_2250_1/embedded/2010wwwtwitter.pdf …
 "Trends in Social Media : Persistence and Decay" http://arxiv.org/1402
 Stan Nikolov’s master’s thesis on "detecting twitter trends before they happen" http://web.mit.edu/snik…
 "Popularity versus similarity in growing networks" http://arxiv.org/0286

 "Twitter Event Networks and the Superstar Model" http://arxiv.org/3090 via @renaudjf
 "How Long Will a Lie Last? Study Finds That False Memories Linger for Years" http://blogs.scientificamerican.com/g…
 "Science is enforced humility" http://www.guardian.co.uk/012/nov/13/scienceenforcedhumility …"science compels its practitioners to confront their own fallibility"

 "The Uncertain Future for Universities" http://conversableeconomist.blogspot.com/theuncertainfutureforuniversities.html … via @adelaigue
 "No more magical thinking" by David Remnick http://www.newyorker.com/…
 "Are All Units Created Equal? The Effect of Default Units on Product Evaluations" http://users.ugent.be/~mapandel/…
 "Child’s Education, but Parents’ Crushing Loans"http://www.nytimes.com/usiness/someparentsshoulderingstudentloansfallontoughtimes.html?hpw…

 "Should Scientists and Engineers Resist Taking Military Money?" by @Horganism on http://blogs.scientificamerican.com/cro12/11/12/shouldscientistsandengineersresisttakingmilitarymoney/ … via @BoraZ
 "On the science-based communication of risks following the recent sentencing of Italian scientists" http://www.alphagalileo.org/Asset…
 "Data, Dimensions and Geometry oh my!" http://geomblog.blogspot.ca/datadimensionsandgeometryohmy.html …
 "Breaking down the Presidential vote" http://blogs.suntimes.com/pol11/graphics_breaking_down_the_presidential_vote.html …

 "512 Paths to the White House" http://source.mozillaopennews.org/en/nyts512pathswhitehouse/ … "How we made the interactive D3 decision tree" via @alignedleft
 "Inside the Secret World of the Data Crunchers Who Helped Obama" http://swampland.time.com/nsidethesecretworldofquantsanddatacruncherswhohelpedobamawin/#ixzz2ByVOqVv1 …
 "is nate silver's win sociology's loss?" http://scatter.wordpress.com/isnatesilverswinsociologysloss/ … & "He does it again! Will pundits finally accept defeat?" http://simplystatistics.org/81/natesilverdoesitagainwillpunditsfinallyaccept …
Wednesday, November 14 2012
I think I've already heard that tune...
By arthur charpentier on Wednesday, November 14 2012, 21:25  Statistics
Maybe I just wanted to talk about Vince Guaraldi in Movember (let's try this challenge for this month: mention only people wearing a moustache). This weekend, the CD of A Charlie Brown Christmas (see e.g. on youtube) was playing at home (I know, it is a bit early). And during the second track, I said to myself "I know that tune". For some reason, 5 or 6 notes reminded me of a song by Jacques Brel... What are the odds, I thought (at first)?
The difficult point is that calculating the probability that a given sequence appears over some number of runs might be complicated (I am not that good when it is time to compute probabilities). But it is still possible to run some code to estimate the probability of getting a sequence over a fixed number of runs. For instance, consider the case where there are only two notes (call them "H" and "T", just to illustrate).
> runappear=function(seqA,nrun){
+ win=FALSE
+ seq=sample(c("H","T"),size=
+ length(seqA)-1,replace=TRUE)
+ i=length(seqA)-1
+ while((win==FALSE)&(i<nrun)){
+ i=i+1
+ seq=c(seq,sample(c("H","T"),size=1))
+ n=length(seq)
+ if(all(seq[(n-length(seqA)+1):n]==seqA)){
+ win=TRUE}}
+ return(win)}
For instance, we can generate 100,000 samples to see how often the sequence "THT" (call it a tune) appears over 10 consecutive notes,
> runappear(c("T","H","T"),10)
[1] TRUE
> simruns=function(seqA,nrun,sim){
+ s=0
+ for(i in 1:sim){s=s+runappear(seqA,nrun)}
+ return(s/sim)}
> simruns(c("T","H","T"),10,100000)
[1] 0.65684
But the most interesting point (from my point of view...) about those probabilities is that they yield a nontransitive game. Consider two players. Those two players, call them (A) and (B), each select a tune (it has to be three notes). If (A)'s tune appears before (B)'s, then (A) wins. The funny story is that there is no "optimal" strategy. More specifically, if (B) knows what (A) has chosen, then it is always possible for (B) to choose a tune that will appear before (A)'s with more than a 50% chance. Odd, isn't it? This is Penney's game, as introduced by Walter Penney, see e.g. Nishiyama and Humble (2010).
It is possible to write a code where we run scenarios until one player wins,
> winner=function(seqA,seqB){
+ win="N"
+ seq=sample(c("H","T"),size=2,replace=TRUE)
+ while(win=="N"){
+ seq=c(seq,sample(c("H","T"),size=1))
+ n=length(seq)
+ if(all(seq[(n-2):n]==seqA)){win="A"}
+ if(all(seq[(n-2):n]==seqB)){win="B"}}
+ return(win)}
> A=c("H","H","H");B=c("H","T","H")
> winner(A,B)
[1] "A"
If we run one thousand games, we see here that (B) wins more frequently (with those tunes)
> game=function(seqA,seqB,n=1000){
+ win=rep(NA,n)
+ for(i in 1:n){win[i]=winner(seqA,seqB)}
+ return(table(win)/n)
+ }
> game(A,B)
win
    A     B
0.403 0.597
Let us look now at all possible tunes (8 if we consider a sequence of 3 notes), for players (A) and (B),
> A1=c("T","T","T")
> A2=c("T","T","H")
> A3=c("T","H","T")
> A4=c("H","T","T")
> A5=c("T","H","H")
> A6=c("H","T","H")
> A7=c("H","H","T")
> A8=c("H","H","H")
> B=data.frame(A1,A2,A3,A4,A5,A6,A7,A8)
> PROB=matrix(NA,8,8)
> for(i in 1:8){
+ for(j in 1:8){
+ PROB[i,j]=game(B[,i],B[,j],1000)[1]
+ }}
We have the following probabilities (that (A) wins over (B)),
> PROB
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
[1,] 1.000 0.529 0.414 0.137 0.411 0.437 0.317 0.496
[2,] 0.493 1.000 0.674 0.242 0.664 0.625 0.480 0.701
[3,] 0.615 0.331 1.000 0.516 0.492 0.503 0.365 0.583
[4,] 0.885 0.735 0.518 1.000 0.494 0.513 0.339 0.580
[5,] 0.603 0.342 0.487 0.500 1.000 0.465 0.762 0.862
[6,] 0.602 0.360 0.527 0.506 0.507 1.000 0.329 0.567
[7,] 0.695 0.511 0.642 0.664 0.288 0.635 1.000 0.510
[8,] 0.502 0.279 0.428 0.382 0.117 0.406 0.487 1.000
It is possible to visualize the probabilities on the graph below
with, below, the optimal strategy that (B) should choose, given the tune chosen by (A),
(actually, it is possible to compute those probabilities explicitly). Here, it is possible for (B) to choose a tune such that the probability that (A) wins is less than 50% (even less than 35%).
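As mentioned, those probabilities can be computed explicitly; one classical way is Conway's "leading numbers" algorithm (discussed e.g. in Nishiyama and Humble (2010)). A minimal sketch in R (the function names leading and penney are mine, not part of any package):

```r
# Conway's "leading numbers" algorithm for Penney's game.
# L(X,Y) sums 2^(k-1) over every k such that the last k notes
# of X coincide with the first k notes of Y.
leading=function(X,Y){
  k=seq_along(X)
  hit=sapply(k,function(i) all(X[(length(X)-i+1):length(X)]==Y[1:i]))
  sum(2^(k-1)*hit)
}
# The odds that (B)'s tune appears before (A)'s are
# (L(A,A)-L(A,B)) : (L(B,B)-L(B,A)); hence the probability that (B) wins:
penney=function(A,B){
  oddsB=leading(A,A)-leading(A,B)
  oddsA=leading(B,B)-leading(B,A)
  oddsB/(oddsA+oddsB)
}
penney(c("H","H","H"),c("H","T","H"))  # 0.6, matching the simulated 0.597
```

For the pair used above, the exact probability that (B) wins with "HTH" against "HHH" is 3/5, consistent with the 0.597 estimated by simulation.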
But we are a bit far away from the problem mentioned earlier... Furthermore, of course, assuming independence and randomness (in music) is extremely simplistic (see a short discussion here, a few weeks ago, in French). We might expect some Markov chain behind it, in order to get something that will not hurt our ears, as stated in Brooks et al. (1957) for instance (quite a long time ago, actually).
To conclude, about Jacques Brel's song: actually, Amsterdam is (somehow) inspired by Greensleeves, the traditional English folk song. And Vince Guaraldi played it on that CD...
So yes, there is a connection between all those songs. And no need to do maths to get it...
Tuesday, November 13 2012
On Box-Cox transform in regression models
By arthur charpentier on Tuesday, November 13 2012, 15:56  ACT6420A2012
A few days ago, a former student of mine, David, contacted me about Box-Cox tests in linear models. It made me look more carefully at the test, and I do not understand what is computed, to be honest. Let us start with something simple, like a simple linear regression, i.e. y_i = beta0 + beta1 x_i + epsilon_i.
Let us introduce (as suggested in Box & Cox (1964)) the following family of (power) transformations on the variable of interest: h_lambda(y) = (y^lambda - 1)/lambda if lambda is not 0, and h_lambda(y) = log(y) if lambda = 0. Then assume that h_lambda(y_i) = beta0 + beta1 x_i + epsilon_i.
As mentioned in Chapter 14 of Davidson & MacKinnon (1993) (in French), the log-likelihood of this model (assuming that observations are independent, with the epsilon_i distributed as N(0, sigma^2)) can be written log L = -(n/2) log(2 pi sigma^2) - (1/(2 sigma^2)) sum_i [h_lambda(y_i) - beta0 - beta1 x_i]^2 + (lambda - 1) sum_i log(y_i), where the last term comes from the Jacobian of the transformation.
We can then use profile-likelihood techniques (see here) to derive the optimal transformation.
This can be done in R extremely simply,
> library(MASS)
> boxcox(lm(dist~speed,data=cars),lambda=seq(0,1,by=.1))
we then get the following graph,
If we look at the code of the function, it is based on the QR decomposition of the design matrix X (since we assume that X is a full-rank matrix). More precisely, X = QR, where X is an n x k matrix, Q is an n x k orthonormal matrix (Q'Q = I), and R is a k x k upper triangular matrix. It might be convenient to use this decomposition since, for instance, the least-squares equations X'X b = X'y reduce to R b = Q'y. Thus, we do have an upper triangular system of equations.
> X=lm(dist~speed,data=cars)$qr
The code used to get the previous graph is (more or less) the following,
> g=function(x,lambda){
+ y=NA
+ if(lambda!=0){y=(x^lambda-1)/lambda}
+ if(lambda==0){y=log(x)}
+ return(y)}
> n=nrow(cars)
> X=lm(dist~speed,data=cars)$qr
> Y=cars$dist
> logv=function(lambda){
+ -n/2*log(sum(qr.resid(X, g(Y,lambda)/
+ exp(mean(log(Y)))^(lambda-1))^2))}
> L=seq(0,1,by=.05)
> LV=Vectorize(logv)(L)
> points(L,LV,pch=19,cex=.85,col="red")
As we can see (with those red dots) we can reproduce the R graph. But it might not be consistent with other techniques (and functions described above). For instance, we can plot the profile likelihood function,
> logv=function(lambda){
+ s=summary(lm(g(dist,lambda)~speed,
+ data=cars))$sigma
+ e=lm(g(dist,lambda)~speed,data=cars)$residuals
+ -n/2*log(2*pi)-n*log(s)-.5/s^2*(sum(e^2))+
+ (lambda-1)*sum(log(Y))
+ }
> L=seq(0,1,by=.01)
> LV=Vectorize(logv)(L)
> plot(L,LV,type="l",ylab="")
> (maxf=optimize(logv,0:1,maximum=TRUE))
$maximum
[1] 0.430591

$objective
[1] -197.6966

> abline(v=maxf$maximum,lty=2)
The good point is that the optimal value of lambda is the same as the one we got before. The only problem is that the y-axis has a different scale. And using profile-likelihood techniques to derive a confidence interval will give different results (with a larger confidence interval than the one given by the standard function),
> ic=maxf$objective-qchisq(.95,1)
> #install.packages("rootSolve")
> library(rootSolve)
> f=function(x)(logv(x)-ic)
> (lower=uniroot(f, c(0,maxf$maximum))$root)
[1] 0.1383507
> (upper=uniroot(f, c(maxf$maximum,1))$root)
[1] 0.780573
> segments(lower,ic,upper,ic,lwd=2,col="red")
Actually, it is possible to rewrite the log-likelihood, up to an additive constant (let us just get rid of the constant), as -(n/2) log( sum_i [e_i(lambda)/y*^lambda]^2 ), where e(lambda) denotes the residuals of the regression of h_lambda(y) on the covariates, and y* = exp(mean(log y)) is the geometric mean of the observations. Here, it becomes
> logv=function(lambda){
+ e=lm(g(dist,lambda)~speed,data=cars)$residuals
+ elY=(exp(mean(log(Y))))
+ -n/2*log(sum((e/elY^lambda)^2))
+ }
> L=seq(0,1,by=.01)
> LV=Vectorize(logv)(L)
> plot(L,LV,type="l",ylab="")
> optimize(logv,0:1,maximum=TRUE)
$maximum
[1] 0.430591

$objective
[1] -47.73436
with, again, the same optimal value for lambda, and the same confidence interval, since the function is the same up to an additive constant.
So we have been able to derive the optimal transformation according to the Box-Cox procedure. But, so far, the confidence interval is not the same (it might come from the fact that here we substituted an estimator for the unknown parameter sigma).
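As a sanity check, here is a small self-contained sketch (my own re-implementation under the same assumptions, not the code above) that maximizes the standard concentrated Box-Cox log-likelihood on the cars data; it should land close to the value 0.43 found above:

```r
# Concentrated Box-Cox log-likelihood for dist ~ speed (cars data),
# using the rescaled transform z = (y^lambda - 1)/(lambda*gm^(lambda-1)),
# where gm is the geometric mean of y; maximizing it amounts to
# minimizing the residual sum of squares of the linear fit on z.
loglik=function(lambda){
  y=cars$dist
  gm=exp(mean(log(y)))  # geometric mean of y
  z=if(lambda==0) gm*log(y) else (y^lambda-1)/(lambda*gm^(lambda-1))
  e=residuals(lm(z~cars$speed))
  -length(y)/2*log(sum(e^2))  # profile log-likelihood, up to a constant
}
opt=optimize(loglik,c(0,1),maximum=TRUE)
opt$maximum  # close to 0.43, the value found above
```

Dividing by gm^(lambda-1) only changes the objective by a constant (in lambda), so the maximizer is the same as with the raw transform.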
Saturday, November 10 2012
Somewhere else, part 20
By arthur charpentier on Saturday, November 10 2012, 21:51  Links
So, Movember finally arrived (see http://ca.movember.com/). So far, not a lot of articles about moustaches. But I should find some by the end of the month! Nevertheless, I discovered a great post for those who used to be addicted players
 "Millions of hours have been lost through people playing Tetris. It's a simple game, so why do we find it so compelling?" http://mindhacks.com/bbcfuturecolumnthepsychologyoftetris/ …
 Drew Linzer's http://votamatic.org/ "The stats man who predicted Obama's win" via http://www.bbc.co.uk/news/magazine20246741 …
 "Mapping Racist Tweets in Response to President Obama's Reelection" http://www.floatingsheep.org/mappingracisttweetsinresponseto.html …

 "moving from polls to forecasts" http://punkrockor.wordpress.com/electiondayblogpostmovingfrompollstoforecasts/ … on @lamclay's great blog (following http://simplystatistics.org/539704/onweatherforecastsnatesilverandthe …)
 for those who still want to understand how to play with data, http://fivethirtyeight.blogs.nytimes.com/methodology/ for the description of Nate Silver's methodology
 US elections, on http://gianlubaio.blogspot.ca/gotcha.html … from the model described here http://gianlubaio.blogspot.co.uk/bayesforpresident.html …
 "Highly Unscientific Ways of Predicting the Next President" (but who cares) http://www.wired.com/underwire/2/11/unscientificelectionpredictions/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+wired%2Findex+%28Wired%3A+Top+Stories%29 …

 in French, on @tomroud's blog, "5 leçons scientifiques du succès de Nate Silver" http://tomroud.cafesciences.org/2… or "Qui va gagner l'élection présidentielle américaine?" by @adelaigue http://blog.francetvinfo.fr/classeeco/22/11/03/quivagagnerlelectionpresidentielleamericaine.html …
 US elections... "Color might have been a decent option here" http://ilovecharts.tumblr.com/224813/colormighthavebeenadecentoptionhere …
 “The Collapse of the Soviet Union and the Productivity of American Mathematicians” http://qje.oxfordjournals.org/co…

 "Did the sun just explode? The last Dutch Book you’ll ever make" http://bayesianbiologist.com/didthesunjustexplodethelastdutchbookyoullevermake/ … the answer of @CJBayesian to http://xkcd.com/1132/
 "Why academic publishing is like a coffee shop" http://blogs.lse.ac.uk/impac… "An enormous mystique adds relatively little"

 via @sciencegoddess "How to devise passwords that drive hackers away" http://www.nytimes.com/technology/personaltech/howtodevisepasswordsthatdrivehackersaway.html?_r=0 …
 "Word diffusion and climate science" http://www.plosone.org/artiAdoi%2F10.1371%2Fjournal.pone.0047966 …
 "Models must be simpler than the phenomena they are supposed to model" http://www.farnamstreetblog.com/theprincipleofincompleteknowledge/ …
 "Economics and Natural Disasters"http:// conversableeconomist.blogspot.ca/eicsandnaturaldisasters.html …

 "Mathematically Challenging Bagels" by Robert Krulwich http://www.npr.org/krulwich/2012/11/08/164682556/mathematicallychallengingbagels?ft=1&f=1007 …
 "we no longer need expensive publishing networks" http://www.guardian.co.uk/highereduonnetwork/blog/2012/nov/08/openaccessacademicpublishingmodels …

 "How Dark Sky works" http://journal.darkskyapp.com/howdarkskyworks/ … via @therealprotonk
 "Why supervisors should continue measuring financial risks – the fallacy of simple rules" http://www.voxeu.org/whysupervisorsshouldcontinuemeasuringfinancialrisksfallacysimplerules … via @blogizmo
 "A Math-Free Guide to the Math of Alice in Wonderland" http://io9.com/amath+freeguidetothemathofaliceinwonderland … via @centerofmath
 "How Twitter language reveals your gender" http://bostonglobe.com/ide… "Social media is giving linguists new insight into how speech varies"
 "Supermarket banking" http://coppolacomment.blogspot.co.uk/supermarketbanking.html … "Problems didn't arise from proximity to investment banking; they came from the retail sector" see also "On Being The Right Size" (still talking about banks) http://b.rw/TPSh78
 "Tweeting out loud" http://blogs.lse.ac.uk/impactofsocialsciences/2012/10/16/fullicktweetingoutloud/ … "ethics, knowledge and social media in academe"
 "Are You a Good Econometrician? No, I am British" http://economistsview.typepad.com/economistsview/2012/11/maurizioboviareyouagoodeconometriciannoiambritishwitharesponsefromgeorgeevans.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+EconomistsView+%28Economist%27s+View%29…
 nice discussion on "Dumb econometrics questions/bleg on forecast probabilities" http://worthwhile.typepad.com/worthwhile_canadian_initi/2012/11/dumbeconometricsquestionsblegonforecastprobabilities.html … following a question asked by Nick Rowe

 "Climate policy: Do economists all favour a carbon tax?" via @stephenfgordon http://www.economist.com/freeexchange/2011/09/climatepolicy?fsrc=scn/tw_ec/do_economists_all_favour_a_carbon_tax_ …
 "Economics is a serious and difficult subject" http://www.tandfonline.com/doi/0/1350178X.2012.714143?journalCode=rjec20 … via @phnk
 on "Market Noise and Signal" http://derekhernquist.com/marketnoiseandsignal/ … by @derekhernquist
 "A compulsory register of trials could give a more accurate view of studies and test results" http://www.ft.com/intl/cm795fbd622f211e2938d00144feabdc0.html?ftcamp=published_links/rss/arts_columnists_timharford/feed//product … by @TimHarford
 "In France, 85% of MPs hold several elected offices at once. In England, they are 13%. In Italy, 16%. Spain? 15%. Belgium? The same…" http://blogs.univpoitiers.fr/oboubaolga/…

 "Le rapport Gallois est la réponse. Quelle était la question, au fait?" ("The Gallois report is the answer. What was the question, again?") by @adelaigue http://blog.francetvinfo.fr/classeeco/2/11/06/lerapportgalloisestlareponsequelleetaitlaquestionaufait.html …
 "Since tuition fees tripled, university enrollment has dropped by 15% in England" http://etudiant.lefigaro.fr/etudier…
 "Economic policy is a combat sport..." http://www.atterres.org/lapolitique%C3%A9conomiqueestunsportdecombat …
 [free book] "Eléments de statistique" ("Elements of statistics") https://studies2.hec.fr/jahite/hec/shared/sites/stoltz/acces_anonyme/ESM201213/ElementsStatMaths201213Tome1Web.pdf … ("For today's citizens and tomorrow's managers") by Gilles Stoltz
Friday, November 9 2012
On p-values
By arthur charpentier on Friday, November 9 2012, 06:50  ACT6420A2012
Saturday, November 3 2012
Normality versus goodness-of-fit tests
By arthur charpentier on Saturday, November 3 2012, 14:34  Statistics
In many cases, in statistical modeling, we would like to test whether the underlying distribution of an i.i.d. sample lies in a given (parametric) family, e.g. the Gaussian family {N(mu, sigma^2); mu in R, sigma^2 > 0}.
Consider a sample
> library(nortest)
> n=200
> X=rnorm(n)
Then a natural idea is to use a goodness-of-fit test (natural is not necessarily correct, we'll get back to that later on), i.e. test whether the sample comes from a N(mu, sigma^2) distribution, for some mu and sigma^2. But since those two parameters are unknown, it is not uncommon to see people substituting estimators for those two unknown parameters,
Using the Kolmogorov-Smirnov test, we get
> pn=function(x){pnorm(x,mean(X),sd(X))}
> P.KS.Norm.estimated.param=
+ ks.test(X,pn)$p.value
But since we chose the parameters based on the very sample we use to run the goodness-of-fit test, we should expect trouble somewhere. So another natural idea is to split the sample: the first half will be used to estimate the parameters, and then we use the second half to run a goodness-of-fit test (e.g. using the Kolmogorov-Smirnov test)
> pn=function(x){pnorm(x,mean(X[1:(n/2)]),
+ sd(X[1:(n/2)]))}
> P.KS.Norm.out.of.sample=
+ ks.test(X[(n/2+1):n],pn)$p.value
As a benchmark, we can use the Lilliefors test, where the distribution of the Kolmogorov-Smirnov statistic is corrected to take into account the fact that we use estimators of the parameters,
> P.Lilliefors.Norm=
+ lillie.test(X)$p.value
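The simulation loop itself is not shown in the post; a minimal sketch (with far fewer replications than the 100,000 used below, and variable names of my own) could look like:

```r
# Compare p-values of three normality checks on simulated N(0,1) samples:
# KS with in-sample estimated parameters, KS with out-of-sample
# (split-sample) estimation, and the Lilliefors correction.
library(nortest)
set.seed(1)
n=200; nsim=500
P=replicate(nsim,{
  X=rnorm(n)
  pks =ks.test(X,function(x) pnorm(x,mean(X),sd(X)))$p.value
  pout=ks.test(X[(n/2+1):n],
               function(x) pnorm(x,mean(X[1:(n/2)]),sd(X[1:(n/2)])))$p.value
  plil=lillie.test(X)$p.value
  c(estimated=pks,out.of.sample=pout,lilliefors=plil)
})
rowMeans(P>.05)  # proportion of "accepted" samples at the 5% level
```

With more replications, the three proportions should approach the values reported below (roughly 0.9998, 0.856 and 0.949).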
Here, let us consider i.i.d. samples of size 200 (100,000 samples were generated here). The distribution of the p-value of the test is shown below,
In red, the Lilliefors test, where we see that the correction works well: the p-value is uniformly distributed on the unit interval. There is a 95% chance of accepting the normality assumption if we accept it when the p-value exceeds 5%. On the other hand,
- with the Kolmogorov-Smirnov test on the overall sample, we (almost) always accept the normality assumption, with a lot of extremely large p-values;
- with the Kolmogorov-Smirnov test with out-of-sample estimation, we actually observe the opposite: in a lot of simulations, the p-value is lower than 5% (even though the sample was drawn from a normal distribution).
The cumulative distribution function of the p-value is
I.e. the proportion of samples with a p-value exceeding 5% is 95% for the Lilliefors test (as expected), while it is 85% with out-of-sample estimation, and 99.99% for Kolmogorov-Smirnov with estimated parameters,
> mean(P.KS.Norm.out.of.sample>.05)
[1] 0.85563
> mean(P.KS.Norm.estimated.param>.05)
[1] 0.99984
> mean(P.Lilliefors.Norm>.05)
[1] 0.9489
So using Kolmogorov-Smirnov with estimated parameters is not a good idea, since we accept the normality assumption far too often. On the other hand, if we use this technique with two subsamples (one to estimate the parameters, one to run the goodness-of-fit test), it looks much better, even if we reject too often. For one test, the Type I error rate is rather large; for the other, it is the Type II error rate...
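For readers who want to reproduce the experiment, here is a reduced-scale sketch (1,000 replications instead of the 100,000 used above, to keep it fast); the Lilliefors benchmark would be computed the same way, with lillie.test from the nortest package,

```r
# Reduced-scale sketch of the simulation above: for each Gaussian sample,
# compute the KS p-value with parameters estimated on the full sample,
# and the KS p-value on the second half with parameters from the first half.
set.seed(1)
n <- 200
nsim <- 1000   # 100,000 in the post; reduced here for speed
P.KS.Norm.estimated.param <- P.KS.Norm.out.of.sample <- numeric(nsim)
for (s in 1:nsim) {
  X <- rnorm(n)
  # KS with parameters estimated on the whole sample
  pn.full <- function(x) pnorm(x, mean(X), sd(X))
  P.KS.Norm.estimated.param[s] <- ks.test(X, pn.full)$p.value
  # KS on the second half, parameters estimated on the first half
  pn.half <- function(x) pnorm(x, mean(X[1:(n/2)]), sd(X[1:(n/2)]))
  P.KS.Norm.out.of.sample[s] <- ks.test(X[(n/2+1):n], pn.half)$p.value
}
# proportions of p-values exceeding 5%
mean(P.KS.Norm.estimated.param > .05)
mean(P.KS.Norm.out.of.sample > .05)
```

Even with only 1,000 replications, the two proportions should land close to the 99.98% and 85.6% reported above.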
Wednesday, October 31 2012
Why pictures are so important when modeling data?
By arthur charpentier on Wednesday, October 31 2012, 21:48  ACT6420A2012
Consider Anscombe's quartet (the anscombe dataset in R), and start with the regression of y1 on x1 (called reg1 below),
Call:
lm(formula = y1 ~ x1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.0001     1.1247   2.667  0.02573 *
x1            0.5001     0.1179   4.241  0.00217 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.237 on 9 degrees of freedom
Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
> apply(anscombe[,1:4],2,mean)
x1 x2 x3 x4 
 9  9  9  9 
> apply(anscombe[,1:4],2,var)
x1 x2 x3 x4 
11 11 11 11 
> apply(anscombe[,5:8],2,mean)
      y1       y2       y3       y4 
7.500909 7.500909 7.500000 7.500909 
> apply(anscombe[,5:8],2,var)
      y1       y2       y3       y4 
4.127269 4.127629 4.122620 4.123249 
> cor(anscombe)[1:4,5:8]
           y1         y2         y3         y4
x1  0.8164205  0.8162365  0.8162867 -0.3140467
x2  0.8164205  0.8162365  0.8162867 -0.3140467
x3  0.8164205  0.8162365  0.8162867 -0.3140467
x4 -0.5290927 -0.7184365 -0.3446610  0.8165214
> diag(cor(anscombe)[1:4,5:8])
[1] 0.8164205 0.8162365 0.8162867 0.8165214
> cbind(coef(reg1),coef(reg2),coef(reg3),coef(reg4))
                 [,1]     [,2]      [,3]      [,4]
(Intercept) 3.0000909 3.000909 3.0024545 3.0017273
x1          0.5000909 0.500000 0.4997273 0.4999091
> c(summary(reg1)$sigma,summary(reg2)$sigma,
+ summary(reg3)$sigma,summary(reg4)$sigma)
[1] 1.236603 1.237214 1.236311 1.235695
> c(summary(reg1)$r.squared,summary(reg2)$r.squared,
+ summary(reg3)$r.squared,summary(reg4)$r.squared)
[1] 0.6665425 0.6662420 0.6663240 0.6667073
> c(summary(reg1)$fstatistic[1],summary(reg2)$fstatistic[1],
+ summary(reg3)$fstatistic[1],summary(reg4)$fstatistic[1])
   value    value    value    value 
17.98994 17.96565 17.97228 18.00329 
> summary(reg2)
Call:
lm(formula = y2 ~ x2, data = anscombe)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9009 -0.7609  0.1291  0.9491  1.2691 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.001      1.125   2.667  0.02576 *
x2             0.500      0.118   4.239  0.00218 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.237 on 9 degrees of freedom
Multiple R-squared: 0.6662, Adjusted R-squared: 0.6292
F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002179
> reg2b=lm(y2~x2+I(x2^2),data=anscombe)
> summary(reg2b)
Call:
lm(formula = y2 ~ x2 + I(x2^2), data = anscombe)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0013287 -0.0011888 -0.0006294  0.0008741  0.0023776 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.9957343  0.0043299   -1385   <2e-16 ***
x2           2.7808392  0.0010401    2674   <2e-16 ***
I(x2^2)     -0.1267133  0.0000571   -2219   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.001672 on 8 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 7.378e+06 on 2 and 8 DF, p-value: < 2.2e-16
> summary(reg3)
Call:
lm(formula = y3 ~ x3, data = anscombe)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.1586 -0.6146 -0.2303  0.1540  3.2411 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.0025     1.1245   2.670  0.02562 *
x3            0.4997     0.1179   4.239  0.00218 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.236 on 9 degrees of freedom
Multiple R-squared: 0.6663, Adjusted R-squared: 0.6292
F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002176
> reg3b=lm(y3~x3,data=anscombe[-3,])
> summary(reg3b)
Call:
lm(formula = y3 ~ x3, data = anscombe[-3, ])

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0041558 -0.0022240  0.0000649  0.0018182  0.0050649 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0056494  0.0029242    1370   <2e-16 ***
x3          0.3453896  0.0003206    1077   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.003082 on 8 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.161e+06 on 1 and 8 DF, p-value: < 2.2e-16
> summary(reg4)
Call:
lm(formula = y4 ~ x4, data = anscombe)

Residuals:
   Min     1Q Median     3Q    Max 
-1.751 -0.831  0.000  0.809  1.839 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.0017     1.1239   2.671  0.02559 *
x4            0.4999     0.1178   4.243  0.00216 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.236 on 9 degrees of freedom
Multiple R-squared: 0.6667, Adjusted R-squared: 0.6297
F-statistic: 18 on 1 and 9 DF, p-value: 0.002165
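The pictures themselves (the whole point of the post) can be reproduced in a few lines; this is a sketch, not necessarily the exact code used for the original figures,

```r
# Anscombe's quartet: four datasets with (almost) identical summary
# statistics and regression lines, but very different pictures.
data(anscombe)
par(mfrow = c(2, 2))
slopes <- numeric(4)
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  slopes[i] <- coef(fit)[2]
  plot(x, y, pch = 19, xlim = c(3, 19), ylim = c(3, 13),
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fit, col = "red")   # nearly the same line in all four panels
}
```

All four fitted slopes are (almost exactly) 0.5, while the scatterplots tell four completely different stories.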
Saturday, October 27 2012
Somewhere else, part 18
By arthur charpentier on Saturday, October 27 2012, 23:31  Links
A nice post (found a couple of days ago) on an interesting topic that I should perhaps discuss more often on this blog,
- "Why do people love to say that correlation does not imply causation?" http://www.slate.com/arti…
- "L'Aquila's earthquake-scarred streets see battle between science and politics" http://www.guardian.co.uk/w/27/laquilaearthquakebattlesciencepolitics?CMP=twt_fd …
- "L’Aquila: earthquake, verdict, and statistics" http://xianblog.wordpress.com/20uilaearthquakeverdictandstatistics/ …
- "Complexity and the madness of crowds – lessons from disaster prevention" on @rszbt's blog http://reszatonline.wordpress.com/20plexityandthemadnessofcrowdslessonsfromdisasterprevention/ …
- "Trial Over Earthquake in Italy Puts Focus on Probability" http://www.nytimes.com/science/04quake.html?_r=1& … via @statfr
- "Italian court ruling sends chill through science community" http://www.reuters.com/a10/22/ussciencemanslaughteritalyidUSBRE89L1IC20121022?feedType=RSS&feedName=scienceNews&utm_source=dlvr.it&utm_medium=twitter&dlvrit=309301 …
- "Verdict of l’Aquila Earthquake Trial Sends the Wrong Message" about how to deal with hazard assessment and mitigation http://www.wired.com/wir…
- "Scientists on trial: At fault?" http://www.nature.com/110914/pdf/477264a.pdf … (a detailed article)
- "Italian scientists convicted of manslaughter" http://arstechnica.com/s0/italianscientistsconvictedofmanslaughterforearthquakeriskreport/ … "No finding of elevated risk in a report days before a fatal earthquake"
- "The Role of Connections in Academic Promotions" by Natalia Zinovyeva and Manuel Bagues http://ftp.iza.org/dp6821.pdf
- "The Student Debt Crisis" http://www.americanprogress.org/isseducation/report/2012/10/25/42905/thestudentdebtcrisis/ …
- "Integrals don’t have anything to do with discrete math, do they?" http://mathdl.maa.org/im…
- [free ebook] "Think Bayes: Bayesian Statistics Made Simple" by Allen B. Downey http://www.greenteapress.com/th…
- "Benoit Mandelbrot, the father of fractal geometry, pens a disturbing new memoir on mathematics—and survival" http://www.tabletmag.com/jewdculture/books/114766/amathgeniussadcalculus?all=1 …
- "Milliseconds matter" http://www.washingtonpost.com/bumy/millisecondsmatter/2012/10/25/bedf82101ef911e29cd5b55c38388962_graphic.html … "pricing strategies and decision-making process involved in HF trading"
- "The Virtues and Vices of Election Prediction Markets" http://fivethirtyeight.blogs.nytimes.com/223thevirtuesandvicesofelectionpredictionmarkets/ … by @fivethirtyeight via http://krugman.blogs.nytimes.com/2… on nerds
- "Communication about science doesn’t need to be time-consuming or distracting from research ..." http://blogs.nature.com/soapbox/2012/10/24/anelevatorpitchforaresearchproject?WT.mc_id=TWT_NatureBlogs&buffer_share=35adc …
- "21 Reasons Why You Should Never Date An Economist" http://inesad.edu.bo/dev… see also http://www.economistsdoitwithmodels.com/20tforfunreasonsnottodateaneconomistthanksguys/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+economistsdoitwithmodels+%28Economists+Do+It+With+Models%29 … via @adelaigue
- "Fun with tax: How taxation by government has changed" http://www.economist.com/bloetail/2012/10/dailychart12 …
- nice application http://www.visualthesaurus.com/app/view for those who like words, and graphs
- "Frankenstein Economics is killing capitalism" http://www.marketwatch.com/Stoint?guid=9E40EEA01C5511E2B624002128040CF6 …
- "Asset Pricing with Garbage" in the Journal of Finance, last year http://onlinelibrary.wiley.com/do.15406261.2010.01629.x/abstract …
- "The Future of Computer Trading in Financial Markets" http://www.bis.gov.uk/fore…
- "Auction Theory: A Guide to the Literature" by Paul Klemperer http://www.nuff.ox.ac.uk/us…
- "Ethics and Finance: The Role of Mathematics" http://magicmathsmoney.blogspot.ca/20andfinanceroleofmathematics.html …
- "How to find a perfect match for a Nobel" http://timharford.com/howtofindaperfectmatchforanobel/ … by @timharford via @nereidadin
- "Who owns research data and the rights to publish it?" http://emckiernan.wordpress.com/whoownsresearchdataandtherightstopublishit/ … via @mathewi, see also an article on database property rights regulated by EU law http://ec.europa.eu/intt/copyright/protdatabases/index_en.htm …
- "We’re probably at the death of education" http://thenextweb.com/ins… via @IanikMarcil
- "Can humans cause an earthquake?" http://arstechnica.com/scie0/canhumanscauseanearthquake/ … see also http://www.nature.com/ngaop/ncurrent/full/ngeo1610.html …
- "Algorithms, Arbitrage, and Overreaction on Intrade" http://rajivsethi.blogspot.ca/algorithmsarbitrageandoverreaction.html …
- "Did the financial blogosphere go away?" http://blogs.reuters.com/feli012/10/23/didthefinancialblogospheregoaway/ … by @felixsalmon via @stanjourdan
- "an XKCD-esque chart" with #rstats http://drunksandlampposts.com/cleggvsplebanxkcdesquechart/ … an answer to the http://stackoverflow.com/12675147/xkcdstylegraphsinr … challenge
- "And the winner for longest time on record between publication and retraction is…" http://retractionwatch.wordpress.com/andthewinnerforlongesttimeonrecordbetweenpublicationandretractionis/ … via @NGhoussoub
- "Location, Location, Location" by Alexandra M. Lord http://chronicle.com/LocationLocationLocation/134264/ … via @tomroud
- "How ProPublica’s Message Machine Reverse Engineers Political Microtargeting" http://www.propublica.org/ne… via @propublica and http://www.propublica.org/aremachinestartsprovidinganswers …
- if you're working in Academia, “it’s your duty to be miserable” http://chronicle.com/ItsYourDutytoBe/135014/ … by @Pannapacker via @NGhoussoub @m_m_campbell
- forthcoming actuarial projects on life tables: "Average life span for Dwarves, Hobbits and Men" http://lotrproject.com/stfeexpectancy …
- "Scientific research bodies 'failing to engage public'" http://www.scidev.net/fr/scmunication/news/lesorganismesderecherchescientifiquetroppeuengagsaveclepublic.html … via @collectifpapera
- "Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?" http://andrewgelman.com/isitmeaningfultotalkaboutaprobabilityof657thatobamawillwintheelection/ …
- "Why It is Essential That Criminal Bankers are Prosecuted" http://www.nakedcapitalism.com/whyitisessentialthatcriminalbankersareprosecuted.html … via @michellaurence
- "Perhaps the whole ‘don’t put all your eggs in one basket’ school of portfolio allocation is financial wisdom enough" http://timharford.com/whensimplicityisarealasset/ …
- damned, even the New York Times knows: "You Don’t Work as Hard as You Say You Do" http://economix.blogs.nytimes.com/youdontworkashardasyousayyoudo/ …
- [video doc] "Money as Debt" directed by Paul Grignon http://youtu.be/e6LWqgohO4E http://youtu.be/lsmbWBpnCNk and http://youtu.be/f6uuAupT4AQ via http://www.economicreason.com/d..se/top15economicdocumentaries/
- [video doc] "The Ascent of Money: A Financial History of The World" by Niall Ferguson http://youtu.be/4Xx_5PuLIzc via http://www.economicnoise.com/thetop18economicdocumentaries/ …
- [video doc] "97% Owned" http://youtu.be/XcGh1Dex4Yo via http://www.economicnoise.com/thetop18economicdocumentaries/ … and @clasicaliberal
- [video doc] "Overdose: The Next Financial Crisis" http://youtu.be/4ECi6WJpbzE via http://www.economicnoise.com/thetop18economicdocumentaries/ … and @clasicaliberal
- the 1929 crash, discussed a fortnight later in the Journal des Finances http://gallica.bnf.fr/bpt6k5552712t.image … via @GallicaBnF
- "Économistes à gages et médias complaisants" ("Economists for hire and complacent media") by Renaud Lambert http://www.acrimed.org/html … (a follow-up to @LaurentMauduit's book)
- "Ce que le blog apporte à la recherche" ("What blogging brings to research") by Antoine Blanchard (a.k.a. @Enroweb) http://press.openedition.org/172
Tuesday, October 23 2012
Predictions and errors
By arthur charpentier on Tuesday, October 23 2012, 13:09  risks
There have been a lot of interesting articles about the "manslaughter trial" of six seismologists and a government official in Italy, where the court ruled that there was a failure to warn the population before the deadly earthquake in 2009; see e.g. "Trial Over Earthquake in Italy Puts Focus on Probability and Panic" on nytimes.com, "Italian scientists convicted of manslaughter for earthquake risk report" on arstechnica.com, "Italian court ruling sends chill through science community" on reuters.com, "Scientists on trial: At fault?" on nature.com or (probably the most interesting one) "The Verdict of the l’Aquila Earthquake Trial Sends the Wrong Message" on wired.com.
First of all, I started working on earthquake series and models less than 15 months ago, and I am still working on the second paper, but so far, what I've seen is that those series are very noisy. Even on a large scale (say 500 km), it is still very difficult to estimate the probability that there will be a large earthquake over a long period of time (from one year to a decade), even including covariates such as foreshocks. So I can imagine that it is almost impossible to predict anything accurate on a smaller scale, and over a short time frame. A second point is that I did not have time to look carefully at what was said during the trial: I have just been through what can be found in the articles mentioned above.
But as a statistician, I really believe, as claimed by Niels Bohr (among many others), that "prediction is very difficult, especially about the future". Especially with a 0/1 decision (warning versus no warning). In that case, we face the usual Type I and Type II errors (see e.g. wikipedia.org for more details),
- a Type I error, a "false positive" or "false alarm", is issuing a warning for nothing. With standard "test" words, it is like when a pregnancy test predicts that someone is pregnant, while she is not;
- a Type II error, a "false negative", is failing to assert what is actually present. Here, it is like when a pregnancy test predicts that someone is not pregnant, while she actually is.
The main problem is that statisticians wish to design tests with both errors as small as possible. But usually, you can't: you have to make a trade-off. The more you protect yourself against Type I errors (by choosing a low significance level), the greater the chance of a Type II error. This is actually the most important message in all Statistics 101 courses.
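To make that trade-off explicit, here is a small numerical illustration (not from the original argument, and the effect size of 0.4 is purely hypothetical): a one-sided Gaussian test of the null that the mean is 0, against the alternative that it is positive, based on 25 observations,

```r
# Type I / Type II trade-off: lowering the significance level alpha
# (fewer false alarms) mechanically increases beta, the probability
# of missing a true effect.
n <- 25          # sample size
mu1 <- 0.4       # illustrative true mean under the alternative (hypothetical)
for (alpha in c(0.10, 0.05, 0.01)) {
  crit <- qnorm(1 - alpha) / sqrt(n)                 # rejection threshold for the sample mean
  beta <- pnorm(crit, mean = mu1, sd = 1 / sqrt(n))  # P(no warning | effect is real)
  cat("alpha =", alpha, "  beta =", round(beta, 3), "\n")
}
```

As alpha decreases from 10% to 1%, beta increases from roughly a quarter to well above a half: protecting yourself against false alarms means missing more real effects.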
Another illustration comes from the course I am currently teaching this semester, precisely on prediction and forecasting techniques. Consider e.g. the following series
Here, we wish to make a forecast for this time series (involving a confidence interval, or region). Something like
The aim of the course is to be able to build that kind of graph, to analyze it, and to know exactly what assumptions were used to derive those confidence bands. But if you might go to jail for missing something, you can always make the following forecast
From this trial, we know that researchers can go to jail for making a Type II error. So, if you do not want to go to jail, make frequent Type I errors (given the necessary trade-off). Because so far, you're less likely to go to jail for that kind of error (the boy-who-cried-wolf kind). Then you're a shyster, a charlatan, but you shouldn't spend six years in jail! As mentioned on Twitter, that might be a reason why economists keep announcing crises! That might actually be a coherent strategy...
Saturday, October 20 2012
Somewhere else, part 17
By arthur charpentier on Saturday, October 20 2012, 13:32  Links
This week, the most important news is undoubtedly
- "Randomly generated mathematics research papers!" http://thatsmathematics.com/b102 … "Mathgen paper accepted", see also "Nonsense paper accepted by mathematics journal" http://marginalrevolution.com/… via @adelaigue
- "Publication incentives" http://www.quantumforest.com/publicationincentives/ … by @zentree (following http://www.quantumforest.com/academicpublicationboycott/ …)
- "Challenging the integrity of research" http://blogs.nature.com/2012/10/17/challengingtheintegrityofscientist?WT.ec_id=NATUREjobs20121018 …
- "We need a method of assessing the support of research if we want to change the ‘publish or perish’ culture" http://blogs.lse.ac.uk/impaciences/2012/10/17/voytekfishycitationsbrainscanr/ …
- [notes] "R for SAS and SPSS users" https://science.nature.nps.gov/im/datamgmt/statistics/R/documents/R_for_SAS_SPSS_users.pdf …
- "status and math in economics" http://orgtheory.wordpress.com/statusandmathineconomics/ … see also the paper http://mpra.ub.unimuenchen.de/MPRA_paper_41363.pdf …
- "What is math, and why should we use it in economics?" http://noahpinionblog.blogspot.ca/whatismathandwhyshouldweuseit.html …
- [video course] "Beyond Computation: The P vs NP Problem" by Michael Sipser http://youtu.be/msp2y...
- [free ebook] "Bayesian Reasoning and Machine Learning" by David Barber http://web4.cs.ucl.ac.uk/staff//textbook/090310.pdf … via @DataJunkie
- "What JP Morgan’s release of VaR has in common with sex and computer viruses" http://www.nickdunbar.net/whatjpmorgansreleaseofvarhasincommonwithsexandcomputerviruses/ …
- "forecasting the Presidential election using regression, simulation, or dynamic programming" http://punkrockor.wordpress.com/orecastingthepresidentialelectionusingregressionsimulationordynamicprogramming/ … via @lamclay
- "Where Will The Next Pandemic Come From? And How Can We Stop It?" http://www.popsci.com/science/se/201208/outwild?singlepageview=true …
- "How close are pairwise and mutual independence?" http://legacy.lclark.edu/pairwise2.pdf … via @statfr
- "Homogeneous record of Atlantic hurricane surge threat since 1923" http://www.pnas.org/content/early/2012/10/10/1209542109.full.pdf+html … via http://www.newscientist.com/article/dn22382tidalrecordsexposesurgeinhurricanes.html …
- "Blogging" http://www.guardian.co.uk/highereducationnetwork/blog/2012/oct/15/blogactiondaypowerofwe … "Too often dismissed as narcissistic echo-chambers, blogs are the ultimate form of collegiality"
- "All you scientists frustrated by the rejection of your papers from journals, relax. Rejection is rare" http://news.sciencemag.org/scienceinsider/2012/10/scientistsmayfeelrejectedbut.html?ref=em …
- "Beware, win or lose: Domestic violence and the World Cup" http://onlinelibrary.wiley.com/13.2012.00606.x/abstract … in @signmagazine
- "what is the optimal way to find a parking spot?" http://punkrockor.wordpress.com/whatistheoptimalwaytofindaparkingspot/ … by @lamclay
- "I never, never in my life took a course in economics" (Lloyd Shapley) http://larspsyll.wordpress.com/ineverneverinmylifetookacourseineconomics/ …
- "A Nobel for work that affects your daily life" http://www.cbsnews.com/830257532993/anobelforworkthataffectsyourdailylife/ …
- "To win the Nobel Prize in Economics, it helps to wield math. Lots of it" (posted before the announcement) http://qz.com/towinthenobelprizeineconomicsithelpstowieldmathlotsofit/ … via @karlfisch
- "Alvin Roth et Lloyd Shapley, Nobels d'Economie 2012" a nice paper by @adelaigue on http://blog.francetvinfo.fr/classeeco/2/10/15/alvinrothetlloydshapleynobelsdeconomie2012.html …
- "Le Prix Nobel d’économie 2012 : ni prix, ni Nobel, ni économie ?" ("The 2012 Nobel prize in economics: no prize, no Nobel, no economics?") http://legizmoblog.blogspot.fr/leprixnobeldeconomie2012niprixni.html … by @blogizmo
- "Le douloureux calcul de la valeur de la vie" ("The painful computation of the value of a life") http://www.slate.fr/story/63485/qalycalculvaleurvie … when a classical problem for actuaries now interests economists too, cf http://www.nber.org/11405..., http://ideas.repec.org/v21y2002i2p253270.html … for a meta-analysis, or http://www.ncbi.nlm.nih.gov/PMC1650128/ …
- when a brilliant statistician (Paul Deheuvels) speaks up about the Séralini study, http://leplus.nouvelobs.com/c61194letudedeseralinisurlesogmpommedediscordealacademiedessciences.html … via @GolumModerne @virg_garin
- "Ne comptez pas trop sur les JT et les chaînes d'info pour délivrer un discours critique sur l'économie" ("Don't count too much on TV news to deliver a critical view of the economy") on http://television.telerama.fr/po...rquoimateleestelleliberale,87925.php#xtor=RSS18
- "Ces économistes qui monopolisent (toujours) les débats" ("Those economists who (still) monopolize the debates") http://www.acrimed.org/article3904ml …
- «Chocolatine ou Pain au chocolat ?» ("Chocolatine or pain au chocolat?", in France) http://blog.adrienvh.fr/cartographiedesresultatsdechocolatineoupainauchocolat/ … via @AdrienneAlix @le_luk @romainmenard @blogizmo @tomroud
- "Universités du Québec: le spectre du sous-financement, ou quand la quantité remplace la qualité" ("Québec universities: the specter of underfunding, or when quantity replaces quality") http://www.irisrecherche.qc.ca/blotredusousfinancement … RT @Mlle_Titam
- "idiots" http://dirtydenys.net/in/2012/10/10/idi… by Denys Bergrave via @Vicnent @pegobry @bzavier @adelaigue
- (on average) "Twitter est une jeune femme américaine" ("Twitter is a young American woman") http://www.ledevoir.com/soc/361401/twitterestunejeunefemmeamericaine …