Pour la fin de cours , quelques transparents sur l'identification et l'estimation des modèles SARIMA, quelques compléments sur les tests (racines unités et non-stationnarité, ainsi que saisonnalité), et enfin, quelques pistes pour construire des prédictions (avec une quantification de l'incertitude). Les transparents sont en ligne ici. Maintenant que les transparents sont finis (et en ligne) les prochains billets seront orientés autour de la modélisation et des aspects computationnels.
Friday, November 30 2012
Estimation et prévision pour des séries temporelles
By arthur charpentier on Friday, November 30 2012, 13:40 - ACT6420-A2012
Thursday, November 29 2012
Save R objects, and other stuff
By arthur charpentier on Thursday, November 29 2012, 09:08 - ACT6420-A2012
Yesterday, Christopher asked me how to store an R object, to save some time, when working on the project.
First, download the csv file for searches related to some keyword, via http://www.google.com/trends/, for instance "sunglasses". Recall that csv files store tabular data (numbers and text) in plain-text form, with comma-separated values (where csv term comes from). Even if it is not a single and well-defined format, it is a text file, not an excel file!
Recherche sur le Web : intérêt pour sunglasses
Dans tous les pays; De 2004 à ce jour
Intérêt dans le temps
Semaine,sunglasses
2004-01-04 - 2004-01-10,48
2004-01-11 - 2004-01-17,47
2004-01-18 - 2004-01-24,51
2004-01-25 - 2004-01-31,50
2004-02-01 - 2004-02-07,52
(etc) The file can be downloaded from the blog,
> report=read.table( + "http://freakonometrics.blog.free.fr/public/data/glasses.csv", + skip=4,header=TRUE,sep=",",nrows=464)
Then, we just run a function to transform that data frame. It can be found from a source file
> source("http://freakonometrics.blog.free.fr/public/code/H2M.R")
In this source file, there is function that transforms a weekly series into a monthly one. The output is either a time series, or a numeric vector,
> sunglasses=H2M(report,lang="FR",type="ts")
Here, we asked for a time series,
> sunglasses Jan Feb Mar Apr 2004 49.00000 54.27586 66.38710 80.10000 2005 48.45161 58.25000 69.93548 80.06667 2006 49.70968 57.21429 67.41935 82.10000 2007 47.32258 55.92857 70.87097 84.36667 2008 47.19355 54.20690 64.03226 79.36667 2009 45.16129 50.75000 63.58065 76.90000 2010 32.67742 44.35714 58.19355 70.00000 2011 44.38710 49.75000 59.16129 71.60000 2012 43.64516 48.75862 64.06452 70.13333 May Jun Jul Aug 2004 83.77419 89.10000 84.67742 73.51613 2005 83.06452 91.36667 89.16129 76.32258 2006 86.00000 92.90000 93.00000 72.29032 2007 86.83871 88.63333 84.61290 72.93548 2008 80.70968 80.30000 78.29032 64.58065 2009 77.93548 70.40000 62.22581 51.58065 2010 71.06452 73.66667 76.90323 61.77419 2011 74.00000 79.66667 79.12903 66.29032 2012 79.74194 82.90000 79.96774 69.80645 Sep Oct Nov Dec 2004 56.20000 46.25806 44.63333 53.96774 2005 56.53333 47.54839 47.60000 54.38710 2006 51.23333 46.70968 45.23333 54.22581 2007 56.33333 46.38710 44.40000 51.12903 2008 51.50000 44.61290 40.93333 47.74194 2009 37.90000 30.38710 28.43333 31.67742 2010 50.16667 46.54839 42.36667 45.90323 2011 52.23333 45.32258 42.60000 47.35484 2012 54.03333 46.09677 43.45833
that we can plot using
> plot(sunglasses)

Now we would like to store this time series. This is done easily using
> save(sunglasses,file="sunglasses.RData")
Next time we open R, we just have to use
> load("/Users/UQAM/sunglasses.Rdata")
to load the time series in R memory. So saving objects is not difficult.
Last be not least, for the part on seasonal models, we will be using some functions from an old package. Unfortunately, on the CRAN website, we see that

but nicely, files can still be found on some archive page. On Linux, one can easily install the package using (in R)
> install.packages( + "/Users/UQAM/uroot_1.4-1.tar.gz", + type="source")
With a Mac, it is slightly more complicated (see e.g. Jon's blog): one has to open a Terminal and to type
R CMD INSTALL /Users/UQAM/uroot_1.4-1.tar.gz

On Windows, it is possible to install a package from a zipped file: one has to download the file from archive page, and then to spot it from R.
The package is now installed, we just have to load it to play with it, and use functions it contains to tests for cycles and seasonal behavior,
> library(uroot) > CH.test function (wts, frec = NULL, f0 = 1, DetTr = FALSE, ltrunc = NULL) { s <- frequency(wts) t0 <- start(wts) N <- length(wts) if (class(frec) == "NULL") frec <- rep(1, s/2) if (class(ltrunc) == "NULL") ltrunc <- round(s * (N/100)^0.25) R1 <- SeasDummy(wts, "trg") VFEalg <- SeasDummy(wts, "alg")
(etc)
Actuariat 1, ACT2121, septième cours
By arthur charpentier on Thursday, November 29 2012, 07:51 - ACT2121-A2012

Toujours dans le cadre de la préparation à l'examen P de la SOA, une série d'exercices. Comme il reste trois semaines (en plus de l'examen final), on va essayer de finir de revoir l'ensemble des notions. Les 50 exercices sont en ligne ici. L'énoncé et la correction du devoir de lundi dernier seront mis en ligne très bientôt...
Wednesday, November 28 2012
D.I.Y. strategy, and why academics should blog!
By arthur charpentier on Wednesday, November 28 2012, 17:54 - personal
Last week, I went to the Econometrics seminar of Montréal, at UdM, where Alfred Galichon was giving a great talk on marriage market. Alfred is a former colleague (from France), a co-author, an amazing researcher, and above all, a friend of mine. And he has always be supportive about my blogging activities. So while we were having lunch, after the seminar, Alfred mentioned my blogging activity to the other researchers. I should say researchers in Econometrics (yes, with a capital E, since it is a Science, as mention in an old paper by David Hendry by the end of the 70's). Usually, when I am involved in this kind of meeting, I start with some apologies, explaining that I do like theoretical econometrics (if not, I would not come to the seminar), but I do like my freakonometrics activity. I do like to use econometrics (or statistical techniques) to figure out (at least to try) why some things works the way they do. I try to find data, and then try to briefly analyze them to answer some simple questions. Or sometime, I just run simulations to answer more theoretical questions (or at least to give clues).
But above all, I like the fact that blogging gives me the opportunity to interact with people I would never meet without this activity. For instance, last May, I was discussing (on Twitter) with @coulmont, @joelgombin and @imparibus about elections in France. Then @coulmont asked me "yes, everyone knows that there should be some ecological fallacies behind my interpretation, but I am not so sure since I have data with a small granularity. As an econometrician, what do you think ?" Usually, I hate having a label, like "... I ask you since you're a mathematician", or "as an economist, what do you think of...". Usually, when people ask me economic questions, I just claim being a mathematician, and vice-versa. But here, I even put on the front of my blog the word "econometrics" (more or less). So here, I could not escape... And the truth is, that while I was a student, I never heard anything about this "ecological fallacy". Neither did I as a researcher (even if I have been reading hundreds of econometric articles, theoretical and applied). Neither did I as a professor (even if I have been teaching econometrics for almost ten years, and I have read dozens textbooks to write notes and handouts). How comes ? How come researchers in sociology and in political sciences know things in econometrics that I have never heard about ?
The main reason - from my understanding - is the following: if everyone talks about "interdisciplinarity" no one (perhaps a few) is really willing to pay the price of working on different (not to say many) areas. I tried, and trust me, I found it difficult. It is difficult to publish a paper in a climate journal when you're not specialist in climate (and you just want to give your opinion as a statistician). It is difficult to assume that you might waste weeks (not to say months) reading articles in geophysics if you want to know more about earthquakes risks, going to seminars, etc. Research is clearly a club ("club" as defined in Buchanan (1965)) story.
This week, I planned to go to some journal club in biology and physics, at McGill (kindly, a colleague there invited me, but we got a time misunderstanding)... this has nothing to do with my teaching, nor with my research activities. But I might learn something ! Yes, I do claim that I am paid just to have fun, to read stuff that I do find interesting, trying to understand the details of a proof, trying to understand how data were obtained. In most cases, it might (and should) be a complete waste of time, since I will not publish anything (anything serious, published in some peer reviewed journal) on that topic... but should I really care ? As I explained earlier (in French), I do also claim that I have a moral obligation to return everything I have seen, heard, read. And since I am not a big fan of lectures (and that I do not think I have skills for that) I cannot give my opinion, neither on economics facts (as @adelaigue or @obouba might do on their blogs) or on science results (as @tomroud does). But I think I can help on modeling and computational issues. My point being: never trust what you read (even on my blog) but please, try to do it yourself! You read that "90% of French executive think about expatriation" (as mentioned here)? Then try to find some data that should confront that statement. And see if you come up with the same conclusion... And since it might be a bit technical sometimes, here are some lines of code, to do it on your own... Academics have a legitimacy when they give their opinions on technical issues. At least they can provide with a list of reference everyone should read to get an overview of the topic. But academics can also help people read graphs, or data. To give them "numeracy" (or a culture in numbers) necessary to understand complex issues.
To conclude, I should mention that I understood what this "ecological fallacy" was from Thomsen (1987) and many more documents could be found on Soren Thomsen's page http://www.mit.ps.au.dk/srt/. But I got most of the information I was looking for from a great statistician, who happens to be also an amazing blogger: Andrew Gelman (see http://andrewgelman.com/). I will probably write a post someday about this, since I found the question extremely interesting, and important.
Sunday, November 25 2012
Somewhere else, part 22
By arthur charpentier on Sunday, November 25 2012, 20:55 - Links
Some interesting posts, or articles, found this week on the internet,
- "Academia: Off the tenured track" http://www.nature.com/natence/articles/10.1038/nj7425-627a?WT.mc_id=FBK_NatureJobs …
- "If you deliver a edgy experience for students you quite often push them outside their comfort zone" http://www.guardian.co.uk/higher-edu…
- "An empirical study on the determinants of international student mobility: a global perspective" http://link.springer.com/07/s10734-012-9593-5 …
- Nicolas Gueguen, on "higher volumes led beer drinkers in a bar to imbibe more" http://www.icd.go.cr/sitio/… see also http://www.nytimes.com/nyregion/in-new-york-city-indoor-noise-goes-unabated.html?_r=1&hp&pagewanted=all …
- Twitter, and "the notion of self-promotion"http://micro2tele.com/the-notion-of-self-promotion/ …
- thoughts on heat maps, via http://xkcd.com/1138/

- "Remember Movember: the science behind the moustache" http://www.guardian.co.uk/brain-flapping/2012/nov/20/movember-science-moustache …
- "How noisy is economics/finance peer review?" http://jamesjchoi.blogspot.ca/howisy-is-economicsfinance-peer.html …
-
"Economists
as dentists" (from Keynes) http://stumblingandmumbling.typepad.com/stumbling… via
@adelaigue - "How to Live Without Irony" http://opinionator.blogs.nytimes.com/how-to-live-without-irony/ …
- "Ask A Banker: Are The Banks Still Too Big To Fail?" http://www.npr.org/blogs/money/12/11/20/165585171/ask-a-banker-are-the-banks-still-too-big-to-fail?ft=1&f=93559255 …
- geography of college foot fans http://thequad.blogs.nytimes.com/he-geography-of-college-football-fans-and-realignment-chaos/ …
- "Behind the Green Square: Why Many Students Opposed the Strike" (Quebec's spring) http://www.academicmatters.ca/behind...-the-green-square-why-many-students-opposed-the-strike/
- "Scientists Display Tenacious Teleological Tendencies: Purpose-Based Reasoning as a Cognitive Default"http:// dx.doi.org/30399 …
- "Will 2015 be the Beginning of the End for SAS and SPSS?"http:// r4stats.com/beginning-of-the-end/ …
- "Deterrence versus Brutalization: Capital Punishment's Differing Impacts among States" http://www.michiganlawreview.org/4/2/Shepherd.pdf …econometrics and law, nice !
- "Measuring the Complexity of the Dates of Holidays" http://www.wired.com/wiredscience012/11/measuring-the-complexity-of-the-dates-of-holidays/ … on "Kolmogorov complexity of dates of various national holidays"
-
"Thoughts
on Teaching the Coase Theorem"http:// worthwhile.typepad.com/worthwhile_canadian_initi/2012/11/what-is-so-hard-to-understand-about-the-coase-theorem.html … by
@franceswoolley -
"Trying
to Keep Your E-Mails Secret When the C.I.A. Chief Couldn’t" http://www.nytimes.com/technology/trying-to-keep-your-e-mails-secret-when-the-cia-chief-couldnt.html?_r=0 … via
@latrive - "A map that helped reshape the world" http://juanvelascoblog.com/a-map-that-helped-reshape-the-world/ … by the Art Director of National Geographic
- "Charting weather disasters" http://juanvelascoblog.com/charting-weather-disasters/ …
- "Rains that are almost biblical, heat waves that don’t end, tornadoes that strike in savage swarms"http:// ngm.nationalgeographic.com/extreme-weather/miller-text …
- "‘Shadow Banking’ [http://en.wikipedia.org/Shnking_system …] Up to $67 Trillion, Financial Group Says" http://www.nytimes.com/usiness/shadow-banking-up-to-67-trillion-financial-group-says.html?partner=rss&emc=rss&_r=0 …
- "unusual drive for academics to work with economists and banks in the aftermath of the financial and economic crisis"http://www. universityworldnews.com/articl…
- False-Positive "Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant"http:// dx.doi.org/56797611417632 …
-
"Does
“Statistical Significance” Imply “Actually Signifiant”?" http://popsych.org/does-statistical-significance-imply-actually-signifiant/ … via
@introspection -
"Thinking
like a statistician (continuously) rather than like a civilian
(discretely)" http://andrewgelman.com/thinking-like-a-statistician-continuously-rather-than-like-a-civilian-discretely/ … on
@StatModeling's blog - "Gender and the effects on car insurance in the U.K."http://www.moneysupermarket.com/car-insurance/monitor-gender/ …
- "Columbia Data Science Class" http://columbiadatascience.com/blog/ by Rachel Schut
-
Marwell
and Ames's "Economists Free Ride, Does Anyone Else?"http://econ.ucsd.edu/~jandreon/E264/papers/Marwell%20Ames%20JPubE%201981.pdf … revised by
@franceswoolley worthwhile.typepad.com/worthwhile_canadian_initi/2012/11/economics-and-cooperation-revisited.html … -
Frank,
Gilovich and Regan's "Does Studying Economics Inhibit Cooperation?" http://docenti.unimc.it/docenti/luca-de-benedictis/laboratorio-di-economia-internazionale-applicata-2006-2/Frank-Gilovich-Regan93.pdf … (still via
@franceswoolley) - "Solving the Too Big to Fail Problem" http://www.newyorkfed.org/newseches/2012/dud121115.html … (William C. Dudley's speech)
- "productivity of teaching, measured in kb transmitted from teacher to student per unit of time, hasn’t increased much" http://www.cato-unbound.org/alex-tabarrok/why-online-education-works/ …
- "'Black swans' and 'perfect storms' become lame excuses for bad risk management" http://news.stanford.edu/nember/black-swan-risk-111612.html …
-
"Analysis
of the statistics blogosphere" http://realizationsinbiostatistics.blogspot.ca/2012/11/analysis-of-statistics-blogosphere.htmlalizationsinbiostatistics.blogspot.ca/analys… … by
@randomjohn -
"Moments
determine the tail of a distribution (but not much else)” by Lindsay
& Basak http://www.personal.psu.edu/user… via
@JohnDCook
- "dans notre monde rempli d'incertitudes, on ressent de plus en plus le besoin d'évaluer, de hiérarchiser, de classer"http://www.etudiants.ch/059/les-universites-prestigieuses …
- les français sont-ils nuls en économie, ou juste les journalistes français ? http://lespoir.jimdo.com/pourquoi-les-journalistes-sont-ils-nuls-en-%C3%A9conomie/ …
- "La corruption des économistes de service n'est pas formelle"http:// blog.mondediplo.net/Economistes-institutions-pouvoirs … cf aussi http://www.monde-diplomatique.fr/carnet/-21-Economistes …
- "je m’excuse" serait grammaticalement correcthttp:// fr.wiktionary.org/wiki/je_m0%99excuse …
My hobby
By arthur charpentier on Sunday, November 25 2012, 20:46 - ACT6420-A2012

(via http:/xkcd.com/605/ ) )))
Wednesday, November 21 2012
Introduction aux modèles SARIMA
By arthur charpentier on Wednesday, November 21 2012, 16:06 - ACT6420-A2012
Quelques transparents en plus, qui devraient correspondre aux deux prochains cours de séries temporelles, sur les processus autorégressifs (AR) et moyennes mobiles (MA), les ARMA, les ARIMA (intégrés) et les SARIMA (saisonniers). J'ai mis des notes sur les tests de racine unités, je rajouterais quelques transparents la semaine prochaine sur les tests de saisonnalité, et quelques exemples pratiques de prévision. Les transparents sont en ligne ici,
Actuariat en Afrique subsaharienne francophone
By arthur charpentier on Wednesday, November 21 2012, 14:47 - insurance
Depuis que le blog (ou le précédent) existe, je sais (par les messages
que je reçois par courriel) qu’il est beaucoup lu en Afrique
(francophone). J’ai reçu avant hier un livre publié par Aymric Kamega et
Frédéric Planchet, tiré de la thèse de doctorat d’Aymric (en ligne sur http://halshs.archives-ouvertes.fr/)
défendue en décembre dernier. J’étais alors rapporteur extérieur, et
j’avais souligné l’intérêt des travaux d’Aymric dans le cadre de la
volonté de la CIMA (autorité de contrôle régionale des marchés
d’assurance pour l’Afrique subsaharienne francophone) de fournir aux
assureurs de la région des outils adaptés. En particulier des tables de
mortalité d’expérience, propres à la région.
En effet, (comme le
rappelle Aymric), suite aux états généraux (de l’assurance vie) en 2007, le principe de la construction de tables de mortalité
d’expérience a été adopté en remplacement des tables de mortalité de la
population générale française entre 1960 et 1964 (tables dites PM 60-64
et PF 60-64) jusqu’alors imposées. Aymric avait travaillé sur ce sujet,
et a soutenu une thèse de doctorat qui avait fait l'unanimité. J’avais pris beaucoup
de plaisir à lire la thèse, j’en pense que j’en aurais à lire le livre
(disponible sur le site de l’éditeur http://seddita.com/).
Tuesday, November 20 2012
Blog transfert
By arthur charpentier on Tuesday, November 20 2012, 22:25 - Personal

One more time, the blog should move... it will soon be hosted on the plateform for academic blogs http://hypotheses.org/ (in humanities and social sciences). The new blog adress will be
We are still working on the migration (since this blog is too large to standard transfert of posts, coments, etc). At least, I should get some IT support, instead of being on my own. I hope that the transfert will be completed by the end of this year. See you there !

Transparents, introduction aux séries temporelles
By arthur charpentier on Tuesday, November 20 2012, 13:40 - ACT6420-A2012
Nous allons commencer demain la modélisation des séries temporelles. Les transparents de la séance sont en ligne ici. Je rappelle que les notes de cours (complètes) sont également en ligne, là. Je continuerai à poster régulièrement des billets contenant des commandes R.
Au programme cette semaine, l'utilisation des méthodes de régression pour extraire tendance et cycle, le lissage exponentielle (simple, double et saisonnier), et une présentation des notions importantes dans le cours (stationnarité, autocorrélations, bruit blanc, etc).
Sunday, November 18 2012
Actuariat 1, ACT2121, sixième cours
By arthur charpentier on Sunday, November 18 2012, 23:52 - ACT2121-A2012
Avant le second intra, une dernière série d'exercice, pour préparer l'examen P, en ligne ici
In less than 50 days: 2013, year of statistics
By arthur charpentier on Sunday, November 18 2012, 19:48 - Statistics
A couple of days ago, Jeffrey sent me an e-mail, encouraging me to write about the international year of statistics, hoping that I would participate to the event. The goals of Statistics2013 are (as far as I understood) to increase public awareness of the power and impact of statistics on all aspects of society, nuture Statistics as a profession and promote creativity and development in Statistics.
So I will try to write more posts (yes, I surely can try) on statistical methodology, with sexy applications... So see you in a few days on this blog, to celebrate statistics ! I can even invite guests to write here... all contributors are welcome !

Saturday, November 17 2012
Somewhere else, part 21
By arthur charpentier on Saturday, November 17 2012, 22:59 - Links
A very interesting article, in Scientific American,
- "Children of scientists may inherit genes that not only confer intellectual talents but also predispose them to autism" http://www.scientificamerican.com/a=are-geeky-couples-more-likely-to-have-kids-with-autism …
- “Communication
is a central task of statistics”...http://andrewgelman.com/communication-is-a-central-task-of-statistics-and-ideally-a-state-of-the-art-data-analysis-can-have-state-of-the-art-displays-to-match/ … on
@StatModeling's (reference) blog
- Some Notes on Schelling's Essay (1972, http://jasss.soc.surrey.ac.uk/Schelling_Th_C_1972._On_Letting_a_Computer_Help_with_the_Work._JASSS.pdf …) "On Letting a Computer Help with the Work" http://jasss.soc.surrey.ac.uk
-
"Why
are observations of inflation so biased? And biased by gender?" http://marginalrevolution.com/tion/2012/11/why-are-observations-of-inflation-so-biased-and-biased-by-gender.html … via
@adelaigue see http://www.clevelandfed.org/resntary/2001/1101.pdf… -
"How
To Predict The Future" by
@hertling http://www.feld.com/wp/archives/2012/06/how-to-predict-the-future.html … -
"Women
as Academic Authors, 1665-2010" http://chronicle.com/art-as-Academic-Authors/135192/ …
- "Be persuasive. Be brave. Be arrested (if necessary)"http://www. nature.com/news/be… "More scientists must speak out...."
- "Hurricane Sandy’s huge size: freak of nature or climate change?"http://classic.wunderground.com/JeffMasters/comment.html?entrynum=2293 …
-
"The
Moral of Sandy" by
@BjornLomborg http://www.project-syndicate.org/the-right-and-wrong-way-to-reduce-hurricane-damage-by-bj-rn-lomborg … - "How to give feedback" http://timharford.com/how-to-give-feedback/?utm_source=dlvr.it&utm_medium=twitter …
- "What is Twitter, a Social Network or a News Media?" http://product.ubion.co.kr/up142222731/ccres00056/db/_2250_1/embedded/2010-www-twitter.pdf …
- "Trends in Social Media : Persistence and Decay" http://arxiv.org/1402
- Stan Nikolov’s master’s thesis on "detecting twitter trends before they happen" http://web.mit.edu/snik…
- "Popularity versus similarity in growing networks" http://arxiv.org/0286
-
"Twitter
Event Networks and the Superstar Model"http:// arxiv.org/3090 via
@renaudjf - "How Long Will a Lie Last? Study Finds That False Memories Linger for Years" http://blogs.scientificamerican.com/g…
- "Science is enforced humility" http://www.guardian.co.uk/012/nov/13/science-enforced-humility …"science compels its practitioners to confront their own fallibility"
-
"The
Uncertain Future for Universities"http://conversableeconomist.blogspot.com/the-uncertain-future-for-universities.html … via
@adelaigue - "No more magical thinking" by David Remnick http://www.newyorker.com/…
- "Are All Units Created Equal? The Effect of Default Units on Product Evaluations" http://users.ugent.be/~mapandel/…
- "Child’s Education, but Parents’ Crushing Loans"http://www.nytimes.com/usiness/some-parents-shouldering-student-loans-fall-on-tough-times.html?hpw…
-
"Should
Scientists and Engineers Resist Taking Military Money?" by
@Horganism on http://blogs.scientificamerican.com/cro12/11/12/should-scientists-and-engineers-resist-taking-military-money/ … via@BoraZ - "On the science-based communication of risks following the recent sentencing of Italian scientists" http://www.alphagalileo.org/Asset…
- "Data, Dimensions and Geometry oh my !"http:// geomblog.blogspot.ca/data-dimensions-and-geometry-oh-my.html …
- "Breaking down the Presidential vote"http:// blogs.suntimes.com/pol11/graphics_breaking_down_the_presidential_vote.html …

-
"512
Paths to the White House" http://source.mozillaopennews.org/en/nyts-512-paths-white-house/ … "How we made the
interactive D3 decision tree" via
@alignedleft - "Inside the Secret World of the Data Crunchers Who Helped Obama" http://swampland.time.com/nside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/#ixzz2ByVOqVv1 …
- "is nate silver’s win sociology’s loss?"http:// scatter.wordpress.com/is-nate-silvers-win-sociologys-loss/ … & "He does it again! Will pundits finally accept defeat?" http://simplystatistics.org/81/nate-silver-does-it-again-will-pundits-finally-accept …
Le mois des (boites à) moustaches
By arthur charpentier on Saturday, November 17 2012, 20:15 - ACT6420-A2012
On célèbre cette année les 35 ans d'un livre qui a révolutionné la
statistique, tant sur les aspects computationnels que graphiques: Exploratory Data Analysis par John Tukey. La légende prétend (on pourra relire a brief history of S par Richard Becker) que ce livre a inspiré ce qui allait devenir R, "Commercial software did not fit well into our research environment. It often used a ‘‘shotgun’’ approach — print out everything that might be relevant to the problem at hand because it could be several hours before the statistician could get another set of output. This was reasonable when computing was done in a batch mode. However, we wanted to be able to interact with our data, using Exploratory Data Analysis (Tukey, 1971) techniques. In addition, commercial statistical software usually didn’t compute what we wanted and was not set up to be modified."
|
|
|


> set.seed(1) > X=rlnorm(99)

(on est ici en échelle logarithmique). Pour commencer, on peut faire une boite à moustache avec une visualisation du minimum (à gauche) et du maximum (à droite)
> boxplot(X,horizontal=TRUE,log="x",range="0")

Et effectivement, on retrouve facilement tous les paramètres: ceux de la boite (en rouge)
> abline(v=median(X),lty=2,col="red") > abline(v=quantile(X,.25),lty=2,col="red") > abline(v=quantile(X,.75),lty=2,col="red")et ceux de la moustache (en bleu)
> abline(v=min(X),lty=2,col="blue") > abline(v=max(X),lty=2,col="blue")Mais ce n'est pas la version standard de la boite à moustaches, qui est - sous R - la suivante
> boxplot(X,horizontal=TRUE,log="x")

Si la borne inférieure de la moutache est toujours le minimum (en tous cas ici), pour la partie supérieure... c'est plus compliqué. Malheureusement ce n'est pas un quantile standard (ceux à 90% et 95% sont représentés ci-dessus en mauve). Pour comprendre ce que c'est, il faut regarder le code: ici, on rajoute au quartile supérieur (la partie de droite de la boite) un pourcentage de la distance inter-quartile (la longueur de la boite), et ce pourcentage est le suivant
> M=boxplot(X)$stats > (M[5]-M[4])/(M[4]-M[2]) [1] 1.350628 > 2*qnorm(.75) [1] 1.34898Plusieurs sites ou ouvrages suggèrent d'utiliser 1.5 fois la longueur interquartile, mais sous R, on utilise
> abline(v=quantile(X,.75)+2*qnorm(3/4)*IQR(X),lty=2,col="blue") > abline(v=min(X),lty=2,col="blue")et je laisse les quantiles pour montrer que (malheureusement) ce n'est pas ce qui est utilisé ici,
> abline(v=quantile(X,.9),lty=2,col="purple") > abline(v=quantile(X,.95),lty=2,col="purple")Cela dit, comme je l'évoquais lors de la formation (ici), beaucoup de monde laisse planer une réelle ambiguité sur ce qui est représenté par ces boites à moustaches... Et le fait que ce ne soit pas un quantile me gene un peu... Par exemple, ici on avait la boite à moustaches suivante

Si je prends les 25 plus petites observations, et que je les divise par 4, on obtient

La partie de gauche bouge (c'est normal, c'est la partie qu'on a modifiée), et la médiane reste identique, ainsi que la partie droite de la boite (ce qui est normal, ce sont des quantiles au delà de 25%). Mais de manière un peu génante, la borne supérieure de la moustache s'est ralongée... C'est normal, car on utilise une distance inter-quartile. Mais on n'a pas touché aux grandes observations, c'est donc génant de la voir bouger ainsi... non ?
Friday, November 16 2012
sur les transformations dans un modèle linéaire
By arthur charpentier on Friday, November 16 2012, 22:06 - ACT6420-A2012
Je voulais prendre 5 minutes pour reprendre une question posée par
courriel, qui me permettra de poursuivre sur des choses évoquées
mercredi dernier en cours. Je vais reformuler la question, mais en
gros, cela disait: la méthode de Box-Cox (évoquée en début de semaine, ici)
avait pour objet de choisir entre un modèle linéaire (que l'on étudié
dans tous les sens depuis le début du cours) et un modèle log-linéaire
(évoqué ici, par exemple). L'idée était d'associer le cas linéaire à
(dans la transformée de Box-Cox) et à
pour le cas multiplicatif (ou log-linéaire). Mais que se passe-t-il si
la valeur optimale rejette ces deux cas, et est proche (disons) de
?
Considérons le cas suivant (exemple que je ne me lasse d'utiliser)
> reglm=lm(dist~speed,data=cars) > library(MASS) > boxcox(reglm)

> regsqrt=lm(sqrt(dist)~speed,data=cars) + summary(regsqrt) Call: lm(formula = sqrt(dist) ~ speed, data = cars) Residuals: Min 1Q Median 3Q Max -2.0684 -0.6983 -0.1799 0.5909 3.1534 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.27705 0.48444 2.636 0.0113 * speed 0.32241 0.02978 10.825 1.77e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.102 on 48 degrees of freedom Multiple R-squared: 0.7094, Adjusted R-squared: 0.7034 F-statistic: 117.2 on 1 and 48 DF, p-value: 1.773e-14et effectivement, le résultat est concluant. Mais on ne peut pas en rester là, comme pour pour le passage au logarithme, notre variable d'intéret reste
La prédiction sur le modèle transformé était
La première solution est de noter que
Une seconde consiste à noter que Z (conditionnellement à la variable explicative) était gaussienne. Donc en prenant le carré (moyennant quelques changement d'échelle), on devrait tomber sur une loi du chi-deux qui est une loi que l'on connait bien (sinon je peux renvoyer ici).
Essayons de creuser un peu ces idées (en particulier, pour dériver des intervalles de confiance). Si on visualise la prédiction de ce modèle sur la racine carrée de la distance de freinage, on obtient
> plot(speed,sqrt(dist)) > x=seq(0,30,by=.2) > distsqrtp=predict(regsqrt,newdata= + data.frame(speed=x),interval="prediction") > polygon(c(x,rev(x)),c(distsqrtp[,2], + rev(distsqrtp[,3])),col="yellow",border=NA) > lines(x,distsqrtp[,1],lwd=2,col="red")

> s=summary(regsqrt)$sigma > mu=predict(regsqrt) > distsqrtp01=(sqrt(dist)-mu)/sOn a ainsi une variable qui, conditionnellement à la variable explicative est supposé suivre une loi
> plot(speed,distsqrtp01,ylim=c(-3,3)) > abline(h=qnorm(c(.025,.975),0,1),lty=2,col="red")

> plot(speed,distsqrtp01^2) > abline(h=qchisq(.95,df=1),lty=2,col="red")

> mu=predict(regsqrt,newdata=data.frame(speed=x)) > distsup=(mu+s*sqrt(qchisq(.95,df=1)))^2 > distinf=(mu-s*sqrt(qchisq(.95,df=1)))^2que l'on peut visualiser simplement (sur les données de base, avec en ordonnée la distance)
> plot(cars) > lines(x,distsup,lty=2,col="red") > lines(x,distinf,lty=2,col="red")

> distesp=mu^2+s^2 > lines(x,distesp,lwd=2,col="red")

Sinon, comme je le disais en cours, si la transformation optimale sur
Regardons la régression de la distance (cette fois) sur la vitesse, et le carré de la vitesse,
> reglm=lm(dist~speed+I(speed^2),data=cars) > distp=predict(reglm,newdata= + data.frame(speed=x),interval="prediction")Si on visualise la prédiction, avec un intervalle de confiance, on obtient
> plot(cars) > polygon(c(x,rev(x)),c(distp[,2], + rev(distp[,3])),col="yellow",border=NA) > lines(x,distp[,1],lwd=2,col="red")


« previous entries - page 2 of 83 - next entries »








