# Freakonometrics

## Tag - US

Tuesday, October 16 2012

## Multiple regression

A bit of code for tomorrow's class,

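```
# Crime data by state; row names are used below as USPS codes.
# The read.table arguments were lost in this copy: header=TRUE and
# the separators are assumptions, to be checked against the files.
US=read.table("http://freakonometrics.free.fr/US.txt",
header=TRUE,sep=";")
# Lookup table between state names and USPS codes (arguments assumed)
abreviation=read.table(
"http://freakonometrics.blog.free.fr/public/data/etatus.csv",
header=TRUE,sep=",")
US$USPS=rownames(US)
US=merge(US,abreviation)
US$state=tolower(US$NOM)
# Governors by state (arguments assumed)
GV=read.table(
"http://freakonometrics.blog.free.fr/public/data/governor.csv",
header=TRUE,sep=",")
# Keep the part of the State field before the hyphen
etat=strsplit(as.character(GV$State),"-")
listeetat=rep(NA,nrow(GV))
for(i in 1:nrow(GV)){
listeetat[i]=etat[[i]][1]
}
indice=which(is.na(listeetat)==FALSE)
basegv=data.frame(state=tolower(listeetat[indice]),
party=GV$Party[indice])
# Join the crime data and the governor's party on the column "state"
base=merge(US,basegv)
```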
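
The splitting and merging steps can be tried on a toy example. This is only a sketch: the two small data frames below, their values, and the `Name-XX` layout of the `State` field are invented for illustration, and `sapply` is used in place of the explicit loop.

```r
# Toy stand-ins for the governor and state tables (invented values)
GV_toy <- data.frame(State = c("Alabama-AL", "New York-NY", "Texas-TX"),
                     Party = c("R", "D", "R"))
US_toy <- data.frame(state = c("alabama", "new york", "ohio"),
                     Murder = c(5.7, 3.5, 4.3))

# Part of each State string before the hyphen, without an explicit loop
listeetat_toy <- sapply(strsplit(as.character(GV_toy$State), "-"), `[`, 1)
basegv_toy <- data.frame(state = tolower(listeetat_toy),
                         party = GV_toy$Party)

# merge() joins on the shared column "state" and keeps matching rows only
base_toy <- merge(US_toy, basegv_toy)
print(base_toy)
```

Only the states present in both tables survive the merge, so a spelling mismatch in the state names silently drops rows.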

I am also including a small function to draw maps,

```
library(maps)
library(RColorBrewer)
# Polygon names look like "state:part"; keep the state name only
VL0=strsplit(map("state",plot=FALSE)$names,":")
VL=VL0[[1]]
for(i in 2:length(VL0)){VL=c(VL,VL0[[i]][1])}
# Row of US corresponding to each polygon of the map
ETAT=match(VL,US$state)
carte=function(V=US$Murder,titre=
"Murder rate in the United States"){
# Six classes bounded by the sextiles of V; include.lowest=TRUE
# keeps the minimum in the first class instead of returning NA
classes=cut(V,quantile(V,seq(0,1,by=1/6)),
include.lowest=TRUE)
variable=as.numeric(classes)
niveau=variable[ETAT]
couleur=rev(brewer.pal(6, "RdBu"))
noml=levels(classes)
map("state", fill = TRUE, col=couleur[niveau])
legend(-78,34,legend=noml,fill=couleur,
cex=1,bty="n")
title(titre)}
carte(US$Murder,titre=
"Murder rate in the United States")
```

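The class construction used inside `carte` can be tried on its own, without the `maps` package. The values in `V_toy` are invented; the sketch simply shows how `cut` and `quantile` produce six classes, and what the `include.lowest` argument of `cut` changes for the smallest observation.

```r
# Invented values standing in for a state-level variable
V_toy <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7, 8.4, 9.9, 10.5, 11.0, 12.6)
bornes <- quantile(V_toy, seq(0, 1, by = 1/6))

# include.lowest = TRUE puts the minimum in the first class
classes <- cut(V_toy, bornes, include.lowest = TRUE)
print(table(as.numeric(classes)))

# Without it, the first interval is open on the left and the minimum is NA
n_missing <- sum(is.na(cut(V_toy, bornes)))
print(n_missing)
```
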
Monday, June 18 2012

## Date of death, birthday and Elvis Presley

10 days ago, a study published in Annals of Epidemiology (http://www.annalsofepidemiology.org/) claimed that "Death has a preference for birthdays" (as stated in the title). The conclusion of the paper is that, in general, birthdays do not evoke a postponement mechanism but appear to end up in a lethal way more frequently than expected (“anniversary reaction”). Well, this is not new, and several previous articles have made that point, e.g. Angermeyer et al. (1987).

I found the idea interesting since in demography, there is a large literature trying to extrapolate death rates from discrete to continuous time. Extrapolations are usually extremely smooth, and none of them accounts for that excess mortality precisely on the birthday. The problem is that it is rather difficult to say anything, since datasets with individual observations are rarely available online.

But yesterday, @coulmont sent me a tweet mentioning a website. I do not know if this is legal (even if some explanations are given), but the data used here come courtesy of http://ssdmf.info/. It is the so-called Social Security Death Master File, containing individual information about deaths in the US, as well as geographic information (as described on http://www.ssa.gov/), for people who had a social security number.

With R, it is possible to work on those files (even though they are huge, with tens of millions of observations). For instance, we can check who is inside.

```
> elvis=scan("ssdm2",skip=22371720,n=1,what="character",sep=",")
> elvis
[1] " 409522002PRESLEY         ELVIS     0800197701081935  "
```

If you believe that Elvis is dead, you might agree that this database is accurate (or at least, not too bad). We can also see here how to read a record: Elvis was born on January 8, 1935 (last 8 digits) and died in August 1977 (the 8 digits before; he died on August 16, but the day is recorded as 00). Obviously, there are some problems with the dataset, since the day of Elvis's death is missing. So here, we remove all the observations that do not give us proper dates. Then, the idea is to assume that the person died in 2000 (or any other year, since the point is to focus on days and months). We then count the number of days between the day of death and the birthday in 2000 (which falls either before or after the death) and the birthday in 1999 (which is necessarily before), so that we can derive the number of days after the birthday,

```
# base: character vector of raw records (assumed read beforehand)
dates=substr(base,66,81)
death=as.Date(substr(dates,1,8),"%m%d%Y")
birth=as.Date(substr(dates,9,16),"%m%d%Y")
# Share of records with an unusable death or birth date
indice=is.na(death)|is.na(birth)
mean(indice)
mdeath=substr(dates,1,2)
ddeath=substr(dates,3,4)
mbirth=substr(dates,9,10)
dbirth=substr(dates,11,12)
# Drop records whose day of death is recorded as "00"
indice=which(ddeath!="00")
# Birthday in 2000 and in 1999, with the death placed in 2000
birth1=as.Date(paste(mbirth[indice],
dbirth[indice],"2000",sep=""),"%m%d%Y")
birth0=as.Date(paste(mbirth[indice],
dbirth[indice],"1999",sep=""),"%m%d%Y")
death=as.Date(paste(mdeath[indice],ddeath[indice],
"2000",sep=""),"%m%d%Y")
diffday=cbind(as.numeric(death-birth1),
as.numeric(death-birth0))
# Smallest non-negative gap, i.e. days since the previous birthday
DIFF=apply(diffday,1,function(x) {min(x[x>=0])})
```
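
To make the date handling concrete, here is a small self-contained sketch on the Elvis record shown earlier and on a few invented dates. The helper `days_since_birthday` is hypothetical (it is not part of the original code), and it locates the previous birthday by taking the birthday in 2000 when it does not fall after the death, and the one in 1999 otherwise; that choice is an assumption of the sketch.

```r
# The record printed above; the two dates are its run of 16 digits
rec <- " 409522002PRESLEY         ELVIS     0800197701081935  "
digits <- regmatches(rec, regexpr("[0-9]{16}", rec))
death_raw <- substr(digits, 1, 8)    # month, day, year of death
birth_raw <- substr(digits, 9, 16)   # month, day, year of birth
birth_elvis <- as.Date(birth_raw, "%m%d%Y")
death_elvis <- as.Date(death_raw, "%m%d%Y")  # day "00" is invalid, hence NA
print(birth_elvis)
print(death_elvis)

# Hypothetical helper: days elapsed since the last birthday, with the
# death placed in 2000 and the birthday taken in 2000 or else in 1999
days_since_birthday <- function(mbirth, dbirth, mdeath, ddeath) {
  death  <- as.Date(paste0(mdeath, ddeath, "2000"), "%m%d%Y")
  birth1 <- as.Date(paste0(mbirth, dbirth, "2000"), "%m%d%Y")
  birth0 <- as.Date(paste0(mbirth, dbirth, "1999"), "%m%d%Y")
  d1 <- as.numeric(death - birth1)
  ifelse(d1 >= 0, d1, as.numeric(death - birth0))
}

same_day <- days_since_birthday("01", "08", "01", "08")  # death on the birthday
later    <- days_since_birthday("01", "08", "08", "16")  # death later in the year
before   <- days_since_birthday("12", "25", "01", "05")  # death before the 2000 birthday
print(c(same_day, later, before))
```

The helper is vectorised, so it can be applied to whole columns of month and day strings at once.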

What we have here is the number of days following the previous birthday. If we look at the distribution of that number of days, we obtain

```
counts=table(DIFF)
plot(as.numeric(names(counts)),
as.numeric(counts))
> counts["0"]/(mean(counts[100:200]))
       0
1.121261
```

Thus, the excess of deaths on the birthday is around 12%, which is rather close to the figure obtained from the Swiss mortality statistics 1969–2008 (in Ajdacic-Gross et al. (2012)). Note that here, we only play with a small subset of the entire dataset.
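
As a rough sanity check on that ratio, one can simulate what it looks like when there is no birthday effect at all. The sketch below assumes that the number of days since the last birthday is uniform on 0 to 365 (a simplification) and uses an arbitrary sample size and seed; it only gives a baseline against which a value like 1.12 can be read.

```r
set.seed(1)
# Simulated gaps, uniform on 0..365 (assumption: no birthday effect)
DIFF_sim <- sample(0:365, 1e6, replace = TRUE)
counts_sim <- table(DIFF_sim)
# Same ratio as above: deaths on day 0 over the average of a mid-year window
ratio_sim <- as.numeric(counts_sim["0"] / mean(counts_sim[100:200]))
print(ratio_sim)
```

Under this assumption the ratio fluctuates around 1, with sampling noise that shrinks as the number of simulated deaths grows.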

That database is probably extremely interesting, except that it suffers from a huge selection bias, since only dead people are in it. So it might be useless if we wish to compare the life expectancy of people named Bill with that of people named George (which is what I wanted to investigate initially). But we'll see what else we can do with it (since Ewen has been able to write some code to go through that huge dataset).

Saturday, February 4 2012

## Good old days, politics and mathematics

2012 will be, in several countries, a presidential election year. Some decades ago, candidates were not always expected to spend months on demagogic democratic debates, and had time to spend on more important problems. Like mathematics... For instance, James Garfield, a congressman who became the 20th president of the United States, gave the following proof of the Pythagorean theorem (actually, he wrote that proof five years before he became President). Legend has it that he found this proof in 1876 during a mathematics discussion with some members of Congress... Those were the good old days, when politicians were interested in mathematics and science. The proof he suggested was the following

[Figure: Garfield's trapezoid proof of the Pythagorean theorem]
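
As a reminder, the trapezoid argument usually attributed to Garfield can be written out as follows. The notation is an assumption and not necessarily that of the original figure: $a$ and $b$ are the legs of the right triangle and $c$ its hypotenuse. Two copies of the triangle and a right isosceles triangle with legs $c$ form a trapezoid with parallel sides $a$ and $b$ and height $a+b$, whose area can be computed in two ways:

$$
\frac{(a+b)^2}{2} = \frac{ab}{2} + \frac{ab}{2} + \frac{c^2}{2}
\quad\Longrightarrow\quad
a^2 + 2ab + b^2 = 2ab + c^2
\quad\Longrightarrow\quad
a^2 + b^2 = c^2 .
$$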

I found that nice graph in Roger Nelsen's book. For more details, see Klebe (1995) or Wikipedia. And for those who love proofs without words, look at the 96 geometrical proofs of the Pythagorean theorem listed on http://www.cut-the-knot.org/.

Credit: Charles Logan, played by Gregory Itzin, in 24