This Monday, during my talk on quantile regressions (at the Montreal R-meeting), we've seen how those nice graphs could be interpreted, with the evolution of the slope of the linear regression, as a function of the probability level. One illustration was on large hurricanes, from Elsner, Kossin & Jagger (2008). The other one was on birthweight, from Abrevaya (2001).

It is also to illustrate that technique to academic publication, e.g. the length of papers, over time. Actually, the data we can extract from Scopus are quite similar to the ones uses on hurricanes. For several journals, it is possible to look at the length of articles. Since Scopus is quite expensive ($60,000 per year for the campus, as far as remember, so I can imagine the penalty I might have to pay for sharing such a dataset)

base=read.table("/home/scopus.csv",
header=TRUE,sep=",")
pages=base$Page.end-base$Page.start
year=base$Year

Again, a first idea can be to look at boxplots, and regression on (nonparametric) quantiles, here for Econometrica,

boxplot(pages~as.factor(year),col="light blue")
Q=function(p=.9) as.vector(by(pages,as.factor(year),
function(x) quantile(x,p)))
u=1:16
points(u,Q(p),pch=19,col="blue")
abline(lm(Q(p)~u,weights=table(year)),lwd=2,col="blue")

Consider now (as in the slides in the previous post) a quantile regression (instead of a regression on quantiles), for instance in the Annals of Probability,

library(quantreg)
u=seq(.05,.95,by=.01)
coefstd=function(u) summary(rq(pages~year,
tau=u))$coefficients[,2]
coefest=function(u) summary(rq(pages~year,
tau=u))$coefficients[,1]
CS=Vectorize(coefstd)(u)
CE=Vectorize(coefest)(u)
k=2
plot(u,CE[k,],ylim=c(min(CE[k,]-2*CS[k,]),
max(CE[k,]+2*CS[k,])))
polygon(c(u,rev(u)),c(CE[k,]+1.96*CS[k,],
rev(CE[k,]-1.96*CS[k,])),
col="light green",border=NA)
lines(u,CE[k,],lwd=2,col="red")
abline(h=0)

We have the following slope, for the year, as a function of the probability level,

The slope is always positive, so size of papers is increasing with time, short and long papers. But the influence of time is much larger for long paper than short one: for short papers (lower decile) every year, the size keeps increasing, with one more page every three years. For long paper (upper decile), it is two more pages every three years.

If we look now at the Annals of Statistics, we have

and for the evolution of the slope of the quantile regression,

Again the impact is positive: papers are longer in 2010 than 15 years ago. But the trend is the reverse: short papers (lower decile) are much longer, almost one more page every year, with long paper increase only by one more page every two years... Initially, I want to run such a study on a much longer term, with quantile regressions and splines to see when there might have been a change, both in lower and upper tails. Unfortunately, as suggested by some colleagues, there might have been some changes in the format of the journal (columns, margins, fonts, etc). That's a shame, because I rediscover nice short papers of 5-10 pages published 20 or 30 years ago. They are nice to read (and also potentially interesting for a post on the blog). 5 pages, that's perfect, but 40 pages, that's way too long. I wonder if I am the only one having this feeling, missing those short but extremely interesting papers....