Freakonometrics


Tag - temperature


Wednesday, November 14 2012

Time series

For the time series section, I am putting lecture notes online. The slides will be online by next week. In the meantime, a few series are available in the file below

> source(
+ "http://freakonometrics.blog.free.fr/public/data/sourcets.R")

The data are the following, starting with the use of the keyword épilation on Google,

> ts.plot(epilation,col="red")


or the use of the keyword gym,

> ts.plot(gym,col="blue")

We also have the series of deaths due to road accidents in Ontario,

> ts.plot(ontario,col="purple")

or the series of car sales in Québec,

> ts.plot(quebec,col="purple")

The next series is the daily minimum temperature at the Parc Montsouris, in Paris,

> ts.plot(temperature,col="green")

To finish, two transport series, with the number of cars that took the A7 motorway in France

> ts.plot(autoroute7,col="red")

or the number of passengers in an airport,

> ts.plot(airline,col="blue")

Monday, November 14 2011

Conference in Lyon on climate change and insurance

I will be in Lyon next Monday to give a talk on "Modeling heat-waves: return period for non-stationary extremes" in a workshop entitled "Changement climatique et gestion des risques". An interesting reference might be some pages from Le Monde (2010). The talk will be more of a discussion about modeling series of (daily) temperatures. A starting point might be the IPCC Third Assessment graph (on the left), which illustrates the effect on extreme temperatures when (a) the mean temperature increases, (b) the variance increases, and (c) both the mean and the variance increase, for a normal distribution of temperature.
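To fix ideas, here is a small numerical sketch (mine, not from the IPCC report) of how the probability of exceeding a fixed hot threshold reacts to a shift in the mean and/or an increase in the variance of a normal temperature distribution,

# effect of a change in mean and/or variance on a hot extreme, normal model
# (illustrative numbers only)
mu=15; sigma=5
threshold=qnorm(.99,mu,sigma)       # the "1% hottest day" threshold, about 26.6°C
1-pnorm(threshold,mu,sigma)         # baseline: 1%
1-pnorm(threshold,mu+1,sigma)       # (a) mean +1°C: about 1.7%
1-pnorm(threshold,mu,sigma*1.1)     # (b) sd +10%: about 1.7% (and the cold tail gets heavier too)
1-pnorm(threshold,mu+1,sigma*1.1)   # (c) both: about 2.7%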

I will add here some of the code used to generate the graphs I will comment on. The graph below is the daily minimum temperature,

TEMP=read.table("http://freakonometrics.blog.free.fr/public/data/TN_STAID000038.txt",
header=TRUE,sep=",")
D=as.Date(as.character(TEMP$DATE),"%Y%m%d")
T=TEMP$TN/10
day=as.POSIXlt(D)$yday+1
an=trunc(TEMP$DATE/10000)
plot(D,T,col="light blue",xlab="Minimum daily temperature in Paris",ylab="",cex=.5)
R=lm(T~D)   # linear trend, used below
abline(R,lwd=2,col="red")

We can clearly see an increasing linear trend. But here we do not care (too much) about the increase of the average temperature; we care more about dispersion, and tails. Here are decennial box-plots,
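A possible way to build them, reusing the variables T and an defined in the code above (a sketch),

# decennial box-plots of the daily minimum temperature (sketch)
decade=10*trunc(an/10)
boxplot(T~as.factor(decade),col="light blue",
xlab="Decade",ylab="Minimum daily temperature (°C)")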

or quantile-regressions

library(quantreg)
PENTESTD=PENTE=rep(NA,99)
for(i in 1:99){
R=rq(T~D,tau=i/100)
PENTE[i]=R$coefficients[2]
PENTESTD[i]=summary(R)$coefficients[2,2]
}
m=lm(T~D)$coefficients[2]
plot((1:99)/100,(PENTE/m-1)*100,type="b")
segments((1:99)/100,((PENTE-2*PENTESTD)/m-1)*100,
(1:99)/100,((PENTE+2*PENTESTD)/m-1)*100,
col="light blue",lwd=3)
points((1:99)/100,(PENTE/m-1)*100,type="b")
abline(h=0,lty=2,col="red")

In order to get a better understanding of the graph above, here are the slopes of the quantile regressions associated with different probabilities,



The annual maxima (of the minimum temperature, i.e. the warmest night of the year)

i.e. the regression on the yearly maxima.
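A sketch of that regression, reusing T and an from the code above,

# regression on the yearly maxima of the daily minimum temperature (sketch)
MAXAN=tapply(T,an,max,na.rm=TRUE)
YEAR=as.numeric(names(MAXAN))
plot(YEAR,MAXAN,col="light blue",xlab="Year",
ylab="Warmest night of the year (°C)")
abline(lm(MAXAN~YEAR),lwd=2,col="red")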

as well as the tail index of a Generalized Pareto distribution (fitted per month further down).

Instead of looking at observations over a century (where the trend is obviously linear), we can focus on the seasonal behaviour,

B=data.frame(Y=rep(T,3),X=c(day,day-365,day+365),
A=rep(an,3))
library(quantreg)
library(splines)
Q50=rq(Y~bs(X,10),data=B,tau=.5)
Q95=rq(Y~bs(X,10),data=B,tau=.95)
Q05=rq(Y~bs(X,10),data=B,tau=.05)
YP95=predict(Q95,newdata=data.frame(X=1:366))
YP05=predict(Q05,newdata=data.frame(X=1:366))
# points beyond the 5% and 95% seasonal quantiles
I=(T>predict(Q95,newdata=data.frame(X=day)))|
(T<predict(Q05,newdata=data.frame(X=day)))
YP50=predict(Q50,newdata=data.frame(X=1:366))
plot(day[I],T[I],col="light blue",cex=.5)
lines(1:365,YP95[1:365],col="blue")
lines(1:365,YP05[1:365],col="blue")
lines(1:365,YP50[1:365],col="blue",lwd=3)

with, in red, the series from 1900 to 1920, and, in purple, from 1990 to 2010. If we remove the linear trend and the seasonal cycle, here are the residuals, assumed to be stationary, over the whole period,

or within the year.
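A hedged sketch of how those residuals could be obtained, removing the linear trend and the median seasonal cycle estimated above (the name T3 is my own choice, made to match the tail-index code further down),

# residuals = temperature - linear trend - median seasonal cycle (sketch)
TREND=lm(T~D)
T3=T-predict(TREND,newdata=data.frame(D=D))-predict(Q50,newdata=data.frame(X=day))
plot(D,T3,col="light blue",cex=.5,ylab="Residuals")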

Obviously, something has been missed,

The graph below is the volatility of the residual series, within the year,
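One way to compute that within-year volatility, using the residual series T3 defined above (a sketch),

# standard deviation of the residuals, per month (sketch)
mois=as.POSIXlt(D)$mon+1
VOLAT=tapply(T3,mois,sd,na.rm=TRUE)
plot(1:12,VOLAT,type="b",col="blue",xlab="Month",
ylab="Std. deviation of the residuals")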

Instead of looking at volatility, we can focus on tails, with tail index per month,

library(evir)
mois=as.POSIXlt(D)$mon+1
Pmax=Dmax=matrix(NA,12,2)
for(s in 1:12){
X=T3[mois==s]   # T3: residual series (see above)
FIT=gpd(X,5)
Pmax[s,1:2]=FIT$par.ests
Dmax[s,1:2]=FIT$par.ses
}
plot(1:12,Pmax[,1],type="b",col="blue",
ylim=c(-.6,0))
segments(1:12,Pmax[,1]+2*Dmax[,1],1:12,Pmax[,1]-2*Dmax[,1],
col="light blue",lwd=2)
points(1:12,Pmax[,1],col="blue")
text(1:12,rep(-.5,12),c("JAN","FEV","MARS",
"AVR","MAI","JUIN","JUIL","AOUT","SEPT",
"OCT","NOV","DEC"),cex=.7)

At the end of the talk, I will also mention multiple city models, e.g. Paris and Marseilles,

If we look at residuals (once we have removed the linear trend and the seasonal cycle) we observe some positive dependence

In order to study (strong) tail dependence, define
$$L(z)=\mathbb{P}\left(X\leq F_X^{-1}(z)\mid Y\leq F_Y^{-1}(z)\right)=\frac{C(z,z)}{z}$$
for the lower left tail and
$$R(z)=\mathbb{P}\left(X> F_X^{-1}(z)\mid Y> F_Y^{-1}(z)\right)=\frac{C^{\star}(1-z,1-z)}{1-z}$$
for the upper right tail, where $C^{\star}$ is the survival copula associated with $C$, i.e.
$$C^{\star}(u,v)=\mathbb{P}(U>1-u,V>1-v)$$
and
$$C^{\star}(u,v)=u+v-1+C(1-u,1-v).$$
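Empirical counterparts of those two functions are easy to compute from two series (a sketch; X and Y stand for the two residual series, e.g. Paris and Marseilles),

# empirical estimates of L(z)=C(z,z)/z and R(z)=C*(1-z,1-z)/(1-z) (sketch)
tails.emp=function(X,Y,z=seq(.01,.5,by=.01)){
U=rank(X)/(length(X)+1)   # pseudo-observations (empirical ranks)
V=rank(Y)/(length(Y)+1)
L=sapply(z,function(s) mean((U<=s)&(V<=s))/s)
R=sapply(z,function(s) mean((U>1-s)&(V>1-s))/s)
list(z=z,L=L,R=R)
}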

It looks like there is no tail dependence (in the upper tail). But it is also possible to study weaker tail dependence, through
$$\chi_L(z)=\frac{2\log z}{\log C(z,z)}-1$$
and
$$\chi_R(z)=\frac{2\log(1-z)}{\log C^{\star}(1-z,1-z)}-1.$$


The slides can be viewed below; I will upload them soon,

Tuesday, January 25 2011

You find it cold in Montréal? Trust me, it is even worse than what you can imagine...

As people say in Montréal, "aujourd'hui, il fait frette" (it is freezing cold today). And I have been surprised recently when some people told me that we would reach -35°C on Sunday evening... I checked around, and I found -25°C on all the weather forecast websites, but nowhere -35°C. I asked some friends, and they told me that those people were not really looking at the air temperature (as we observe it on the thermometer), but at the wind chill, also called "felt air temperature on exposed skin due to the wind" (température ressentie).
And indeed, such a quantity does exist, and can be found on the climate.weatheroffice.gc.ca website. There is also a physical background for that quantity. Hence, the wind chill $W$ is defined as
$$W = 13.12 + 0.6215\,T - 11.37\,V^{0.16} + 0.3965\,T\,V^{0.16}$$
where $T$ is the air temperature (in °C) and $V$ the wind speed (in km/h). Please don't ask me how to interpret this power 0.16 (I already find it difficult to explain a square root in an econometric equation). If we look at the previous days, we observe the following,
where the points on top are the air temperature, while below we have the felt temperature. So, basically, winters are even colder than what you might think...
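As a small aside, the formula above is easy to turn into an R function; the sketch below uses the standard Environment Canada coefficients (my reconstruction, since the formula was displayed as an image in the original post),

# wind chill index, as reconstructed above
# T: air temperature in °C, V: wind speed in km/h
windchill=function(T,V) 13.12+0.6215*T-11.37*V^0.16+0.3965*T*V^0.16
windchill(-25,20)   # about -37, close to the "felt" -35°C mentioned above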
And the story is not over yet. The same thing holds for summer: if you take humidity into account, summers are even hotter than what you think... There is the humidex, $H$, defined here as
$$H = T + 0.5555\left(6.11\, e^{\,5417.7530\left(\frac{1}{273.16}-\frac{1}{D}\right)} - 10\right)$$
where $D$ denotes the dewpoint, in Kelvin (see here for more details).
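Again, a small R sketch of that index, with the constants of the standard Canadian definition (my reconstruction); the dewpoint is given in °C and converted to Kelvin inside the function,

# humidex, as reconstructed above
# T: air temperature in °C, Td: dewpoint in °C (converted to Kelvin below)
humidex=function(T,Td) T+0.5555*(6.11*exp(5417.7530*(1/273.16-1/(Td+273.15)))-10)
humidex(30,20)   # around 38: 30°C with a 20°C dewpoint feels close to 38°C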
That index appeared in the 70's, with the work of Masterson and Richardson entitled A method of quantifying human discomfort due to excessive heat and humidity (published in 1979). At that time, in Canada, 22 people died per year, on average, because of excessive heat and humidity. For those interested in the origin of that index, you can have a look here.
Recently, @Annmaria (here) told me that one might expect the variance to increase, i.e. maxima should be increasing faster than minima. I just wonder if this intuition can be related to the fact that more and more people (including some media) now talk about felt temperatures rather than measured temperatures. And if we compare past temperatures to the felt temperatures we have today, it does look like the difference between extremes is increasing...

Friday, January 14 2011

Warming in Paris: minima versus maxima?

Recently, I received comments (here and on Twitter) about my previous graphs on the temperature in Paris. I mentioned in a comment (there) that studying extrema (and more generally quantiles, or the evolution of interquantile ranges) is not the same as studying the variance. Since I am not a big fan of the variance, let us talk a little bit about the behaviour of extrema.
In order to study the average temperature, it is natural to look at a linear regression (assuming that the trend is linear, which I argued is a reasonable assumption in the paper), i.e. least squares regression, which gives the expected value. But if we care about extremes, or near-extremes, it is natural to look at quantile regression.
For instance, below, the green line is the least squares regression, the red one the 97.5% quantile regression, and the blue one the 2.5% quantile regression.

It looks like the slope is the same, i.e. the extremes are increasing as fast as the average...
tmaxparis=read.table("temperature/TG_SOUID100845.txt",
skip=20,sep=",",header=TRUE)
head(tmaxparis)
Dparis=as.Date(as.character(tmaxparis$DATE),"%Y%m%d")
Tparis=as.numeric(tmaxparis$TG)/10
Tparis[Tparis==-999.9]=NA
I=sample(1:length(Tparis),size=5000,replace=FALSE)
plot(Dparis[I],Tparis[I],col="grey")
abline(lm(Tparis~Dparis),col="green")
library(quantreg)
abline(rq(Tparis~Dparis,tau=.025),col="blue")
abline(rq(Tparis~Dparis,tau=.975),col="red")
(here I plot only a random subset of the points to avoid a too heavy figure, since there are too many observations, but all the observations are kept in the regressions!).
Now, if we look at the slope for different quantile levels (as in Fig. 6 of the paper, here, which was on minimum daily temperature, while here I look at average daily temperature), the interpretation is different.
s=0
COEF=SD=rep(NA,199)
for(i in seq(.005,.995,by=.005)){
s=s+1
REG=rq(Tparis~Dparis,tau=i)
COEF[s]=REG$coefficients[2]
SD[s]=summary(REG)$coefficients[2,2]
}
with the following graph below,
s=0
plot(seq(.005,.995,by=.005),COEF,type="l",ylim=c(0.00002,.00008))
for(i in seq(.005,.995,by=.005)){
s=s+1
segments(i,COEF[s]-2*SD[s],i,COEF[s]+2*SD[s],col="grey")
}
REG=lm(Tparis~Dparis)
COEFlm=REG$coefficients[2]
SDlm=summary(REG)$coefficients[2,2]
abline(h=COEFlm,col="red")
abline(h=COEFlm-2*SDlm,lty=2,lwd=.6,col="red")
abline(h=COEFlm+2*SDlm,lty=2,lwd=.6,col="red")

Here, for minima (quantiles associated with low probabilities, on the left), the trend has a steeper slope than the average, so in some sense the warming of minima is stronger than that of the average temperature; on the other hand, for maxima (high probabilities, on the right), the slope is smaller - but still positive - so summers are getting warmer, but not as much as winters.
Note also that the story is different for minimum temperatures (studied in the paper) compared with this study, made here on average daily temperature (see the comments)... This is not a major breakthrough in climate research, but this is all I got...

Wednesday, January 12 2011

More climate extremes, or simply global warming?

In the paper on the heat wave in Paris (mentioned here) I discussed changes in the distribution of temperature (and autocorrelation of the time series).
During the workshop on Statistical Methods for Meteorology and Climate Change today (here), I observed that it was still an important question: is climate change affecting only averages, or does it have an impact on extremes? And since I've seen nice slides illustrating that question, I decided to play again with my dataset to see what could be said about temperature in Paris.
Recall that data can be downloaded here (daily temperature of the XXth century).
tmaxparis=read.table("/temperature/TX_SOUID100124.txt",
skip=20,sep=",",header=TRUE)
Dmaxparis=as.Date(as.character(tmaxparis$DATE),"%Y%m%d")
Tmaxparis=as.numeric(tmaxparis$TX)/10
tminparis=read.table("/temperature/TN_SOUID100123.txt",
skip=20,sep=",",header=TRUE)
Dminparis=as.Date(as.character(tminparis$DATE),"%Y%m%d")
Tminparis=as.numeric(tminparis$TN)/10
Tminparis[Tminparis==-999.9]=NA
Tmaxparis[Tmaxparis==-999.9]=NA
annee=trunc(tminparis$DATE/10000)
MIN=tapply(Tminparis,annee,min)
plot(unique(annee),MIN,col="blue",ylim=c(-15,40),xlim=c(1900,2000))
abline(lm(MIN~unique(annee)),col="blue")
abline(lm(Tminparis~unique(Dminparis)),col="blue",lty=2)
annee=trunc(tmaxparis$DATE/10000)
MAX=tapply(Tmaxparis,annee,max)
points(unique(annee),MAX,col="red")
abline(lm(MAX~unique(annee)),col="red")
abline(lm(Tmaxparis~unique(Dmaxparis)),col="red",lty=2)
On the plot below, the dots in red are the annual maximum temperatures, while the dots in blue are the annual minimum temperatures. The plain lines are the regression lines (based on the annual max/min), and the dotted lines represent the trend in the daily maximum/minimum temperatures (to illustrate the global tendency),

It is also possible to look at annual boxplots, and to focus either on the minima or on the maxima.
annee=trunc(tminparis$DATE/10000)
boxplot(Tminparis~as.factor(annee),ylim=c(-15,10),
xlab="Year",ylab="Temperature",col="blue")
x=boxplot(Tminparis~as.factor(annee),plot=FALSE)
xx=1:length(unique(annee))
points(xx,x$stats[1,],pch=19,col="blue")
abline(lm(x$stats[1,]~xx),col="blue")
annee=trunc(tmaxparis$DATE/10000)
boxplot(Tmaxparis~as.factor(annee),ylim=c(15,40),
xlab="Year",ylab="Temperature",col="red")
x=boxplot(Tmaxparis~as.factor(annee),plot=FALSE)
xx=1:length(unique(annee))
points(xx,x$stats[5,],pch=19,col="red")
abline(lm(x$stats[5,]~xx),col="red")
Plain dots are average temperature below the 5% quantile for minima, or over the 95% quantile for maxima (again with the regression line),

We can observe an increasing trend in the minima, but not in the maxima!
Finally, an alternative is to remember that we focus on annual maxima and minima. Thus, Fisher-Tippett theory (mentioned here) can be used. Here, we fit a GEV distribution on blocks of 10 consecutive years. Recall that the GEV distribution function is
$$G(x)=\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]_{+}^{-1/\xi}\right\}$$
install.packages("evir")
library(evir)
Pmin=Dmin=Pmax=Dmax=matrix(NA,10,3)
for(s in 1:10){
X=MIN[1:10+(s-1)*10]
FIT=gev(-X)
Pmin[s,]=FIT$par.ests
Dmin[s,]=FIT$par.ses
X=MAX[1:10+(s-1)*10]
FIT=gev(X)
Pmax[s,]=FIT$par.ests
Dmax[s,]=FIT$par.ses
}
The location parameter $\mu$ is the following, with the minima on the left and the maxima on the right,

while the scale parameter $\sigma$ is

and finally the shape parameter $\xi$ is

On those graphs, it is very difficult to say anything regarding changes in temperature extremes... And I guess this is a reason why there is still active research in that area...

Sunday, November 7 2010

Updating meteorological forecasts, part 1

As Mark Twain said, "the art of prophecy is very difficult, especially about the future" (well, actually I am not sure Mark Twain was the first one to say so, but if you're interested in that sentence, you can look here). I have been rather surprised to see how interested Canadians can be in weather, and weather forecasts (see e.g. here for a first study). For instance, on that website (here), we can have forecasts of the temperature (but also rain and wind, which is interesting when you ride a bike) over the next week, almost on an hourly basis (I show here only GFS forecasts (Global Forecast System, here), which is quite standard, see here or there, but other meteorological models are considered on the website).

As a statistician (or maybe more as a bike rider) I was wondering if I should trust those forecasts... and if a mistake today means an error tomorrow, too...
So, to look at this problem, Vincent (alias @Vicnent) helped me (again, since he had already helped me get backups of betting websites during the soccer World Cup, in June) to make backups of those tables...
  • A descriptive study
For instance, if we look at forecasts made on the 21st of October (00:00), we have the following for the temperature in °C, for different time horizons,
6 hours later (21st of October (06:00)), we have the following

On the other hand, if we consider forecasts made for the 25th of October (11:00), we have

while for 3 hours later (25th of October (14:00)) we have

and again for 3 hours later (25th of October (17:00)) we have

So obviously, 5 days earlier, there was a bump of optimism, or perhaps simply a typing mistake, since we jump from 4 before and after to 14 for 6 hours*. While we can expect different predictions for different days (and even within a day, since it is usually cooler during the night), we can try to see whether such bumps, or forecast errors, can be understood.
Here, it is difficult to observe the scatterplot of temperature,

That graph was based on the following points; here is the scatterplot of predictions made on the 21st of October (00:00)

with a smoothed version below (using standard splines),

or, if we consider forecasts made for the 25th of October (11:00) we have

with again, a smoothed version,

The code is the following,
donnees=read.table(
"http://perso.univ-rennes1.fr/arthur.charpentier/pred-meteo2.txt")
x=donnees$DATE1+donnees$HEURE1/24
x[x<15]=x[x<15]+31
y=donnees$DATE2+donnees$HEURE2/24
y[y<15]=y[y<15]+31
z=y-x
h=donnees$TEMPERATURE
D=data.frame(x,y,z,h)
X0=sort(unique(x))
Y0=sort(unique(y))
Z0=sort(unique(z))
matY.X=matrix(NA,length(X0),length(Y0))
library(splines)
for(i in 1:length(X0)){
dd=D[D$x==X0[i],]
ax=dd$y
ay=dd$h
a=data.frame(ax,ay)
ra=lm(ay~bs(ax,df=5),data=a)
X=Y0[(Y0>min(ax))&(Y0<max(ax))]
iX=which((Y0>min(ax))&(Y0<max(ax)))
Y=predict(ra,newdata=data.frame(ax=X))
matY.X[i,iX]=Y
}
So, let us look at levels of iso-temperature, as a function of the date the prediction is made (the x-axis) and the horizon (the y-axis). First, when we fix the date of the prediction, we get (from the code above)
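One possible way to draw those iso-temperature levels from the matrix matY.X built above is sketched below (my own code, with the date for which the prediction is made on the second axis rather than the horizon),

# predicted temperature as a function of the date the prediction is made (x)
# and the date for which the prediction is made (y) - a sketch
image(X0,Y0,matY.X,xlab="date the prediction is made",
ylab="date for which the prediction is made")
contour(X0,Y0,matY.X,add=TRUE)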

and then, when we fix the horizon of the prediction

It looks like the small heat wave we experienced at the end of October was perfectly predicted... And here, it looks like the only pattern we can observe is the standard evolution of temperature: there are no bumps due to forecasting problems...
  • Modeling predicted temperatures
Let us use an additive or a multiplicative model, i.e. either
$$T_{x,y}=\beta_0+s_1(x)+s_2(y)+\varepsilon_{x,y}$$
where $x$ denotes the date the prediction was made and $y$ the date for which the prediction is made, or
$$T_{x,y}=\exp\left[\beta_0+s_1(x)+s_2(y)\right]+\varepsilon_{x,y}$$
(a Gaussian model with a log link in the code below).
This is easy within the gam framework... For instance, consider the following code (some of the regressions fitted here will be used later on)
library(mgcv)
library(splines)
REG1=gam(h~s(x,y),data=D,family=gaussian)
REG2=lm(h~bs(x)+bs(y),data=D)
REG3=gam(h~s(x)+s(y),data=D,family=gaussian)
REG4=gam(h~s(x)+s(y),data=D,family=gaussian(link="log"))
REG5=gam(h~s(y),data=D,family=gaussian)
D$e=residuals(REG5)
REG6=gam(e~s(x,y),data=D,family=gaussian)
Here is the prediction we had with the additive model
and for the multiplicative model

(colors are different because of the log scaling but the shape is similar).

For the component in $x$ (in blue), we have, with the additive model

and for the multiplicative model

On the other hand, for the component in $y$ (in red), we have, with the additive model

and for the multiplicative model 

Note that we observe something which looks like the true temperature (that we actually experienced)

So finally, on average, those predictions are consistent with the true temperature we experienced. But so far, we cannot say anything about prediction errors. If we consider a bivariate spline regression, i.e.
$$T_{x,y}=\beta_0+s(x,y)+\varepsilon_{x,y}$$
for some smooth function $s$, we have, again, the same kind of predictions,

It looks like what we've seen before. And because we have those horizontal lines, only the horizon $y$ should be considered... So let us study the following model
$$T_{x,y}=\beta_0+s(y)+\varepsilon_{x,y}$$
where we obtain the following prediction

(which is similar to the previous graph), but if we fit a bivariate spline on the residuals obtained from that regression, i.e.
$$\widehat{\varepsilon}_{x,y}=b_0+h(x,y)+\eta_{x,y}$$
where
$$\widehat{\varepsilon}_{x,y}=T_{x,y}-\widehat{\beta}_0-\widehat{s}(y),$$
we have the following distribution for $\widehat{h}(x,y)$
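One possible way to visualize that fitted surface is mgcv's vis.gam function, applied to the residual regression REG6 from the code above (a sketch),

# contour view of the smooth surface fitted on the residuals (sketch)
library(mgcv)
vis.gam(REG6,plot.type="contour")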

This is where we can see the prediction errors. We can observe that meteorologists were a bit surprised by the heat wave we observed just after the 25th. But let us wait a few more days to get more data, and perhaps we'll have a look at something that I do care about: prediction of rainfall...
* perhaps I should mention that we do not have backups of the tables themselves, but backups of html pages... I then use simple text-processing tools available in R to extract the information from them...