To content | To menu | To search

Research › publications

Entries feed - Comments feed

Friday, August 31 2012

Border bias and weighted kernels

With Ewen (aka @3wen), not only we have been playing on Twitter this month, we have also been working on kernel estimation for densities of spatial processes. Actually, it is only a part of what he was working on, but that part on kernel estimation has been the opportunity to write a short paper, that can now be downloaded on hal.

The problem with kernels is that kernel density estimators suffer a strong bias on borders. And with geographic data, it is not uncommon to have observations very close to the border (frontier, or ocean). With standard kernels, some weight is allocated outside the area: the density does not sum to one. And we should not look for a global correction, but for a local one. So we should use weighted kernel estimators (see on hal for more details). The problem that weights can be difficult to derive, when the shape of the support is a strange polygon. The idea is to use a property of product Gaussian kernels (with identical bandwidth) i.e. with the interpretation of having noisy observation, we can use the property of circular isodensity curve. And this can be related to Ripley (1977) circumferential correction. And the good point is that, with R, it is extremely simple to get the area of the intersection of two polygons. But we need to upload some R packages first,


To be more clear, let us illustrate that technique on a nice example. For instance, consider some bodiliy injury car accidents in France, in 2008 (that I cannot upload but I can upload a random sample),


The border of the support of our distribution of car accidents will be the contour of the Finistère departement, that can be found in standard packages

"", header=TRUE,sep=",",comment.char="",check.names=FALSE, colClasses=c(rep("character",5),rep("numeric",38))) geoloc=geoloc[,c("dep","com","com_nom", "long","lat","pop_2008")] geoloc$id=paste(sprintf("%02s",geoloc$dep), sprintf("%03s",geoloc$com),sep="") geoloc=geoloc[,c("com_nom","long","lat","pop_2008")] head(geoloc) france=map('france',namesonly=TRUE, plot=FALSE) francemap=map('france', fill=TRUE, col="transparent", plot=FALSE) detpartement_bzh=france[which(france%in% c("Finistere","Morbihan","Ille-et-Vilaine", "Cotes-Darmor"))] bretagne=map('france',regions=detpartement_bzh, fill=TRUE, col="transparent", plot=FALSE,exact=TRUE) finistere=cbind(bretagne$x[321:678],bretagne$y[321:678]) FINISTERE=map('france',regions="Finistere", fill=TRUE, col="transparent", plot=FALSE,exact=TRUE) monFINISTERE=cbind(FINISTERE$x[c(8:414)],FINISTERE$y[c(8:414)])

Now we need simple functions,

POLcercle=as(leCercle, "gpc.poly")
lissage = function(U,polygone,optimal=TRUE,h=.1)
if(optimal==TRUE) {H=Hpi(U,binned=FALSE);
if(optimal==FALSE){H= matrix(c(h,0,0,h),2,2)

before defining our weights.

OMEGA=parLapply(cl,1:n,poidsU,U=U,h=sqrt(H[1,1]), POL=as(polygone, "gpc.poly"))"c",OMEGA) stopCluster(cl) }else {OMEGA=lapply(1:n,poidsU,U=U,h=sqrt(H[1,1]), POL=as(polygone, "gpc.poly"))"c",OMEGA)}

Note that it is possible to parallelize if there are a lot of observations,

{cl <- makeCluster(4,type="SOCK")
worker.init <- function(packages)
{for(p in packages){library(p, character.only=T)}
clusterCall(cl, worker.init, c("gpclib","sp")) clusterExport(cl,c("cercle","poids"))

Then, we can use standard bivariate kernel smoothing functions, but with the weights we just calculated, using a simple technique that can be related to one suggested in Ripley (1977),

VX = cbind(rep(vx,each=length(vy)))
VY = cbind(rep(vy,length(vx)))
Ind=matrix(,VY, polygone[,1],
lissage_without_c = function(U,polygone,optimal=TRUE,h=.1)
if(optimal==TRUE) {H=Hpi(U,binned=FALSE);
if(optimal==FALSE){H= matrix(c(h,0,0,h),2,2)}
VX = cbind(rep(vx,each=length(vy)))
VY = cbind(rep(vy,length(vx)))
Ind=matrix(,VY, polygone[,1],

So, now we can play with those functions,

breaks=breaks, col=rev(heat.colors(20)),xlab="",
col="dodger blue")

If we look at the graphs, we have the following densities of car accident, with a standard kernel on the left, and our proposal on the right (with local weight adjustment when the estimation is done next to the border of the region of interest),

Similarly, in Morbihan,

With those modified kernels, hot spots appear much more clearly. For more details, the paper is online on hal.

Monday, April 16 2012

Blog citation in academic journals

This morning, I had a question from someone who used some old results published on the blog, but not elsewhere. Kindly, the author of the email asked me how to mention the post in an academic article. Even if the webpage might be old (some blogs mentioned that webpage in 2007 already), the US National Library of Medicine has provided guidance on how to cite a blog entry (as well as emails or wikipages) in an academic paper. Several examples are also mentioned.

Wednesday, January 25 2012

Local Utility and Multivariate Risk Aversion

The paper with Alfred Galichon and Marc Henry, on Local Utility and Multivariate Risk Aversion is now available online on,

"The present paper We revisit Machina's local utility as a tool to analyze attitudes to multivariate risks. Using martingale embedding techniques, we show that for non-expected utility maximizers choosing between multivariate prospects, aversion to multivariate mean preserving increases in risk is equivalent to the concavity of the local utility functions, thereby generalizing Machina's result in Machina (1982). To analyze comparative risk attitudes within the multivariate extension of rank dependent expected utility of Galichon and Henry (2011), we extend Quiggin's monotone mean and utility preserving increases in risk and show that the useful characterization given in Landsberger and Meilijson (1994) still holds in the multivariate case"

Wednesday, December 14 2011

Natural Catastrophe Insurance: How Should the Government Intervene?

An updated version of the joint paper with Benoit Le Maux is online on

"The present paper develops a new theoretical framework for analyzing the decision to provide or buy insurance against the risk of natural catastrophes. In contrast with conventional models of insurance, the insurer has a non-zero probability of insolvency that depends on the distribution of the risks, the premium rate, and the amount of capital in the company. Among several results, we show that risk-averse policyholders will accept to pay higher rates for a government-provided insurance with unlimited guarantee. However, depending on the correlation between and within the regional risks, a government program can be more attractive to high-correlation than to low correlation
areas, which may lead to inefficiencies if the insurance ratings are not appropriately chosen.

Wednesday, November 30 2011

BINAR processes and earthquakes

With Mathieu Boudreault, we finally uploaded our working paper on multivariate integer-valued autoregressive models applied to earthquake counts on and on

"In various situations in the insurance industry, in finance, in epidemiology, etc., one needs to represent the joint evolution of the number of occurrences of an event. In this paper, we present a multivariate integer-valued autoregressive (MINAR) model, derive its properties and apply the model to earthquake occurrences across various pairs of tectonic plates. The model is an extension of Pedelis & Karlis (2011) where cross autocorrelation (spatial contagion in a seismic context) is considered. We fit various bivariate count models and find that for many contiguous tectonic plates, spatial contagion is significant in both directions. Furthermore, ignoring cross autocorrelation can underestimate the potential for high numbers of occurrences over the short-term. Our overall findings seem to further confirm Parsons & Velasco (2001)."

The starting point of our paper with Mathieu was the paper on the absence of remotely triggered large earthquakes beyond the main shock region, by Thomas Parsons and Aaron Velasco published in May 2011 in Nature Geoscience. I was supposed to present this work at the Geotop seminar last week, but the seminar has been canceled and I will probably present it this Winter. Slides as well as R code will be uploaded for the seminar.

Friday, June 17 2011

Loi des grands nombres et théorème central limite en assurance

Dans le dernier numéro de la revue Risques figure un petit article sur la loi des grands nombres et le théorème central limite en assurance (ici) Le but était de remettre un peu les pendules à l'heure sur la différence entre les implications de ces deux résultats, et surtout d'amorcer une discussion sur l'importance des hypothèses (en particulier celle de risques indépendants).

Sunday, January 2 2011

The return period of the 2003 heat wave

The paper on "on the return period of the 2003 heat wave" has been published online here, before Christmas. It should be published soon by Climatic Change.

"Extremal events are difficult to model since it is difficult to characterize formally those events. The 2003 heat wave in Europe was not characterized by very high temperatures, but mainly the fact that night temperature were no cool enough for a long period of time. Hence, simulation of several models (either with heavy tailed noise or long range dependence) yield different estimations for the return period of that extremal event."

To go further on the impact on mortality (which was not the aim of the paper), there is a paper in Nature (here).

Tuesday, December 28 2010

Generating stress scenarios: null correlation is not enough

In a recent post (here, by @teramonagi), Teramonagi mentioned the use of PCA to model yield curve, i.e. to obtain the three factor, "parallel shift", "twist" and "butterfly". As in Nelson & Siegel, if m is maturity, y\left( m \right) is the yield of the curve at maturity m, assume that

where β0, β1, β2 and τ, are parameters to be fitted
  • β0 is interpreted as the long run levels of interest rates (the loading is 1, it is a constant that does not decay)
  • β1 is the short-term component (it starts at 1, and decays monotonically and quickly to 0);
  • β2 is the medium-term component (it starts at 0, increases, then decays to zero);
  • τ is the decay factor: small values produce slow decay and can better fit the curve at long maturities, while large values produce fast decay and can better fit the curve at short maturities; τ also governs where β2 achieves its maximum.
(see e.g. here). Those factors can be obtained using PCA,
term.structure = read.csv("C:\\tmp\\FRB_H15.csv",
term.structure = tail(term.structure,1000)
term.structure = term.structure[,-1]
label.term = c("1M","3M","6M","1Y","2Y","3Y","5Y"
colnames(term.structure) = label.term
term.structure = subset(term.structure,
term.structure$'1M' != "ND")
term.structure = apply(term.structure,2,as.numeric)
term.structure.diff = diff(term.structure)
term.structure.princomp = princomp(term.structure.diff)
factor.loadings = term.structure.princomp$loadings[,1:3]
legend.loadings = c("First principal component",
"Second principal component","Third principal component")
lwd=3,lty=1,xlab = "Term", ylab = "Factor loadings")
> summary(term.structure.princomp)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
Standard deviation 0.2028719 0.1381839 0.06938957 0.05234510 0.03430404 0.022611518 0.016081738 0.013068448
Proportion of Variance 0.5862010 0.2719681 0.06857903 0.03902608 0.01676075 0.007282195 0.003683570 0.002432489
Cumulative Proportion 0.5862010 0.8581690 0.92674803 0.96577411 0.98253486 0.989817052 0.993500621 0.995933111
using Teramonagi's R code, When generating stress scenarios, the idea is to generate independently those factors (or components) and then to aggregate them (using the expression given above). With Principal Component Analysis, PCA, we get orthogonal components, while with Independent Component Analysis, ICA, we get independent components. And independence and null correlation can be rather different. We recently discussed that idea in a paper with Christophe Villa (available soon here).

Consider the following sample

I=(Y<.25)*(Y<3*X)*(Y>X/3) +
op <- par(mfrow = c(1, 2))
plot(FACT1[1:2000],FACT2[1:2000],main="Principal component analysis",col="black",cex=.2,xlab="",ylab="",xaxt="n",yaxt="n")
plot(PCA$scores,cex=.2,main="Principal component analysis",
The PCA obtain the following projections on the two components (drawn in red, below)
> X=PCA$scores[,1];
> Y=PCA$scores[,2];
> n=length(FACT1)
> x=X[sample(1:n,size=n,replace=TRUE)]
> y=Y[sample(1:n,size=n,replace=TRUE)]
> PCA$loadings
Comp.1 Comp.2
FACT1 -0.707 0.707
FACT2 -0.707 -0.707
Comp.1 Comp.2
SS loadings 1.0 1.0
Proportion Var 0.5 0.5
Cumulative Var 0.5 1.0
> F1=0.707*x-.707*y
> F2=0.707*x+.707*y

Hence, with PCA, we have two components, orthogonal, with a triangular distribution, so if we generate them independently, we obtain

which is quite different, compared with the original sample. On the other hand, with ICA,we obtain factors that are really independent....

plot(ICA$S,cex=.2,main="Independent component analysis",
see below for the graphs, and here for more details,

Friday, November 26 2010

Talk at McGill

I will give a talk at McGill university this afternoon, on "distorting probabilities in actuarial science". Note that Louis Paul Rivest will give a talk just after, but at UQAM, at the statistical seminar (here)

Tuesday, November 9 2010

Natural Catastrophe Insurance: When Should the Government Intervene?

Drought risk is something extremely difficult to insure, mainly because most of the assumptions we usually have in insurance are no longer valid. For instance, there is a strong correlation in space and time: if your neighbor claims a loss due to drought, so will you, probably; and if you claimed a loss last year, this year you are very likely to claim a loss, again. In France, droughts are covered by the cat nat mechanism. On the map below, we can see how many times drought insurance has been used, per village (red points means that 5 years (over the past 20) there has been a drought... well, at least 5 years, the maximum being 16),

The insurance works in France only because of the cat nat system. We can see on that picture that there are two different regions in France: the one in white, i.e. no drought over the past 20 years, and the one with small dots, i.e. droughts every 5 or 10 years. This heterogeneity might probably end the market, in the case the cat nat system does not work any longer. In several regions, no one will buy such insurance (why getting insurance if there is no risk ?), while in other, everyone will. And then, we might face some bankruptcies since risks become too correlated in those small regions... Anyway, if you're interested in droughts, you can look here, or there, or here if you want (in French) a description of the cat nat system.
This was the starting point of a paper we started a few years ago with Benoît, untitled "Natural Catastrophe Insurance: When Should the Government Intervene? ". The paper is now on-line, and can be downloaded from here.
The abstract is the following "The present research relaxes three of the usual assumptions made in the insurance literature. It is assumed that (1) there is a finite number of risks, (2) the risks are not statistically independent and (3) the structure of the market is monopolistic. In this context, the article analyses two models of natural catastrophe insurance: a model of insurance with limited liability and a model with unlimited guarantee. Among others, the results confirm the idea that the natural catastrophe insurance industry is characterized by economies of scale. The government should consequently encourage the emergence of a monopoly and discipline the industry through regulated premiums. It is also shown that government intervention of last resort is not needed when the risks are highly correlated. Lastly, the results point out that when the risks between two regions are not sufficiently independent, the pooling of the risks can lead to a Pareto improvement only if the regions face similar magnitude of damage. If not, then the region with low-damage events needs the premium to decrease to accept the pooling of the risks.".

Tuesday, November 2 2010

Lecture notes on risk and insurance

I just finished some lectures notes on risk and insurance The notes, that can be downloaded here, are in French, and will be used at the JES (Journées d'Etudes Statistiques), organised at the CIRM (mentioned here). Previous notes on risk measures (here) and copulas (there). Again, all comments are welcome...

Thursday, October 14 2010

Copules et risques corrélés
J'avais promis dans un commentaire que je mettrais bientôt en ligne un survey sur les copules.... après plusieurs mois de retard, le document est en ligne ici, et il est sobrement intitulé copules et risques multiples. Toutes les remarques et critiques sont les bienvenues ! Il s'agit d'un chapitre pour un livre dont je mettrais la référence en ligne ultérieurement. A priori, ce document devrait servir de base pour le cours qui sera donné dans un mois au CIRM (mentionné ici, dans le cadre des Journées d'Études Statistiques).

Wednesday, October 13 2010

Some references (with a bibtex file)

Since I recently got two requests, from two researchers who asked me how to inclued proper references of papers that can be downloaded from this blog, I can mention that there is a bibtex file here including most of my publications (since I do not mention them precisely on that blog). To be more specific, on the estimation of densities of copulas (mentioned here, with R codes).

Charpentier, A., Fermanian, O., and Scaillet, O. (2007). The Estimation of Copulas: Theory and Practice. in Copulas: From theory to application in finance, J. Rank Editor. Risk Books; 35-62.
and for catastrophe options,
Charpentier, A. (2008). Pricing catastrophe options in incomplete market. in Proceedings of the Actuarial and Financial Mathematics Conference, M.Vanmaele, G. Deelstra, A. De Schepper, J. Dhaene, H. Reynaerts,W. Schoutens & P. Van Goethem Editors, WK, 19-31.
That paper was mentioned there.

Saturday, October 2 2010

Modèles de bonus-malus en assurance automobile

Petit article sur la modélisation du système bonus-malus français dans le prochain numéro de la revue Risques (le numéro 83), intitulé "le bonus-malus français a-t-il encore un avenir ?" (en ligne ici). L'article est inspirée d'un mini-cours qu'avait donné Jean Lemaire il y a une dizaines d'années, alors qu'il était invité par AON à Hong Kong sur "markov chains in insurance". En particulier, il avait insisté sur la vitesse de convergence des systèmes bonus-malus, que j'ai repris en partie ici, sur le cas français.
Au niveau du code, pour étudier cette vitesse de convergence (au moins graphiquement), on peut reprendre le code suivant. On se donne une distribution initiale (ainsi que les niveaux de prime), et une matrice de transition

>U=c(1,0,0); P=c(121.8,.9*121.8,.81*121.8)
> M=matrix(c(p,p,0,1-p,0,p,0,1-p,1-p),3,3)

Ensuite, on garde la prime moyenne collectée, année après année, et la répartition dans les 3 classes,

>  MP=t(U)%*%P
>  MU=U
>  for(i in 1:10){
+  V=U%*%M
+  U=V
+  MU=rbind(MU,U)
+  MP=c(MP,U%*%P)}

Le code suivant (il n'est pas forcément optimal, mais il tourne) permet alors de faire un graphique montrant l'évolution par fréquences, ainsi que la prime moyenne collectée,

>  ylim <- c(0,100)
>  plot(0:1,0:1, col="white", type="p", ylim=ylim,
+ axes=FALSE, ylab="",xlab="Années", xlim=c(0,10))
> COULEUR=heat.colors(length(P))
> MU2=cbind(rep(0,nrow(MU)),MU[,length(P):1])
> MU3=t(apply(MU2,1,cumsum))
> for(i in 0:9){
+  for(j in 1:length(P)){
+  polygon(c(i-.4,i-.4,i+.4,i+.4),
+  100*c(MU3[i+1,j],MU3[i+1,j+1],MU3[i+1,j+1],
+ MU3[i+1,j]),col=COULEUR[j])
+ }}
>  axis(1);
>  ylim <- c(90,130)
>  axis(2)
>  par(new=TRUE)
>  plot(0:1,0:1, col="white", type="p",
+ ylim=ylim,axes=FALSE, ylab="",
+  xlab="",xlim=c(0,9))
>  axis(4)
> points(0:8,MP[1:9],type="b",
+ lwd=4,col="blue")
> abline(h=100,col="blue",lty=2)
>  mtext("Proportion dans les classes", 2, line=2)
>  mtext("     Prime moyenne", 4, line=-1.5 )

On peut alors adapter le code à n'importe quel système de bonus malus (à classes), comme ceux décrit dans le livre de Jean Lemaire (dont j'avais parlé ici). Par exemple pour
il suffit d'entrer les paramètres (je décris un peu mieux la matrice de transition, pour que les choses soient claires, en particulier il est souvent plus simple de rentrer la matrice par ligne, en calculant les probabilités d'arriver dans telle ou telle classe, en fonction de là où on est)
> U=c(1,0,0,0,0,0); P=c(100,80,70,60,50,40)
> p=.1
> M=t(matrix(
+ c(p,1-p,0,0,0,0,
+   p,0,1-p,0,0,0,
+   p,0,0,1-p,0,0,
+   p,0,0,0,1-p,0,
+   0,0,p,0,0,1-p,
+   0,0,0,p,0,1-p),
+ length(P),length(P)))

Sinon côté références, le livre de Jean Lemaire est en partie dans google books, ici, et pour l'article d'Edouard Franckx, il est . Pour aller plus loin, il y a le livre de Michel Denuit (et al.), , et sinon l'article de Magali Kelle (paru dans le BFA) qui parle du système bonus-malus français est en ligne ici.

Friday, March 5 2010

noise traders

Christophe gave a talk at EM Lyon at the end of February at the Journées de Finance Inter-Ecoles de Commerce 2010 (here), about "Category-Based Tail Comovement". I have uploaded Christophe's slides here. The abstract of the paper is the following, "Traditional financial theory predicts that comovement in asset returns is due to fundamentals. An alternative view is that of Barberis and Shleifer (2003) and Barberis, Shleifer and Wurgler (2005) who propose a sentiment based theory of comovement, delinking it from fundamentals. In their paper they view comovement under the prism of the standard Pearson’s correlation measure, implicitly excluding extreme market events, such as the latest financial crisis. Poon, Rockinger and Tawn (2004) have shown that under such events di¤erent types of comovement or dependence may co-exist, and make a clear distinction between the four types of dependence: perfect dependent, independent, asymptotically dependent and asymptotically independent. In this paper we extend the sentiment based theory of comovement so as to cover the whole spectrum of dependence, including extreme comovement such as the one that can be observed in …nancial crises. One of the key contributions of this paper is that it formally proves that assets belonging to the same category comove too much in the tail and reclassifying an asset into a new category raises its tail dependence with that category".

- page 1 of 3