# Freakonometrics

## Teaching › MAT8886 copulas and extremes

Thursday, September 27 2012

## Bounding sums of random variables, part 2

It is possible to go much further on bounding sums of random variables (a topic mentioned in the previous post). For instance, while everything in that previous post was defined for distributions on $\mathbb{R}^+$, it is possible to extend the bounds to distributions on $\mathbb{R}$, especially if we deal with quantiles. Everything we've seen remains valid. Consider for instance two $\mathcal{N}(0,1)$ distributions. Using the previous code, it is possible to compute bounds for the quantiles of the sum of two Gaussian variates. And one has to remember that those bounds are sharp.

> Finv=function(u) qnorm(u,0,1)
> Ginv=function(u) qnorm(u,0,1)
> n=1000
> Qinf=Qsup=rep(NA,n-1)
> for(i in 1:(n-1)){
+ J=0:i
+ Qinf[i]=max(Finv(J/n)+Ginv((i-J)/n))
+ J=(i-1):(n-1)
+ Qsup[i]=min(Finv((J+1)/n)+Ginv((i-1-J+n)/n))
+ }

Actually, it is possible to compare with two simple cases here: the independent case, where the sum has a $\mathcal{N}(0,2)$ distribution, and the comonotonic case, where the sum has a $\mathcal{N}(0,2^2)$ distribution.

>  lines(x,qnorm(x,sd=sqrt(2)),col="blue",lty=2)
>  lines(x,qnorm(x,sd=2),col="blue",lwd=2)

On the graph below, the comonotonic case (usually considered as the worst case scenario) is the plain blue line (with here an animation to illustrate the convergence of the numerical algorithm)

Below that (thick) blue line, risks are sub-additive for the Value-at-Risk, i.e.

$H^{-1}(u)\leq F^{-1}(u)+G^{-1}(u)$

but above, risks are super-additive for the Value-at-Risk, i.e.

$H^{-1}(u)\geq F^{-1}(u)+G^{-1}(u)$

(since for comonotonic variates, the quantile of the sum is the sum of quantiles). It is possible to visualize those two cases above, in green the area where risks are super-additive, while the yellow area is where risks are sub-additive.
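As a quick numerical illustration of that last parenthetical claim (a sketch in base R, not part of the original code): for comonotonic Gaussian variates, the quantile of the sum equals the sum of the marginal quantiles.

```r
set.seed(1)
U <- runif(1e6)                      # one single uniform driver
X <- qnorm(U); Y <- qnorm(U)         # comonotonic N(0,1) pair
q_sum  <- quantile(X + Y, .99)       # empirical 99% quantile of X+Y
q_marg <- qnorm(.99) + qnorm(.99)    # sum of the marginal 99% quantiles
c(q_sum, q_marg)                     # both close to 4.65
```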

Recall that for a Gaussian random vector with correlation $r$, the quantile of the sum is the quantile of a centered Gaussian random variable with variance $2(1+r)$. Thus, on the graph below, we can visualize the cases that can be obtained with the Gaussian copula. Here, the yellow area can be obtained with a Gaussian copula, the upper and lower bounds being respectively the comonotonic and the countermonotonic cases.

But the green area can also be reached when we sum two Gaussian variables! We just have to leave the Gaussian world, and consider another copula.

Another point is that, in the previous post, $C^-$ was the lower Fréchet-Hoeffding bound on the set of copulas. But all the previous results remain valid if $C^-$ is a lower bound on the set of copulas of interest. Especially

$\tau_{C^-,L}(F,G)\leq \sigma_{C,L}(F,G)\leq\rho_{C^-,L}(F,G)$

for all $C$ such that $C\geq C^-$. For instance, if we assume that the copula should have positive dependence, i.e. $C\geq C^\perp$, then

$\tau_{C^\perp,L}(F,G)\leq \sigma_{C,L}(F,G)\leq\rho_{C^\perp,L}(F,G)$

This means we should get sharper bounds. Numerically, it is possible to compute those sharper bounds for quantiles. The lower bound becomes

$\sup_{u\in[0,x]}\left\{F^{-1}(u)+G^{-1}\left(\frac{x-u}{1-u}\right)\right\}$

while the upper bound is

$\inf_{u\in[x,1]}\left\{F^{-1}(u)+G^{-1}\left(\frac{x}{u}\right)\right\}$

Again, one can easily compute those quantities on a grid of the unit interval,

> Qinfind=Qsupind=rep(NA,n-1)
> for(i in 1:(n-1)){
+  J=1:(i)
+  Qinfind[i]=max(Finv(J/n)+Ginv((i-J)/n/(1-J/n)))
+  J=(i):(n-1)
+  Qsupind[i]=min(Finv(J/n)+Ginv(i/J))
+ }

We get the graph below (the blue area is here to illustrate how much sharper those bounds get under the assumption of positive dependence, this area being attained only by copulas exhibiting non-positive dependence)

For high quantiles, the upper bound is rather close to the one we had before, since worst cases are probably obtained when we do have positive correlation. But the assumption strongly impacts the lower bound. For instance, it now becomes impossible to have a negative quantile when the probability exceeds 75%, if we do have positive dependence...

> Qinfind[u==.75]
[1] 0
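A quick way to double-check that 75% claim is to evaluate the lower-bound formula used in the `Qinfind` loop directly at $x=.75$ (a small base-R sketch): at $u=1/2$ the argument of $G^{-1}$ is $(.75-.5)/(1-.5)=.5$, so the value $F^{-1}(.5)+G^{-1}(.5)=0$ is attained.

```r
x <- .75
u <- seq(.01, .74, by=.01)                       # grid on (0, x)
lowbnd <- max(qnorm(u) + qnorm((x - u)/(1 - u))) # sharper lower bound at 75%
lowbnd                                           # essentially 0, attained at u = 1/2
```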

## Bounding sums of random variables, part 1

For the last course MAT8886 of this (long) winter session, on copulas (and extremes), we will discuss risk aggregation. The course will mainly be on the problem of bounding the distribution (or some risk measure, say the Value-at-Risk) of the sum of two random variables with given marginal distributions. For instance, suppose we have two Gaussian risks. What could be the worst-case scenario for the 99% quantile of the sum? Note that I mention implications in terms of risk management, but of course, those questions are also extremely important in terms of statistical inference, see e.g. Fan & Park (2006).

This problem is sometimes related to a question asked by Kolmogorov almost one hundred years ago, as mentioned in Makarov (1981). One year later, Rüschendorf (1982) also suggested a proof of the bounds calculation. Here, we focus on dimension 2. As usual, it is the simple case. But as mentioned recently, in Kreinovich & Ferson (2005), in dimension 3 (or higher), "computing the best-possible bounds for arbitrary n is an NP-hard (computationally intractable) problem". So let us focus on the case where we sum (only) two random variables (for those interested in higher dimensions, Puccetti & Rüschendorf (2012) provided interesting results for a dual version of those optimal bounds).

Let $\Delta$ denote the set of univariate continuous distribution functions on $\mathbb{R}$, and $\Delta^+$ the set of distributions on $\mathbb{R}^+$. Thus, $F\in\Delta^+$ if $F\in\Delta$ and $F(0)=0$. Consider now two distributions $F,G\in\Delta^+$. In a very general setting, it is possible to consider operators on $\Delta^+\times \Delta^+$. Thus, let $T:[0,1]\times[0,1]\rightarrow[0,1]$ denote an operator, increasing in each component, such that $T(1,1)=1$. And consider some function $L:\mathbb{R}^+\times\mathbb{R}^+\rightarrow\mathbb{R}^+$, assumed to be increasing in each component (and continuous). For such functions $T$ and $L$, define the following (general) operator, $\tau_{T,L}(F,G)$, as

$\tau_{T,L}(F,G)(x)=\sup_{L(u,v)=x}\{T(F(u),G(v))\}$

One interesting case is obtained when $T$ is a copula, $C$. In that case,

$\tau_{C,L}(F,G):\Delta^+\times\Delta^+\rightarrow\Delta^+$

and further, it is possible to write

$\tau_{C,L}(F,G)(x)=\sup_{(u,v)\in L^{-1}(x)}\{C(F(u),G(v))\}$

It is also possible to consider other (general) operators, e.g. based on the sum

$\sigma_{C,L}(F,G)(x)=\int_{(u,v)\in L^{-1}(x)} dC(F(u),G(v))$

or on the minimum,

$\rho_{C,L}(F,G)(x)=\inf_{(u,v)\in L^{-1}(x)}\{C^\star(F(u),G(v))\}$

where $C^\star$ is the survival copula associated with $C$, i.e. $C^\star(u,v)=u+v-C(u,v)$. Note that those operators can be used to define distribution functions, i.e.

$\sigma_{C,L}(F,G):\Delta^+\times\Delta^+\rightarrow\Delta^+$

and similarly

$\rho_{C,L}(F,G):\Delta^+\times\Delta^+\rightarrow\Delta^+$

All that seems too theoretical ? An application can be the case of the sum, i.e. $L(x,y)=x+y$, in that case $\sigma_{C,+}(F,G)$ is the distribution of sum of two random variables with marginal distributions $F$ and $G$, and copula $C$. Thus, $\sigma_{C^\perp,+}(F,G)$ is simply the convolution of two distributions,

$\sigma_{C^\perp,+}(F,G)(x)=\int_{u+v=x} dC^\perp(F(u),G(v))$
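To make that concrete (a small base-R sketch, not from the original post): for two independent $LN(0,1)$ risks, this convolution can be evaluated by numerical integration, and cross-checked by simulation.

```r
# P(X+Y <= x) for independent LN(0,1) risks, written as the
# convolution integral of G(x-u) against dF(u)
conv_cdf <- function(x)
  integrate(function(u) plnorm(x - u) * dlnorm(u), 0, x)$value
set.seed(1)
S <- rlnorm(1e5) + rlnorm(1e5)       # Monte Carlo version of the sum
c(conv_cdf(5), mean(S <= 5))         # the two values should agree closely
```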

The important result (that can be found in Chapter 7, in Schweizer and Sklar (1983)) is that given an operator $L$, then, for any copula $C$, one can find a lower bound for $\sigma_{C,L}(F,G)$

$\tau_{C^-,L}(F,G)\leq \tau_{C,L}(F,G)\leq\sigma_{C,L}(F,G)$

as well as an upper bound

$\sigma_{C,L}(F,G)\leq \rho_{C,L}(F,G)\leq\rho_{C^-,L}(F,G)$

Those inequalities come from the fact that $C\geq C^-$ for all copulas $C$, where $C^-$ is itself a copula (in dimension 2). Since this function is no longer a copula in higher dimensions, one can easily imagine that getting those bounds in higher dimensions will be much more complicated...

In the case of the sum of two random variables, with marginal distributions $F$ and $G$, bounds for the distribution of the sum $H(x)=\mathbb{P}(X+Y\leq x)$, where $X\sim F$ and $Y\sim G$, can be written

$H^-(x)=\tau_{C^- ,+}(F,G)(x)=\sup_{u+v=x}\{ \max\{F(u)+G(v)-1,0\} \}$

for the lower bound, and

$H^+(x)=\rho_{C^- ,+}(F,G)(x)=\inf_{u+v=x}\{ \min\{F(u)+G(v),1\} \}$

for the upper bound. And those bounds are sharp, in the sense that, for all $t\in(0,1)$, there is a copula $C_t$ such that

$\tau_{C_t,+}(F,G)(x)=\tau_{C^- ,+}(F,G)(x)=t$

and there is (another) copula $C_t$ such that

$\sigma_{C_t,+}(F,G)(x)=\tau_{C^- ,+}(F,G)(x)=t$

Thus, using those results, it is possible to bound cumulative distribution function. But actually, all that can be done also on quantiles (see Frank, Nelsen & Schweizer (1987)). For all $F\in\Delta^+$ let $F^{-1}$ denotes its generalized inverse, left continuous, and let $\nabla^+$ denote the set of those quantile functions. Define then the dual versions of our operators,

$\tau^{-1}_{T,L}(F^{-1},G^{-1})(x)=\inf_{(u,v)\in T^{-1}(x)}\{L(F^{-1}(u),G^{-1}(v))\}$

and

$\rho^{-1}_{T,L}(F^{-1},G^{-1})(x)=\sup_{(u,v)\in (T^\star)^{-1}(x)}\{L(F^{-1}(u),G^{-1}(v))\}$

Those definitions are really dual versions of the previous ones, in the sense that $\tau^{-1}_{T,L}(F^{-1},G^{-1})=[\tau_{T,L}(F,G)]^{-1}$ and $\rho^{-1}_{T,L}(F^{-1},G^{-1})=[\rho_{T,L}(F,G)]^{-1}$.

Note that if we focus on sums of bivariate distributions, the upper bound for the quantile of the sum is

$\tau^{-1}_{C^{-},+}(F^{-1},G^{-1})(x)=\inf_{\max\{u+v-1,0\}=x}\{F^{-1}(u)+G^{-1}(v)\}$

while the lower bound is

$\rho^{-1}_{C^{-},+}(F^{-1},G^{-1})(x)=\sup_{\min\{u+v,1\}=x}\{F^{-1}(u)+G^{-1}(v)\}$

A great thing is that it should not be too difficult to compute those quantities numerically. It is perhaps a little more difficult for cumulative distribution functions, since they are not defined on a bounded support. But still, it can be done if the goal is, for instance, to plot those bounds on $[0,10]$. The code is the following, for the sum of two lognormal $LN(0,1)$ distributions,

> F=function(x) plnorm(x,0,1)
> G=function(x) plnorm(x,0,1)
> n=100
> X=seq(0,10,by=.05)
> Hinf=Hsup=rep(NA,length(X))
> for(i in 1:length(X)){
+ x=X[i]
+ U=seq(0,x,by=1/n); V=x-U
+ Hinf[i]=max(pmax(F(U)+G(V)-1,0))
+ Hsup[i]=min(pmin(F(U)+G(V),1))}

If we plot those bounds, we obtain

> plot(X,Hinf,ylim=c(0,1),type="s",col="red")
> lines(X,Hsup,type="s",col="red")

But somehow, it is even simpler to work with quantiles, since they are defined on a bounded support. Quantiles are here

> Finv=function(u) qlnorm(u,0,1)
> Ginv=function(u) qlnorm(u,0,1)

The idea will be to consider a discretized version of the unit interval, as discussed in Williamson (1989), in a much more general setting. Again, the idea is to compute, for instance,

$\sup_{u\in[0,x]}\{F^{-1}(u)+G^{-1}(x-u)\}$

We consider $x=i/n$ and $u=j/n$; the bound for the quantile function at point $i/n$ is then

$\sup_{j\in\{0,1,\cdots,i\}}\left\{F^{-1}\left(\frac{j}{n}\right)+G^{-1}\left(\frac{i-j}{n}\right)\right\}$

The code to compute those bounds, for a given $n$ is here

> n=1000
> Qinf=Qsup=rep(NA,n-1)
> for(i in 1:(n-1)){
+ J=0:i
+ Qinf[i]=max(Finv(J/n)+Ginv((i-J)/n))
+ J=(i-1):(n-1)
+ Qsup[i]=min(Finv((J+1)/n)+Ginv((i-1-J+n)/n))
+ }

Here we have (several $n$s were considered, so that we can visualize the convergence of that numerical algorithm),

Here, we have a simple code to visualize bounds for quantiles for the sum of two risks. But it is possible to go further...

Sunday, September 23 2012

## Maximum likelihood estimates for multivariate distributions

Consider our loss-ALAE dataset, and - as in Frees & Valdez (1998) - let us fit a parametric model, in order to price a reinsurance treaty. The dataset is the following,

> library(evd)
> data(lossalae)
> Z=lossalae
> X=Z[,1];Y=Z[,2]

The first step can be to estimate marginal distributions, independently. Here, we consider lognormal distributions for both components,

> Fempx=function(x) mean(X<=x)
> Fx=Vectorize(Fempx)
> u=exp(seq(2,15,by=.05))
> plot(u,Fx(u),log="x",type="l",
+ xlab="loss (log scale)")
> Lx=function(px) -sum(log(Vectorize(dlnorm)(
+ X,px[1],px[2])))
> opx=optim(c(1,5),fn=Lx)
> opx$par
[1] 9.373679 1.637499
> lines(u,Vectorize(plnorm)(u,opx$par[1],
+ opx$par[2]),col="red")

The fit here is quite good. For the second component, we do the same,

> Fempy=function(x) mean(Y<=x)
> Fy=Vectorize(Fempy)
> u=exp(seq(2,15,by=.05))
> plot(u,Fy(u),log="x",type="l",
+ xlab="ALAE (log scale)")
> Ly=function(px) -sum(log(Vectorize(dlnorm)(
+ Y,px[1],px[2])))
> opy=optim(c(1.5,10),fn=Ly)
> opy$par
[1] 8.522452 1.429645
> lines(u,Vectorize(plnorm)(u,opy$par[1],
+ opy$par[2]),col="blue")

It is not as good as the fit obtained on losses, but it is not that bad,

Now, consider a multivariate model, with a Gumbel copula. We have seen before that it works well. But this time, consider the maximum likelihood estimator obtained globally (margins and copula estimated jointly).

> Cop=function(u,v,a) exp(-((-log(u))^a+
+ (-log(v))^a)^(1/a))
> phi=function(t,a) (-log(t))^a
> cop=function(u,v,a) Cop(u,v,a)*(phi(u,a)+
+ phi(v,a))^(1/a-2)*(
+ a-1+(phi(u,a)+phi(v,a))^(1/a))*(phi(u,a-1)*
+ phi(v,a-1))/(u*v)
> L=function(p) {-sum(log(Vectorize(dlnorm)(
+ X,p[1],p[2])))-
+ sum(log(Vectorize(dlnorm)(Y,p[3],p[4])))-
+ sum(log(Vectorize(cop)(plnorm(X,p[1],p[2]),
+ plnorm(Y,p[3],p[4]),p[5])))}
> opz=optim(c(1.5,10,1.5,10,1.5),fn=L)
> opz$par
[1] 9.377219 1.671410 8.524221 1.428552 1.468238

Marginal parameters are (slightly) different from the ones obtained independently,

> c(opx$par,opy$par)
[1] 9.373679 1.637499 8.522452 1.429645
> opz$par[1:4]
[1] 9.377219 1.671410 8.524221 1.428552

And the parameter of Gumbel copula is close to the one obtained with heuristic methods in class.
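As a sanity check of the `Cop` function defined above (a one-line sketch): any copula must have uniform margins, i.e. $C(u,1)=u$, and the Gumbel expression satisfies this exactly.

```r
Cop <- function(u,v,a) exp(-((-log(u))^a + (-log(v))^a)^(1/a))  # Gumbel copula
Cop(.3, 1, 1.468)   # C(u,1)=u, so this returns .3 (up to rounding)
```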

Now that we have a model, let us play with it, to price a reinsurance treaty. But first, let us see how to generate Gumbel copula... One idea can be to use the frailty approach, based on a stable frailty. And we can use Chambers et al (1976) to generate a stable distribution. So here is the algorithm to generate samples from Gumbel copula

> alpha=opz$par[5]
> invphi=function(t,a) exp(-t^(1/a))
> n=500
> x=matrix(rexp(2*n),n,2)
> angle=runif(n,0,pi)
> E=rexp(n)
> beta=1/alpha
> stable=sin((1-beta)*angle)^((1-beta)/beta)*
+ (sin(beta*angle))/(sin(angle))^(1/beta)/
+ (E^(alpha-1))
> U=invphi(x/stable,alpha)
> plot(U)

Here, we consider only 500 simulations. Based on that copula simulation, we can then use marginal transformations to generate a pair, losses and allocated expenses,

> Xloss=qlnorm(U[,1],opz$par[1],opz$par[2])
> Xalae=qlnorm(U[,2],opz$par[3],opz$par[4])

In standard reinsurance treaties - see e.g. Clarke (1996) - allocated expenses are split prorata capita between the insurance company and the reinsurer. If $X$ denotes losses, and $Y$ the allocated expenses, a standard excess treaty has payoff

$\begin{cases} 0 & \text{if } X\leq R\\ \displaystyle{X-R+ \frac{X-R}{X}\cdot Y} & \text{if } R< X\leq L\\ \displaystyle{L-R+ \frac{L-R}{R}\cdot Y} & \text{if } X> L \end{cases}$

where $L$ denotes the (upper) limit, and $R$ the insurer's retention. Using Monte Carlo simulation, it is then possible to estimate the pure premium of such a reinsurance treaty.

> L=100000
> R=50000
> Z=((Xloss-R)+(Xloss-R)/Xloss*Xalae)*
+ (R<=Xloss)*(Xloss<L)+
+ ((L-R)+(L-R)/R*Xalae)*(L<=Xloss)
> mean(Z)
[1] 12596.45

Now, play with it... it is possible to find a better fit, I guess...

Thursday, September 20 2012

## Interactive 3d plot, in R

Following the course of this afternoon, I will just upload some codes to make interactive 3d plots, in R.

> library(rgl)
> library(evd)
> library(MASS)
> data(lossalae)
> U=rank(lossalae[,1]+rnorm(nrow(lossalae),
+ mean=0,sd=.001))/(nrow(lossalae)+1)
> V=rank(lossalae[,2])/(nrow(lossalae)+1)
> M=kde2d(qnorm(U),qnorm(V),n=35)
> persp3d(M$x,M$y,M$z,col='green',
+ xlab="loss",ylab="alae",zlab="")


## (nonparametric) Copula density estimation

Today, we will go further on the inference of copula functions. Some codes (and references) can be found on a previous post, on nonparametric estimators of copula densities (among other related things).  Consider (as before) the loss-ALAE dataset (since we've been working a lot on that dataset)

> library(MASS)
> library(evd)
> X=lossalae
> U=cbind(rank(X[,1])/(nrow(X)+1),rank(X[,2])/(nrow(X)+1))

The standard tool to plot nonparametric estimators of densities is to use multivariate kernels. We can look at the density using

> mat1=kde2d(U[,1],U[,2],n=35)
> persp(mat1$x,mat1$y,mat1$z,col="green",
+ shade=TRUE,theta=30,
+ xlab="",ylab="",zlab="",zlim=c(0,7))

or level curves (isodensity curves) with more detailed estimators (on grids with shorter steps),

> mat1=kde2d(U[,1],U[,2],n=101)
> image(mat1$x,mat1$y,mat1$z,col=
+ rev(heat.colors(100)),xlab="",ylab="")
> contour(mat1$x,mat1$y,mat1$z,add=
+ TRUE,levels = pretty(c(0,4), 11))

Kernels are nice, but we clearly observe some border bias, which is extremely strong in the corners (the estimator is one fourth of what it should be, see another post for more details). Instead of working with the sample $(U_i,V_i)$ on the unit square, consider the transformed sample $(Q(U_i),Q(V_i))$, where $Q:(0,1)\rightarrow\mathbb{R}$ is a given function, e.g. the quantile function of an unbounded distribution, for instance the quantile function of the $\mathcal{N}(0,1)$ distribution. Then, we can estimate the density of the transformed sample, and using the inversion technique, derive an estimator of the density of the initial sample. Since the inverse of a (general) function is not that simple to compute, the code might be a bit slow. But it does work,

> gaussian.kernel.copula.surface <- function (u,v,n) {
+ s=seq(1/(n+1), length=n, by=1/(n+1))
+ mat=matrix(NA,nrow = n, ncol = n)
+ sur=kde2d(qnorm(u),qnorm(v),n=1000,
+ lims = c(-4, 4, -4, 4))
+ su<-sur$z
+ for (i in 1:n) {
+     for (j in 1:n) {
+ 	Xi<-round((qnorm(s[i])+4)*1000/8)+1;
+ 	Yj<-round((qnorm(s[j])+4)*1000/8)+1
+ 	mat[i,j]<-su[Xi,Yj]/(dnorm(qnorm(s[i]))*
+ 	dnorm(qnorm(s[j])))
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

Here, we get

Note that it is possible to consider another transformation, e.g. the quantile function of a Student-t distribution.

> student.kernel.copula.surface =
+  function (u,v,n,d=4) {
+  s <- seq(1/(n+1), length=n, by=1/(n+1))
+  mat <- matrix(NA,nrow = n, ncol = n)
+ sur<-kde2d(qt(u,df=d),qt(v,df=d),n=5000,
+ lims = c(-8, 8, -8, 8))
+ su<-sur$z
+ for (i in 1:n) {
+     for (j in 1:n) {
+ 	Xi<-round((qt(s[i],df=d)+8)*5000/16)+1;
+ 	Yj<-round((qt(s[j],df=d)+8)*5000/16)+1
+ 	mat[i,j]<-su[Xi,Yj]/(dt(qt(s[i],df=d),df=d)*
+ 	dt(qt(s[j],df=d),df=d))
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

Another strategy is to consider kernels that have precisely the unit interval as support. The idea is to consider the product of Beta kernels, where the parameters depend on the location,

> beta.kernel.copula.surface=
+ function (u,v,bx=.025,by=.025,n) {
+ s <- seq(1/(n+1), length=n, by=1/(n+1))
+ mat <- matrix(0,nrow = n, ncol = n)
+ for (i in 1:n) {
+ a <- s[i]
+ for (j in 1:n) {
+ b <- s[j]
+ mat[i,j] <- sum(dbeta(a,u/bx,(1-u)/bx) *
+ dbeta(b,v/by,(1-v)/by)) / length(u)
+ }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

On those two graphs, we can clearly observe strong tail dependence in the upper (right) corner, which cannot be intuited using a standard kernel estimator...

Tuesday, September 18 2012

## Copulas and tail dependence, part 3

We have seen extreme value copulas in the section where we considered general families of copulas. In the bivariate case, an extreme value copula can be written

$C(u,v)=\exp\left(\log(uv)\,A\left(\frac{\log(u)}{\log(uv)}\right)\right)$

where $A(\cdot)$ is Pickands dependence function, which is a convex function satisfying

$\max\{t,1-t\}\leq A(t)\leq 1,\quad \forall t\in[0,1]$

Observe that, in this case, Kendall's tau can be written

$\tau=\int_0^1 \frac{t(1-t)}{A(t)}\,dA'(t)$

For instance, if

$A(t)=\left[t^\theta+(1-t)^\theta\right]^{1/\theta}$

then we obtain Gumbel copula. This is what we've seen in the section where we introduced this family. Now, let us talk about (nonparametric) inference, and more precisely the estimation of the dependence function. The starting point of the most standard estimator is to observe that if $(U,V)$ has copula $C$, then

$Z=\frac{\log(U)}{\log(UV)}$

has distribution function

$H(t)=t+t(1-t)\frac{A'(t)}{A(t)}$

and conversely, Pickands dependence function can be written

$A(t)=\exp\left(\int_0^t \frac{H(s)-s}{s(1-s)}\,ds\right)$

Thus, a natural estimator for Pickands function is

$\widehat{A}_n(t)=\exp\left(\int_0^t \frac{\widehat{H}_n(s)-s}{s(1-s)}\,ds\right)$

where $\widehat{H}_n$ is the empirical cumulative distribution function of the $Z_i=\log(U_i)/\log(U_iV_i)$. This is the estimator proposed in Capéràa, Fougères & Genest (1997).
Here, we can compute everything using

> library(evd)
> X=lossalae
> U=cbind(rank(X[,1])/(nrow(X)+1),rank(X[,2])/(nrow(X)+1))
> Z=log(U[,1])/log(U[,1]*U[,2])
> h=function(t) mean(Z<=t)
> H=Vectorize(h)
> a=function(t){
+ f=function(t) (H(t)-t)/(t*(1-t))
+ return(exp(integrate(f,lower=0,upper=t,
+ subdivisions=10000)$value))
+ }
> A=Vectorize(a)
> u=seq(.01,.99,by=.01)
> plot(c(0,u,1),c(1,A(u),1),type="l",col="red",
+ ylim=c(.5,1))
An interesting point is that the upper tail dependence index can be visualized on the graph above,

> A(.5)/2
[1] 0.4055346
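Note that the upper tail dependence index of an extreme value copula is $\lambda_U=2-2A(1/2)$, so it can be read off the value of the Pickands function at $1/2$. For Gumbel copula it is available in closed form (a sketch; the value $\theta=1.468$ is the estimate obtained in the maximum likelihood section above):

```r
theta <- 1.468                                      # estimate from the MLE section
A <- function(t) (t^theta + (1-t)^theta)^(1/theta)  # Gumbel Pickands function
lambdaU <- 2 - 2*A(.5)                              # upper tail dependence index
lambdaU                                             # equals 2 - 2^(1/theta)
```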

## Copulas and tail dependence, part 2

An alternative way to describe tail dependence can be found in Ledford & Tawn (1996), for instance. The intuition behind it can be found in Fischer & Klein (2007). Assume that $X$ and $Y$ have the same distribution. Now, if we assume that those variables are (strictly) independent,

$\mathbb{P}(X\leq x,Y\leq x)=\mathbb{P}(X\leq x)^2$

But if we assume that those variables are (strictly) comonotonic (i.e. equal here, since they have the same distribution), then

$\mathbb{P}(X\leq x,Y\leq x)=\mathbb{P}(X\leq x)$

So assume that there is a $\eta$ such that

$\mathbb{P}(X\leq x,Y\leq x)=\mathbb{P}(X\leq x)^\eta$

Then $\eta=2$ can be interpreted as independence, while $\eta=1$ means strong (perfect) positive dependence. Thus, consider the following transformation to get a parameter in $[0,1]$, with the strength of dependence increasing with the index, e.g.

$\frac{2}{\eta}-1$

In order to derive a tail dependence index, assume that this quantity has a limit as $x$ goes to the relevant endpoint of the support, which will be interpreted as a (weak) tail dependence index. Thus, define the concentration functions

$L(z)=\frac{2\log\mathbb{P}(U\leq z)}{\log\mathbb{P}(U\leq z,V\leq z)}-1$

for the lower tail (on the left) and

$R(z)=\frac{2\log\mathbb{P}(U\geq 1-z)}{\log\mathbb{P}(U\geq 1-z,V\geq 1-z)}-1$

for the upper tail (on the right). The R code to compute those functions is quite simple,
> library(evd);
> data(lossalae)
> X=lossalae
> U=rank(X[,1])/(nrow(X)+1)
> V=rank(X[,2])/(nrow(X)+1)
> fL2emp=function(z) 2*log(mean(U<=z))/
+ log(mean((U<=z)&(V<=z)))-1
> fR2emp=function(z) 2*log(mean(U>=1-z))/
+ log(mean((U>=1-z)&(V>=1-z)))-1
> u=seq(.001,.5,by=.001)
> L=Vectorize(fL2emp)(u)
> R=Vectorize(fR2emp)(rev(u))
> plot(c(u,u+.5-u[1]),c(L,R),type="l",ylim=0:1,
+ xlab="LOWER TAIL      UPPER TAIL")
> abline(v=.5,col="grey")
and again, it is possible to plot those empirical functions against some parametric ones, e.g. the one obtained from a Gaussian copula (with the same Kendall's tau)
> tau=cor(lossalae,method="kendall")[1,2]
> library(copula)
> paramgauss=sin(tau*pi/2)
> copgauss=normalCopula(paramgauss)
> Lgaussian=function(z) 2*log(z)/log(pCopula(c(z,z),
+ copgauss))-1
> Rgaussian=function(z) 2*log(1-z)/log(1-2*z+
+ pCopula(c(z,z),copgauss))-1
> u=seq(.001,.5,by=.001)
> Lgs=Vectorize(Lgaussian)(u)
> Rgs=Vectorize(Rgaussian)(1-rev(u))
> lines(c(u,u+.5-u[1]),c(Lgs,Rgs),col="red")

or Gumbel copula,
> paramgumbel=1/(1-tau)
> copgumbel=gumbelCopula(paramgumbel, dim = 2)
> Lgumbel=function(z) 2*log(z)/log(pCopula(c(z,z),
+ copgumbel))-1
> Rgumbel=function(z) 2*log(1-z)/log(1-2*z+
+ pCopula(c(z,z),copgumbel))-1
> Lgl=Vectorize(Lgumbel)(u)
> Rgl=Vectorize(Rgumbel)(1-rev(u))
> lines(c(u,u+.5-u[1]),c(Lgl,Rgl),col="blue")

Again, one should look more carefully at confidence bands, but it looks like Gumbel copula provides a good fit here.

Monday, September 17 2012

## Copulas and tail dependence, part 1

As mentioned in the course last week, Venter (2003) suggested nice functions to illustrate tail dependence (see also some slides used in Berlin a few years ago).

• Joe (1990)'s lambda

Joe (1990) suggested a (strong) tail dependence index. For lower tails, for instance, consider

$\lambda_L=\lim_{u\rightarrow 0}\mathbb{P}(U\leq u\mid V\leq u)$

i.e.

$\lambda_L=\lim_{u\rightarrow 0}\frac{C(u,u)}{u}$

• Upper and lower strong tail (empirical) dependence functions

The idea is to plot the function above, in order to visualize the limiting behavior. Define

$L(z)=\frac{C(z,z)}{z}$

for the lower tail, and

$R(z)=\frac{C^\star(z,z)}{z}$

for the upper tail, where $C^\star$ is the survival copula associated with $C$, in the sense that

$C^\star(u,v)=\mathbb{P}(U>1-u,V>1-v)$

while

$C^\star(u,v)=u+v-1+C(1-u,1-v)$

Now, one can easily derive empirical counterparts of those functions, i.e.

$\widehat{L}(z)=\frac{\sum_i \boldsymbol{1}(U_i\leq z,V_i\leq z)}{\sum_i \boldsymbol{1}(U_i\leq z)}$

and

$\widehat{R}(z)=\frac{\sum_i \boldsymbol{1}(U_i\geq 1-z,V_i\geq 1-z)}{\sum_i \boldsymbol{1}(U_i\geq 1-z)}$

Thus, for the upper tail, on the right, we have the following graph

and for the lower tail, on the left, we have

For the code, consider some real data, like the loss-ALAE dataset.

> library(evd)
> X=lossalae

The idea is to plot, on the left, the lower tail concentration function, and on the right, the upper tail function.

> U=rank(X[,1])/(nrow(X)+1)
> V=rank(X[,2])/(nrow(X)+1)
> Lemp=function(z) sum((U<=z)&(V<=z))/sum(U<=z)
> Remp=function(z) sum((U>=1-z)&(V>=1-z))/sum(U>=1-z)
> u=seq(.001,.5,by=.001)
> L=Vectorize(Lemp)(u)
> R=Vectorize(Remp)(rev(u))
> plot(c(u,u+.5-u[1]),c(L,R),type="l",ylim=0:1,
+ xlab="LOWER TAIL          UPPER TAIL")
> abline(v=.5,col="grey")

Now, we can compare this graph, with what should be obtained for some parametric copulas that have the same Kendall's tau (e.g.). For instance, if we consider a Gaussian copula,

> tau=cor(lossalae,method="kendall")[1,2]
> library(copula)
> paramgauss=sin(tau*pi/2)
> copgauss=normalCopula(paramgauss)
> Lgaussian=function(z) pCopula(c(z,z),copgauss)/z
> Rgaussian=function(z) (1-2*z+pCopula(c(z,z),copgauss))/(1-z)
> u=seq(.001,.5,by=.001)
> Lgs=Vectorize(Lgaussian)(u)
> Rgs=Vectorize(Rgaussian)(1-rev(u))
> lines(c(u,u+.5-u[1]),c(Lgs,Rgs),col="red")

or Gumbel's copula,

> paramgumbel=1/(1-tau)
> copgumbel=gumbelCopula(paramgumbel, dim = 2)
> Lgumbel=function(z) pCopula(c(z,z),copgumbel)/z
> Rgumbel=function(z) (1-2*z+pCopula(c(z,z),copgumbel))/(1-z)
> u=seq(.001,.5,by=.001)
> Lgl=Vectorize(Lgumbel)(u)
> Rgl=Vectorize(Rgumbel)(1-rev(u))
> lines(c(u,u+.5-u[1]),c(Lgl,Rgl),col="blue")

That's nice (isn't it?), but since we do not have any confidence intervals, it is still hard to conclude (even if it looks like Gumbel copula has a much better fit than the Gaussian one). A strategy can be to generate samples from those copulas, and to visualize what we would get. With a Gaussian copula, the graph looks like
> u=seq(.0025,.5,by=.0025); nu=length(u)
> nsimul=500
> MGS=matrix(NA,nsimul,2*nu)
> for(s in 1:nsimul){
+ Xs=rCopula(nrow(X),copgauss)
+ Us=rank(Xs[,1])/(nrow(Xs)+1)
+ Vs=rank(Xs[,2])/(nrow(Xs)+1)
+ Lemp=function(z) sum((Us<=z)&(Vs<=z))/sum(Us<=z)
+ Remp=function(z) sum((Us>=1-z)&(Vs>=1-z))/sum(Us>=1-z)
+ MGS[s,1:nu]=Vectorize(Lemp)(u)
+ MGS[s,(nu+1):(2*nu)]=Vectorize(Remp)(rev(u))
+ lines(c(u,u+.5-u[1]),MGS[s,],col="red")
+ }

(including - pointwise - 90% confidence bands)

> Q95=function(x) quantile(x,.95)
> V95=apply(MGS,2,Q95)
> lines(c(u,u+.5-u[1]),V95,col="red",lwd=2)
> Q05=function(x) quantile(x,.05)
> V05=apply(MGS,2,Q05)
> lines(c(u,u+.5-u[1]),V05,col="red",lwd=2)

while it is

with Gumbel copula. Isn't it a nice (graphical) tool ?

But as mentioned in the course, the statistical convergence can be slow. Extremely slow. So assessing whether the underlying copula has tail dependence, or not, is not that simple. Especially if the copula exhibits tail independence. Like the Gaussian copula. Consider a sample of size 1,000. This is what we obtain if we generate random scenarios,

or we look at the left tail (with a log-scale)

Now, consider a 10,000 sample,

or with a log-scale

We can even consider a 100,000 sample,

or with a log-scale

On those graphs, it is rather difficult to conclude if the limit is 0, or some strictly positive value (again, it is a classical statistical problem when the value of interest is at the border of the support of the parameter). So, a natural idea is to consider a weaker tail dependence index. Unless you have something like 100,000 observations...

Thursday, September 13 2012

## Copulas estimation, and influence of margins

Just a short post to get back to results mentioned at the end of the course. Since copulas are obtained using (univariate) quantile functions in the joint cumulative distribution function, they are - somehow - related to the marginal distributions fitted. In order to illustrate this point, consider an i.i.d. sample from a bivariate Student-t distribution,
library(mnormt)
r=.5
n=200
X=rmt(n,mean=c(0,0),S=matrix(c(1,r,r,1),2,2),df=4)
Thus, the true copula is the Student-t copula, here with 4 degrees of freedom. Note that we can easily get the (true) value of the copula on the diagonal,
dg=function(t) pmt(qt(t,df=4),mean=c(0,0),
S=matrix(c(1,r,r,1),2,2),df=4)
DG=Vectorize(dg)
Four strategies are considered here to define the pseudo-copula observations,
• misfit: consider an invalid marginal estimation: we assume (wrongly) that margins are Gaussian
• perfect fit: here, we know that margins are Student-t, with 4 degrees of freedom
• standard fit: consider the case where we fit the marginal distributions, but in the good family this time (e.g. among Student-t distributions)
• ranks: finally, we consider nonparametric estimators of the marginal distributions
Now that we have samples with margins on the unit interval, let us construct the empirical copula,

Let us now compare those four approaches.
• The first one is to illustrate model error, i.e. what's going on if we fit distributions, but not in the proper family of parametric distributions.
X0=cbind((X[,1]-mean(X[,1]))/sd(X[,1]),
(X[,2]-mean(X[,2]))/sd(X[,2]))
Y=pnorm(X0)


Then, the following code is used to compute the value of the empirical copula, on the diagonal,

diagonale=function(t,Z) mean((Z[,1]<=t)&(Z[,2]<=t))
diagY=function(t) diagonale(t,Y)
DiagY=Vectorize(diagY)
u=seq(0,1,by=.005)
dY=DiagY(u)


On the graph below, 1,000 samples of size 200 have been generated. Each trajectory is an estimation of the copula on the diagonal. The black plain line is the true value of the copula.

Obviously, it is not good at all. Mainly because the distribution of $Y$ can't be a copula, since its margins are not even uniform on the unit interval.

• a perfect fit. Here, we use the following code to generate our copula-type sample
U=pt(X,df=4)

This time, the fit is much better.

• Using maximum likelihood estimators to fit the best distribution within the Student-t family
F1=fitdistr(X0[,1],dt,list(df=5),lower = 0.001)
F2=fitdistr(X0[,2],dt,list(df=5),lower = 0.001)
V=cbind(pt(X0[,1],df=F1$estimate),pt(X0[,2],df=F2$estimate))

Here, it is also very good. Even better than before, when the true distribution is considered.

(it is like using the Lilliefors test for goodness of fit, versus Kolmogorov-Smirnov; see here for instance, in French).

• Finally, let us consider ranks, or nonparametric estimators for marginal distributions,
R=cbind(rank(X[,1])/(n+1),rank(X[,2])/(n+1))

Here, it is even better than the previous one

If we compare Box-plots of the value of the copula at point (.2,.2), we obtain the following, with on top ranks, then fitting with the good family, then using the true distribution, and finally, using a non-proper distribution.

Just to illustrate one more time a result mentioned in a previous post, "in statistics, having too much information might not be a good thing".

Wednesday, September 12 2012

## Kendall's function for copulas

As mentioned in the course on copulas, a nice tool to describe dependence is Kendall's cumulative function. Given a random pair $(X,Y)$ with joint distribution function $H$, define the random variable $W=H(X,Y)$. Then Kendall's cumulative function is

$K(t)=\mathbb{P}(W\leq t),\quad t\in[0,1]$

Genest and Rivest (1993) introduced that function to choose among Archimedean copulas (we'll get back to this point below).

From a computational point of view, computing such a function can be done as follows,

• for all $i$, compute $Z_i$ as the proportion of observations in the lower quadrant, with upper corner $(X_i,Y_i)$, i.e.

$Z_i=\frac{1}{n-1}\sum_{j\neq i}\boldsymbol{1}(X_j< X_i,Y_j< Y_i)$

• then compute the cumulative distribution function of the $Z_i$'s.
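As a quick sanity check of this construction (a base-R sketch, not in the original post): under independence, $W=UV$ has Kendall function $K(t)=t-t\log(t)$, which a small simulation recovers.

```r
set.seed(1)
W <- runif(1e5) * runif(1e5)    # W = C(U,V) = UV under independence
K_emp  <- mean(W <= .4)         # empirical Kendall function at t = .4
K_theo <- .4 - .4*log(.4)       # t - t*log(t), about 0.7665
c(K_emp, K_theo)
```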

To visualize the construction of that cumulative distribution function, consider the following animation

Thus, here the code to compute simply that cumulative distribution function is

n=nrow(X)
i=rep(1:n,each=n)
j=rep(1:n,n)
S=((X[i,1]>X[j,1])&(X[i,2]>X[j,2]))
Z=tapply(S,i,sum)/(n-1)

The graph can be obtained either using

plot(ecdf(Z))

or

plot(sort(Z),(1:n)/n,type="s",col="red")
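As a quick sanity check of this construction, here is a small sketch (simulated data, variable names of my own): for independent uniform margins the theoretical Kendall function is $K(t)=t-t\log(t)$, so the empirical version computed as above should be close to it.

```r
# sanity check on simulated independent uniforms
# (for the independence copula, K(t) = t - t*log(t))
set.seed(123)
n=500
U0=cbind(runif(n),runif(n))
i=rep(1:n,each=n)
j=rep(1:n,n)
S0=((U0[i,1]>U0[j,1])&(U0[i,2]>U0[j,2]))
Z0=tapply(S0,i,sum)/(n-1)
K.emp=mean(Z0<=.5)   # empirical Kendall function at t=.5
K.th=.5-.5*log(.5)   # theoretical value, about 0.8466
c(K.emp,K.th)
```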

The interesting point is that for an Archimedean copula with generator $\phi$, Kendall's function is simply $K(t)=t-\dfrac{\phi(t)}{\phi'(t)}$

If we're too lazy to do the maths, it is at least possible to compute those functions numerically. For instance, for Clayton copula (here with, e.g., parameter alpha=2),

h=.001
alpha=2
phi=function(t){(t^(-alpha)-1)}
dphi=function(t){(phi(t+h)-phi(t-h))/2/h}
k=function(t){t-phi(t)/dphi(t)}
Kc=Vectorize(k)

Similarly, for Gumbel copula (here with, e.g., parameter theta=2),

theta=2
phi=function(t){(-log(t))^(theta)}
dphi=function(t){(phi(t+h)-phi(t-h))/2/h}
k=function(t){t-phi(t)/dphi(t)}
Kg=Vectorize(k)
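For these two families, the maths is actually not that hard: with $\phi(t)=t^{-\alpha}-1$ one gets $K(t)=t+(t-t^{\alpha+1})/\alpha$, and with $\phi(t)=(-\log t)^{\theta}$ one gets $K(t)=t-t\log(t)/\theta$. A small sketch (helper names and parameter values are mine) to check the numerical versions against those closed forms:

```r
# compare numerical and closed-form Kendall functions
# (alpha, theta are example values; helper names are mine)
h=.001
alpha=2; theta=2
phiC=function(t){t^(-alpha)-1}
dphiC=function(t){(phiC(t+h)-phiC(t-h))/2/h}
KcNum=function(t){t-phiC(t)/dphiC(t)}
KcExact=function(t){t+(t-t^(alpha+1))/alpha}
phiG=function(t){(-log(t))^theta}
dphiG=function(t){(phiG(t+h)-phiG(t-h))/2/h}
KgNum=function(t){t-phiG(t)/dphiG(t)}
KgExact=function(t){t-t*log(t)/theta}
c(KcNum(.3)-KcExact(.3), KgNum(.3)-KgExact(.3))  # both close to zero
```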

If we plot the empirical Kendall's function (obtained from the sample), with different theoretical ones, derived from Clayton copulas (on the left, in blue) or Gumbel copula (on the right, in purple), we have the following,

Note that the different curves were obtained for Clayton copulas with Kendall's tau equal to 0, .1, .2, .3, ..., .9, 1, and similarly for Gumbel copulas (so that the figures can be compared). The following table gives the correspondence between Kendall's tau and the underlying parameter of the copula, for different families,

as well as Spearman's rho,

To conclude, observe that two important particular cases can be identified here: the case of perfect (positive) dependence, on the first diagonal, where $K(t)=t$, and the case of independence, the upper green curve, where $K(t)=t-t\log(t)$. It should also be mentioned that it is common to plot not function $K$, but function $\lambda$, defined as $\lambda(t)=t-K(t)$.

## Association and concordance measures

Following the course, in order to define association measures (from Kruskal (1958)) or concordance measures (from Scarsini (1984)), define a concordance function as follows: let $(X_1,Y_1)$ be a random pair with copula $C_1$, and $(X_2,Y_2)$ an independent pair with copula $C_2$ (with the same margins). Then define

$Q(C_1,C_2)=\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)>0\big)-\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)<0\big)$

the so-called concordance function. Thus

$Q(C_1,C_2)=4\int_0^1\!\!\int_0^1 C_2(u,v)\,dC_1(u,v)-1$

As proved last week in class, $Q$ is symmetric in its two arguments, $Q(C_1,C_2)=Q(C_2,C_1)$.

Based on that function, several concordance measures can be derived. A popular measure is Kendall's tau, from Kendall (1938), defined as $\tau=Q(C,C)$, i.e.

$\tau=4\int_0^1\!\!\int_0^1 C(u,v)\,dC(u,v)-1$

which is simply $4\,\mathbb{E}[C(U,V)]-1$. Here, computation can be tricky. Consider the following sample,

> set.seed(1)
> n=40
> library(mnormt)
> X=rmnorm(n,c(0,0),
+ matrix(c(1,.4,.4,1),2,2))
> U=cbind(rank(X[,1]),rank(X[,2]))/(n+1)

Then, using the R function cor, we can obtain Kendall's tau easily,

> cor(X,method="kendall")[1,2]
[1] 0.3794872

To get our own code (and to understand a bit more how to get that coefficient), we can use

> i=rep(1:(n-1),(n-1):1)
> j=2:n
> for(k in 3:n){j=c(j,k:n)}
> M=cbind(X[i,],X[j,])
> concordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])>0)
> discordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])<0)
> total=n*(n-1)/2
> (K=(concordant-discordant)/total)
[1] 0.3794872

or the following (we will use that random variable $Z$ quite frequently),

> i=rep(1:n,each=n)
> j=rep(1:n,n)
> Z=((X[i,1]>X[j,1])&(X[i,2]>X[j,2]))
> (K=4*mean(Z)*n/(n-1)-1)
[1] 0.3794872
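Since the sample is Gaussian with correlation $r=.4$, the estimates above can be compared with the closed form $\tau=\frac{2}{\pi}\arcsin(r)$, a standard result for elliptical distributions. A sketch on a larger simulated sample (object names are mine, to avoid overwriting the ones above):

```r
# Kendall's tau for a Gaussian pair: tau = 2/pi * asin(r)
library(mnormt)
set.seed(1)
X2=rmnorm(2000,c(0,0),matrix(c(1,.4,.4,1),2,2))
c(empirical=cor(X2,method="kendall")[1,2],
  theoretical=2/pi*asin(.4))  # theoretical value about 0.262
```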

Another measure is Spearman's rank correlation, from Spearman (1904),

$\rho=3\,Q(C,C^{\perp})$

where $(X^{\perp},Y^{\perp})$ has distribution $F(x)\cdot G(y)$ (independent margins, with copula $C^{\perp}(u,v)=uv$).

Here $dC^{\perp}(u,v)=du\,dv$, which leads to the following expressions

$\rho=12\int_0^1\!\!\int_0^1 C(u,v)\,du\,dv-3=\text{corr}\big(F(X),G(Y)\big)$

Numerically, we have the following

> cor(X,method="spearman")[1,2]
[1] 0.5388368
> cor(rank(X[,1]),rank(X[,2]))
[1] 0.5388368

Note that it is also possible to write $\rho=12\,\mathbb{E}\big[F(X)\cdot G(Y)\big]-3$
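For the Gaussian copula there is also a closed form here, $\rho_S=\frac{6}{\pi}\arcsin(r/2)$ (a standard result); with $r=.4$ this gives about $0.3845$, so the estimate above, based on only $n=40$ points, is within sampling error:

```r
# Spearman's rho for a Gaussian copula: rho_S = 6/pi * asin(r/2)
r=.4
6/pi*asin(r/2)  # about 0.3845
```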

Another measure is the cograduation index, from Gini (1914), obtained by substituting an $L^1$ norm for the $L^2$ norm in the previous expression,

$\gamma=\frac{2}{n^2}\left(\sum_{i=1}^n\left|R_i+S_i-n-1\right|-\sum_{i=1}^n\left|R_i-S_i\right|\right)$

where $R_i$ and $S_i$ denote the ranks of the observations. Note that this index can also be defined as $\gamma=Q(C,M)+Q(C,W)$, where $M$ and $W$ denote the comonotonic and countermonotonic copulas. Here,

> Rx=rank(X[,1]);Ry=rank(X[,2]);
> (G=2/(n^2) *(sum(abs(Rx+Ry-n-1))-
+ sum(abs(Rx-Ry))))
[1] 0.41

Finally, another measure is the one from Blomqvist (1950). Let $x_{1/2}$ and $y_{1/2}$ denote the medians of $X$ and $Y$, i.e.

$F(x_{1/2})=\frac{1}{2}\quad\text{and}\quad G(y_{1/2})=\frac{1}{2}$

Then define

$\beta=\mathbb{P}\big((X-x_{1/2})(Y-y_{1/2})>0\big)-\mathbb{P}\big((X-x_{1/2})(Y-y_{1/2})<0\big)=4\,C\!\left(\tfrac{1}{2},\tfrac{1}{2}\right)-1$

or equivalently, empirically,

> Mx=median(X[,1]);My=median(X[,2])
> (B=4*sum((X[,1]<=Mx)*((X[,2]<=My)))/n-1)
[1] 0.4
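Here again, for the Gaussian copula there is a closed form, $\beta=4C(\frac12,\frac12)-1=\frac{2}{\pi}\arcsin(r)$, the same value as Kendall's tau for elliptical distributions. A sketch on a larger simulated sample (object names are mine):

```r
# Blomqvist's beta for a Gaussian pair: beta = 2/pi * asin(r)
library(mnormt)
set.seed(2)
ns=10000
X2=rmnorm(ns,c(0,0),matrix(c(1,.4,.4,1),2,2))
Mx2=median(X2[,1]); My2=median(X2[,2])
B2=4*sum((X2[,1]<=Mx2)*(X2[,2]<=My2))/ns-1
c(empirical=B2, theoretical=2/pi*asin(.4))
```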

Wednesday, February 29 2012

## Further reading on extreme values

In the syllabus of the course MAT8886 I mentioned three references,

• Resnick (1987) Extreme Values, Regular Variation, and Point Processes. Springer Verlag
• Embrechts, Klueppelberg and Mikosch (1997) Modelling Extremal Events for Insurance and Finance. Springer Verlag (yes, I do work with the old version)
• Beirlant, Goegebeur, Segers and Teugels (2006) Statistics of Extremes. Wiley

I can perhaps add three more (that I did not really use to prepare the course, but that might also be interesting on specific topics)

• Coles (2001) An introduction to statistical modeling of extreme values. Springer Verlag
• Reiss and Thomas (2001) Statistical analysis of extreme values: with applications to insurance, finance, hydrology and other fields. Birkhäuser
• de Haan and Ferreira (2006) Extreme value theory: an introduction. Springer Verlag

As requested, I might be a bit more specific about references and the course,

• max-domain of attraction, Fisher-Tippett theorem, regular variation and von Mises conditions [Resnick (chap. 0 and 1), Embrechts et al. (chap. 3.1-3.3), Beirlant et al. (chap. 2)]
• point processes, exceedances, Pickands-Balkema-de Haan theorem [Embrechts et al. (chap. 3.4, and 5.1-5.3), Coles (chap. 4), Reiss et al. (chap. II.5)]
• statistical inference, Hill's estimator, mean excess function, maximum likelihood techniques [Embrechts et al. (chap. 6), Beirlant et al. (chap. 4 and 5)]
• applications, return period, quantile estimation and related risk measures [Coles (chap. 3 and 4)]

Friday, February 24 2012

## MAT8886 copula dataset

To illustrate inference issues on the copula section, we will use the so-called loss-alae dataset. This dataset contains

• LOSS : the loss amount, up to the limit,
• ALAE : the associated expense,
• LIMIT : the limit (-99 means no limit), and
• CENSOR : the censoring indicator (1=censored/limit reached, 0=otherwise).

The dataset is the following

> lossalae=read.table(
+ "http://freakonometrics.free.fr/lossalae.txt",
+ header=TRUE)  # header assumed here
> X=lossalae$LOSS
> Y=lossalae$ALAE
> plot(X,Y)

It is possible to visualize that dataset, on a log-scale

> xhist <- hist(log(X), plot=FALSE)
> yhist <- hist(log(Y), plot=FALSE)
> top <- max(c(xhist$counts, yhist$counts))
> nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE),
+ c(3,1), c(1,3), TRUE)
> par(mar=c(3,3,1,1))
> plot(X,Y, xlab="", ylab="",log="xy",col="red")
> par(mar=c(0,3,1,1))
> barplot(xhist$counts, axes=FALSE, ylim=c(0, top),
+ space=0,col="light green")
> par(mar=c(3,0,1,1))
> barplot(yhist$counts, axes=FALSE, xlim=c(0, top),
+ space=0, horiz=TRUE,col="light blue")

or the copula-type scatterplot (i.e. ranks of the observations),

> n=length(X)+1
> U=rank(X)/n
> V=rank(Y)/n
> xhist <- hist((U), plot=FALSE)
> yhist <- hist((V), plot=FALSE)
> top <- max(c(xhist$counts, yhist$counts))
> nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE),
+ c(3,1), c(1,3), TRUE)
> par(mar=c(3,3,1,1))
> plot(U,V, xlab="", ylab="",col="red")
> par(mar=c(0,3,1,1))
> barplot(xhist$counts, axes=FALSE, ylim=c(0, top),
+ space=0,col="light green")
> par(mar=c(3,0,1,1))
> barplot(yhist$counts, axes=FALSE, xlim=c(0, top),
+ space=0, horiz=TRUE,col="light blue")

Wednesday, February 22 2012

## multivariate cumulative distribution functions

Last week, we introduced the Fréchet class. Consider $d$ univariate cumulative distribution functions $F_1,\dots,F_d$ (e.g., as a start, two functions $F$ and $G$). From Fréchet (1951) or Fréchet (1960), it is possible to consider the set $\mathcal{F}(F_1,\dots,F_d)$ of distribution functions on $\mathbb{R}^d$ with given margins $F_1,\dots,F_d$.

The increasing-function property we had in the univariate case comes from the fact that the probability measure should be positive on any segment $[a,b]$. In higher dimension, the following concept is necessary. A function $H$ is 2-increasing if its $H$-volume is positive on any rectangle $[a_1,b_1]\times[a_2,b_2]$, where the $H$-volume is defined as

$V_H\big([a_1,b_1]\times[a_2,b_2]\big)=H(b_1,b_2)-H(a_1,b_2)-H(b_1,a_2)+H(a_1,a_2)$

In dimension 3, let $[a_1,b_1]\times[a_2,b_2]\times[a_3,b_3]$ be a box, and consider its $H$-volume, defined as

$V_H=H(b_1,b_2,b_3)-H(a_1,b_2,b_3)-H(b_1,a_2,b_3)-H(b_1,b_2,a_3)+H(a_1,a_2,b_3)+H(a_1,b_2,a_3)+H(b_1,a_2,a_3)-H(a_1,a_2,a_3)$

Again, $H$ is 3-increasing if its $H$-volume is positive on any such box. And more generally (see e.g. Sklar (1996)), $H$ is $d$-increasing if the $H$-volume of any box in $\mathbb{R}^d$ is positive.
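The general pattern can be made explicit with a short sketch (function and variable names are mine): the $H$-volume of a box is the signed sum of $H$ over its $2^d$ corners, with sign $(-1)^{\#\text{lower endpoints}}$.

```r
# H-volume of [a1,b1]x...x[ad,bd] by inclusion-exclusion
Hvolume=function(H,a,b){
  d=length(a)
  corners=expand.grid(rep(list(0:1),d))  # 0 = lower, 1 = upper endpoint
  v=0
  for(i in 1:nrow(corners)){
    s=unlist(corners[i,])
    v=v+(-1)^(d-sum(s))*H(ifelse(s==1,b,a))
  }
  v}
# check with the independence copula in dimension 2:
# the H-volume of a box is simply its area
Pi=function(x){prod(x)}
Hvolume(Pi,c(.2,.3),c(.5,.7))  # (.5-.2)*(.7-.3) = 0.12
```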

Tuesday, February 21 2012

## Tests on tail index for extremes

Since several students got the intuition that natural catastrophes might be non-insurable (underlying distributions with infinite mean), I will post some comments on testing procedures for extreme value models.

A natural idea is to use a likelihood ratio test (for composite hypotheses). Let $\xi$ denote the parameter of our parametric model (e.g. the tail index), and suppose we would like to know whether $\xi$ is smaller or larger than some $\xi_0$ (where $\xi_0=1$ in the context of finite versus infinite mean). I.e. either $\xi$ belongs to the set $\Theta_0=\{\xi:\xi\leq\xi_0\}$ or to its complement $\Theta_1=\{\xi:\xi>\xi_0\}$. Consider the maximum likelihood estimator $\widehat{\xi}$, i.e.

$\widehat{\xi}=\underset{\xi}{\text{argmax}}\,\log\mathcal{L}(\xi)$

Let $\widehat{\xi}_0$ and $\widehat{\xi}_1$ denote the constrained maximum likelihood estimators, on $\Theta_0$ and $\Theta_1$ respectively,

$\widehat{\xi}_0=\underset{\xi\in\Theta_0}{\text{argmax}}\,\log\mathcal{L}(\xi)\qquad\text{and}\qquad\widehat{\xi}_1=\underset{\xi\in\Theta_1}{\text{argmax}}\,\log\mathcal{L}(\xi)$

Either $\widehat{\xi}\in\Theta_0$ and $\widehat{\xi}_0=\widehat{\xi}$ (on the left), or $\widehat{\xi}\in\Theta_1$ and $\widehat{\xi}_1=\widehat{\xi}$ (on the right).

So the likelihood ratios

$\log\mathcal{L}(\widehat{\xi})-\log\mathcal{L}(\widehat{\xi}_0)\qquad\text{and}\qquad\log\mathcal{L}(\widehat{\xi})-\log\mathcal{L}(\widehat{\xi}_1)$

are either equal to

$0\qquad\text{and}\qquad\log\mathcal{L}(\widehat{\xi})-\log\mathcal{L}(\widehat{\xi}_1)>0$

or

$\log\mathcal{L}(\widehat{\xi})-\log\mathcal{L}(\widehat{\xi}_0)>0\qquad\text{and}\qquad 0$

If we use the code mentioned in the post on profile likelihood, it is easy to derive that ratio. The following graph is the evolution of that ratio, based on a GPD assumption, for different thresholds,
> base1=read.table(
+ "http://freakonometrics.free.fr/danish-univariate.txt",
+ header=TRUE)  # header assumed here
> library(evir)
> X=base1$Loss.in.DKM
> U=seq(2,10,by=.2)
> LR=P=ES=SES=rep(NA,length(U))
> for(j in 1:length(U)){
+ u=U[j]
+ Y=X[X>u]-u
+ loglikelihood=function(xi,beta){
+ sum(log(dgpd(Y,xi,mu=0,beta))) }
+ XIV=(1:300)/100; L=rep(NA,300)
+ for(i in 1:300){
+ XI=XIV[i]
+ profilelikelihood=function(beta){
+ -loglikelihood(XI,beta) }
+ L[i]=-optim(par=1,fn=profilelikelihood)$value }
+ plot(XIV,L,type="l")
+ PL=function(XI){
+ profilelikelihood=function(beta){
+ -loglikelihood(XI,beta) }
+ return(optim(par=1,fn=profilelikelihood)$value)}
+ L0=(OPT=optimize(f=PL,interval=c(0,10)))$objective
+ profilelikelihood=function(beta){
+ -loglikelihood(1,beta) }
+ L1=optim(par=1,fn=profilelikelihood)$value
+ LR[j]=L1-L0
+ P[j]=1-pchisq(L1-L0,df=1)
+ G=gpd(X,u)
+ ES[j]=G$par.ests[1]
+ SES[j]=G$par.ses[1]
+ }
>
> plot(U,LR,type="b",ylim=range(c(0,LR)))
> abline(h=qchisq(.95,1),lty=2)

with, on top, the values of the ratio (the dotted line is the 95% quantile of a chi-square distribution with one degree of freedom) and, below, the associated p-values
> plot(U,P,type="b",ylim=range(c(0,P)))
> abline(h=.05,lty=2)

In order to compare, it is also possible to look at the confidence interval for the tail index of the GPD fit,
> plot(U,ES,type="b",ylim=c(0,1))
> lines(U,ES+1.96*SES,type="h",col="red")
> abline(h=1,lty=2)

To go further, see Falk (1995), Dietrich, de Haan & Hüsler (2002), Hüsler & Li (2006) with the following table, or Neves & Fraga Alves (2008). See also here or there (for the latex based version) for an old paper I wrote on that topic.
