A short post today on a rather surprising result. I can't figure out how to prove it (I do not know whether the result is valid, or already known, so I will call it an open question). Consider an i.i.d. sequence $\{X_1,\cdots,X_n\}$ with common survival function $\overline{F}$. Assume further that $\overline{F}$ is regularly varying at $\infty$, written $\overline{F}\in RV_{\alpha}$, i.e.

$$\lim_{t\rightarrow\infty}\frac{\overline{F}(tx)}{\overline{F}(t)}=x^{\alpha}$$

with $\alpha\in(-\infty,0)$. Then the tail of $X$ is of Pareto type, and we can use Hill's estimator to estimate $\alpha$. An interesting result is that if $\overline{F}\in RV_{\alpha}$ and $\overline{G}\in RV_{\beta}$, then $\overline{F}\star\overline{G}$ is also regularly varying, where $\star$ denotes the convolution operator. More precisely, when $\alpha=\beta$ (see Feller (), section VIII.8), $\overline{F}\star\overline{G}$ is regularly varying with the same index.
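
For readers who want to see what Hill's estimator computes, here is a minimal base-R sketch. The helper name hill_alpha and the choice k=100 are mine, for illustration only (this is not a function from evir). It returns the positive tail exponent, i.e. minus the regular variation index in the notation above.

```r
# Hill's estimator of the (positive) tail exponent, from the k largest values.
# hill_alpha is a hypothetical helper name, not a function from evir.
hill_alpha <- function(x, k) {
  stopifnot(k >= 1, k < length(x), all(x > 0))
  xs <- sort(x, decreasing = TRUE)           # X_(1) >= X_(2) >= ...
  xi <- mean(log(xs[1:k])) - log(xs[k + 1])  # mean log-excess over X_(k+1)
  1 / xi                                     # tail exponent = 1 / xi
}

set.seed(1)
alpha <- 1.5
X <- runif(1000)^(-1/alpha)  # strict Pareto sample, survival x^(-alpha) on x >= 1
hill_alpha(X, k = 100)
```

On a strict Pareto sample the value should be reasonably close to 1.5, with a standard error of roughly alpha/sqrt(k). A Hill plot is this quantity drawn as a function of k.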

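This can be visualized below, on simulations of Pareto variables, where the tail index is estimated using Hill's estimator (in the code, alpha=1.5 is the positive exponent, so the survival function is $x^{-1.5}$ for $x\geq 1$). First, we start with one sample, either of size 20 (on the left) or 100 (on the right),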

> library(evir)
> n=20
> set.seed(1)
> alpha=1.5
> X=runif(n)^(-1/alpha)
> hill(X)
> abline(h=1.5,col="blue")

If we generate two (independent) samples and then look at the sum, Hill's estimator does not perform very well: the sum of two independent Pareto variates is no longer Pareto, but only of Pareto type.

> set.seed(1)
> alpha=1.5
> X=runif(n)^(-1/alpha)
> Y=runif(n)^(-1/alpha)
> hill(X+Y)
> abline(h=1.5,col="blue")
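
The "Pareto type" remark can be checked numerically. For regularly varying tails with the same index, the tail of the sum behaves like the sum of the tails, so P(X+Y>t) should be about twice P(X>t) for large t. A rough sketch, where the sample size N and the threshold thr are arbitrary choices of mine:

```r
set.seed(1)
alpha <- 1.5
N <- 1e6                       # large sample, used for this check only
X <- runif(N)^(-1/alpha)
Y <- runif(N)^(-1/alpha)
thr <- 50                      # a "large" threshold, chosen arbitrarily
p_sum <- mean(X + Y > thr)     # empirical P(X+Y > thr)
p_one <- thr^(-alpha)          # exact P(X > thr) for this Pareto
ratio <- p_sum / p_one         # should approach 2 as thr grows
ratio
```

At a finite threshold the ratio sits somewhat above 2, which is one way to see that the sum is not exactly Pareto.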

The idea is then to use a jackknife-type strategy in order to (artificially) increase the size of our sample. Thus, consider the sums over all pairs of $X_i$'s, i.e. the sample

$$\{X_1+X_2,\cdots,X_1+X_{n},X_2+X_3,\cdots,X_{n-1}+X_n\}$$

Let us use Hill's estimator on this (much larger) sample.

> # all pairwise sums X[i]+X[j], i<j
> XC=NULL
> for(i in 1:(n-1)){
+ for(j in (i+1):n){
+ XC=c(XC,X[i]+X[j])
+ }}
> hill(XC)
> abline(h=1.5,col="blue")
> abline(h=1.5*2,col="blue",lty=2)
> abline(h=1.5^2,col="blue",lty=3)
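
As a side note, the double loop can be written more compactly with combn from base R. The sketch below (subset_sums is a hypothetical helper name of mine) checks that it produces the same pairwise sums, in the same order, as the loop.

```r
set.seed(1)
n <- 20
alpha <- 1.5
X <- runif(n)^(-1/alpha)

# Sums over all subsets of size m, in the same order as the nested loops
subset_sums <- function(x, m) combn(x, m, FUN = sum)

XC2 <- subset_sums(X, 2)

# Reference: the double loop, written out explicitly
XC <- NULL
for (i in 1:(n - 1)) for (j in (i + 1):n) XC <- c(XC, X[i] + X[j])

length(XC2) == choose(n, 2)   # number of pairs
all.equal(XC2, XC)            # same values, same order
```

The same helper with m=3 gives the sums over all triplets.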

Here, with 20 observations in the initial sample, it looks like the tail index is $2\alpha$ (on the left). With 100 observations, it looks like it is $\alpha^2$ (on the right). On the graphs above, I have plotted those two horizontal lines. It's odd, isn't it? Of course, I did not really expect $\alpha$, since we do not have an i.i.d. sample. The sums are identically distributed, yes, with a regularly varying survival function from what we've mentioned before, but they are clearly not independent. So Hill's estimator should not be a good estimator of $\alpha$, but it might be a good estimator of some function of $\alpha$.
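
To get a feeling for how strong the dependence is, one can count how many of the largest pairwise sums share the single largest observation. This is only an exploratory sketch under my own choices (the cutoff k=n-1 is simply the number of pairs that contain a given observation); it does not explain the values seen on the Hill plot.

```r
set.seed(1)
n <- 20
alpha <- 1.5
X <- runif(n)^(-1/alpha)

idx  <- combn(n, 2)                 # 2 x choose(n,2) matrix of index pairs
sums <- X[idx[1, ]] + X[idx[2, ]]   # pairwise sums
imax <- which.max(X)                # position of the largest observation

k   <- n - 1                        # number of pairs containing X[imax]
top <- order(sums, decreasing = TRUE)[1:k]
# how many of the k largest sums involve the largest observation
n_shared <- sum(idx[1, top] == imax | idx[2, top] == imax)
n_shared
```

The largest sum is always the sum of the two largest observations, so the upper order statistics of the pairs sample are driven by a handful of the original values.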

If we go one step further, and consider all triplets,

$$\{X_1+X_2+X_3,\cdots,X_1+X_2+X_{n},X_2+X_3+X_4,\cdots,X_{n-2}+X_{n-1}+X_n\}$$

(observe that the sample size is now huge, even if we start with only 20 points). Then, it looks like the tail index should be $\alpha^3$ (at least, we can observe a step in Hill's plot around $\alpha^3$).

> # all sums X[i]+X[j]+X[k], i<j<k
> XC=NULL
> for(i in 1:(n-2)){
+ for(j in (i+1):(n-1)){
+ for(k in (j+1):n){
+ XC=c(XC,X[i]+X[j]+X[k])
+ }}}
> hill(XC)
> abline(h=1.5,col="blue")
> abline(h=1.5*3,col="blue",lty=2)
> abline(h=1.5^3,col="blue",lty=3)
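
To put numbers on "huge", the sizes of the pair and triplet samples are binomial coefficients, which can be tabulated directly (the table layout is just my choice):

```r
n <- c(20, 100)
sizes <- rbind(pairs = choose(n, 2), triplets = choose(n, 3))
colnames(sizes) <- paste0("n=", n)
sizes
```

That is 190 pairs and 1140 triplets for n=20, and 4950 pairs and 161700 triplets for n=100, all built from only n independent draws.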

My open question is whether there is a general result behind this, or whether it is just a coincidence that those values appear so clearly.