A short post today on a rather surprising result. I can't figure out how to prove it (I do not know whether the result is valid, or already known, so I will claim it is an open question). Consider an i.i.d. sequence $\{X_1,\cdots,X_n\}$ with common survival function $\overline{F}$. Assume further that $\overline{F}$ is regularly varying at $\infty$, written $\overline{F}\in RV_{-\alpha}$, i.e.

$\lim_{t\rightarrow\infty}\frac{\overline{F}(tx)}{\overline{F}(t)}=x^{-\alpha}$

with $\alpha\in(0,\infty)$. Then the tail of $X$ is of Pareto type, and we can use Hill's estimator to estimate $\alpha$. An interesting result is that if $\overline{F}\in RV_{-\alpha}$ and $\overline{G}\in RV_{-\beta}$, then the convolution $F\star G$ (the distribution of the sum) also has a regularly varying tail, where $\star$ denotes the convolution operator. More precisely, when $\alpha=\beta$ (see Feller (1971), section VIII.8), $\overline{F\star G}$ is regularly varying with the same index.
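This closure under convolution can be sketched for two independent Pareto variables with $P(X>x)=P(Y>x)=x^{-\alpha}$ for $x\geq 1$ and $\alpha>0$ (the case simulated below): regularly varying distributions are subexponential, so the tail of the sum behaves asymptotically like the sum of the tails,

$P(X+Y>t)\sim P(X>t)+P(Y>t)=2t^{-\alpha},\ t\rightarrow\infty$

which is again regularly varying, with the same index.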

This can be visualized below, on simulations of Pareto variables, where the tail index is estimated using Hill's estimator. First, we start with one sample, either of size 20 (on the left) or 100 (on the right),

> library(evir)
> n=20   # n=100 for the graph on the right
> set.seed(1)
> alpha=1.5
> X=runif(n)^(-1/alpha)
> hill(X)
> abline(h=1.5,col="blue")
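As a side note, the line `X=runif(n)^(-1/alpha)` is inverse-transform sampling: if $U$ is uniform on $(0,1)$, then $U^{-1/\alpha}$ is standard Pareto, with survival function $P(X>x)=x^{-\alpha}$ for $x\geq 1$. A quick sanity check, with a large sample just for illustration (not part of the experiment above):

```r
set.seed(42)
alpha = 1.5
X = runif(1e6)^(-1/alpha)   # inverse-transform sampling of a standard Pareto
mean(X > 2)                 # empirical P(X > 2)...
2^(-alpha)                  # ...close to the theoretical value, about 0.354
```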

If we generate two (independent) samples and then look at their sum, Hill's estimator does not perform very well (the sum of two independent Pareto variates is no longer Pareto, but only of Pareto type), we have

> set.seed(1)
> alpha=1.5
> X=runif(n)^(-1/alpha)
> Y=runif(n)^(-1/alpha)
> hill(X+Y)
> abline(h=1.5,col="blue")

The idea is then to use a Jackknife-type strategy in order to (artificially) increase the size of our sample. Thus, consider the sums over all pairs of $X_i$'s, i.e. the sample

$\{X_1+X_2,\cdots,X_1+X_{n},X_2+X_3,\cdots,X_{n-1}+X_n\}$

Let us use Hill's estimator on this (much larger) sample.

> XC=NULL
> for(i in 1:(n-1)){
+ for(j in (i+1):n){
+ XC=c(XC,X[i]+X[j])
+ }}
> hill(XC)
> abline(h=1.5,col="blue")
> abline(h=1.5*2,col="blue",lty=2)
> abline(h=1.5^2,col="blue",lty=3)
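Incidentally (this is not in the original code), the double loop can be written more compactly with base R's `combn`, which enumerates all pairs directly and yields the same multiset of sums:

```r
set.seed(1)
n = 20
alpha = 1.5
X = runif(n)^(-1/alpha)

# All pairwise sums X[i] + X[j] with i < j, as in the double loop
XC = combn(X, 2, sum)
length(XC)   # number of pairs
```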

Here, with 20 observations in the initial sample, it looks like the tail index is $2\alpha$ (on the left). With 100 observations, it looks like it is $\alpha^2$ (on the right). On the graphs above, I have plotted those two horizontal lines. It's odd, isn't it? Of course, I did not really expect $\alpha$, since we no longer have an i.i.d. sample. Identically distributed, yes, with a regularly varying survival function, from what we mentioned before. But clearly not independent. So Hill's estimator should not be a good estimator of $\alpha$, but it might be a good estimator of some function of $\alpha$.

If we go one step further and consider all triplets,

$\{X_1+X_2+X_3,\cdots,X_1+X_2+X_{n},X_2+X_3+X_4,\cdots,X_{n-2}+X_{n-1}+X_n\}$

(observe that now the sample size is huge, even if we start with only 20 points). Then, it looks like the tail index should be $\alpha^3$ (at least, we can observe a step in Hill's plot around $\alpha^3$).

> XC=NULL
> for(i in 1:(n-2)){
+ for(j in (i+1):(n-1)){
+ for(k in (j+1):n){
+ XC=c(XC,X[i]+X[j]+X[k])
+ }}}
> hill(XC)
> abline(h=1.5,col="blue")
> abline(h=1.5*3,col="blue",lty=2)
> abline(h=1.5^3,col="blue",lty=3)
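As a side remark, the sizes of these constructed samples grow quickly with $n$; starting from only $n=20$ observations,

```r
n = 20
choose(n, 2)   # 190 pairwise sums
choose(n, 3)   # 1140 sums over triplets
```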

My open question is whether there is a general result behind this, or if it is just a coincidence that those values appear so clearly.