Comments on probabilities
By Arthur Charpentier on Tuesday, November 2 2010, 15:10 - Statistics
The only thing I remember from the probability courses I took a few years ago is that we have to clearly define the event whose probability we want to calculate. On the Freakonomics blog, last week, the Israeli lottery was mentioned (here, see also there where I mentioned that, and odd facts from the French lottery).

Since 6 numbers are drawn out of a pool of numbers from 1 to 37, the total number of combinations at each draw is
> (n=choose(37,6))
[1] 2324784
Over 8 draws (since there are two draws per week, we can assume there are 8 draws per month), the probability of no identical draws is

> prod(n-0:7)/n^8
[1] 0.999988
Each month, the probability of a "coincidence" (I define "coincidence" as the event "over 8 draws, at least two times, we obtained the same 6-uplet" or more precisely (as mentioned here) "over one calendar month, at least two times, we obtained the same 6-uplet") is p=1.204407e-05.
> (p=1-(prod(n-0:7)/n^8))
[1] 1.204407e-05
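As a sanity check, this is a birthday-type problem, so the exact value should be close to the number of pairs of draws divided by n. The sketch below compares the two; the names `k`, `p_exact` and `p_approx` are mine, not part of the original computation.

```r
# Number of possible draws: 6 numbers out of 37
n <- choose(37, 6)
k <- 8  # draws per month, as assumed in the text

# Exact probability of at least one repeated draw among k draws
p_exact <- 1 - prod(n - 0:(k - 1)) / n^k

# First-order approximation: number of pairs of draws, divided by n
p_approx <- choose(k, 2) / n

print(c(exact = p_exact, approx = p_approx))
```

With only 8 draws and more than two million combinations, the two values should agree to several significant digits.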
The number of months until the first coincidence then has a geometric distribution, with probability p. And it is classical, following Gumbel's definition (here), to consider 1/p, called the "return period", i.e. the expected number of months we have to wait until we observe a coincidence (i.e. a repetition in the same month), since the mean of a geometric distribution with probability p is 1/p. Converted into years,
> 1/p/12
[1] 6919.034
Here, the (expected) return period is 6919 years.
From my point of view, this is “the incident of six numbers repeating themselves within a calendar month”, and this is an event that occurs once in 6919 years on average. On the other hand, the median of a geometric distribution is log(1/2)/log(1-p) months, i.e. in years,
> log(.5)/log(1-p)/12
[1] 4795.88
which means that we have a 50% chance of getting such a coincidence within 4796 years.
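To see how the mean and the median relate, here is a small sketch that recomputes both and checks that the probability of at least one coincidence within the median waiting time is indeed one half. The variable names are mine, and a year is assumed to be 12 months of 8 draws each, as above.

```r
n <- choose(37, 6)
p <- 1 - prod(n - 0:7) / n^8    # monthly probability of a coincidence

mean_years   <- 1 / p / 12                    # expected return period, in years
median_years <- log(0.5) / log(1 - p) / 12    # median waiting time, in years

# Probability of at least one coincidence within the median number of months
prob_within_median <- 1 - (1 - p)^(median_years * 12)

# For small p, the median is close to log(2) times the mean
ratio <- median_years / mean_years

print(c(mean = mean_years, median = median_years,
        check = prob_within_median, ratio = ratio))
```

The ratio close to log(2) explains why the median is roughly 70% of the return period.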
Of course, instead of one month we can look at a longer period, say 100 draws, i.e. one year (here I define "coincidence" as the event "over 100 draws, at least two times, we obtained the same 6-uplet"). As a function of the number of draws in the period, we have in red the expected return period (in years), and in blue the median of the geometric distribution,

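> E=M=rep(NA,100)   # mean and median return periods (in years), by number of draws
> for(i in 2:100){
+ p=1-exp((sum(log(n-0:(i-1)))-i*log(n)))   # P(at least one repeat among i draws)
+ E[i]=1/p/(100/i)   # mean of the geometric, with 100/i periods per year
+ M[i]=-log(2)/log(1-p)/(100/i)   # median of the geometric, same unit
+ }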
> plot(1:100,E,ylim=c(0,10000),type="l",col="red",lwd=2)
> lines(1:100,M,col="blue",lwd=2)
> abline(v=8,lty=2)
> points(8,E[8],pch=19,col="red")
> points(8,M[8],pch=19,col="blue")
A log-scaled version of the same graph can also be drawn.
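A side remark on why the loop works on the log scale: for a large number of draws, both the product and the power in the direct formula exceed what a double can hold. A minimal sketch, with variable names of my own:

```r
n <- choose(37, 6)
i <- 100

# Direct formula: numerator and denominator both overflow to Inf
naive <- prod(n - 0:(i - 1)) / n^i

# Log-scale version, as in the loop above
stable <- exp(sum(log(n - 0:(i - 1))) - i * log(n))

print(c(naive = naive, stable = stable))
```

The direct formula is fine for 8 draws, but it is no longer usable for 100.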


But here, we can have one 6-uplet in Israel, and the other one in Egypt, say... If we want to get the same 6-uplet in the same country, the graph is now
i.e. each month there is about one chance in a thousand...
> i=8
> p=1-exp((sum(log(n-0:(i-1)))-i*log(n)))
> 1-(1-p)^100
[1] 0.001203689
Note: actually, Xi'an mentioned that the probability that this coincidence [of two identical draws over 188 draws] occurred in at least one out of 100 lotteries (there are hundreds of similar lotteries across the world) is 53%! And I got the same (with P[188] the probability of a coincidence over 188 draws),
> 1-(1-P[188])^100
[1] 0.5305219
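The vector P is not defined in the code shown above. A plausible way to build it, assuming P[k] is the probability of at least one repeated draw among k draws of a single lottery, and assuming the 100 lotteries are independent, is sketched here. The helper `p_repeat` is hypothetical, not from the original code.

```r
n <- choose(37, 6)

# Hypothetical helper: probability of at least one repeated draw among
# k draws of the same lottery, computed on the log scale
p_repeat <- function(k) 1 - exp(sum(log(n - 0:(k - 1))) - k * log(n))

# Assumed definition of the vector P used in the text
P <- sapply(1:188, p_repeat)

# Probability that at least one of 100 independent lotteries shows a repeat
prob_any <- 1 - (1 - P[188])^100
print(prob_any)
```

Under these assumptions, the result should be close to the 53% quoted above.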







Comments
The geometric model is a simplification of the actual process in that we are considering repeated draws within a month time interval rather than within a calendar month. This means excluding (n-7) values from each draw after the first month draws, which increases the final probability...
ANSWER : you're right, I forgot to mention that a "month" here was a "calendar month"... so if one 6-uplet shows up end of October and again beginning of November, it does not count.
About the Romanian cheating case, I had written this:
http://www.vicnent.info/blog/index....