Let X be a discrete uniform random variable over [1, N].
A Bayesian approach :
We define the random variable Y this way :
for any integer y with 1 <= y <= x-1 : Prob(Y=y|X=x) = 1/(x-1) if x > 1, and 0 otherwise
For the computations below it is more convenient to modify this
definition by discarding the particular case x=1.
To do this, now take X to be a discrete uniform random variable over [2, N].
What is the probability P(Y=y) that Y takes the value y ?
The law of total probability says :
Prob(Y=y) = Sum(Prob(Y=y|X=x)*Prob(X=x), x=2..N)
where Prob(X=x) = 1/(N-1) for x=2,..,N
Let N = 7 ; what is the value of P(Y=1) ?
One finds :
P(Y=1) = P(Y=1|X=2)P(X=2) + ... + P(Y=1|X=7)P(X=7)
P(Y=1) = 1*(1/6) + (1/2)*(1/6) + ... + (1/6)*(1/6)
P(Y=1) = 49/120 (about 0.408)
Doing the same for y=2, ..., 6 gives
P(Y=2) = 29/120
P(Y=3) = 19/120
P(Y=4) = 37/360
P(Y=5) = 11/180
P(Y=6) = 1/36
One easily verifies that P(Y=1) + ... + P(Y=6) = 1
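These values can be cross-checked with exact rational arithmetic. Here is a minimal sketch in Python (not Maple, and the helper name marginal_Y is mine), assuming X uniform over 2..N and, given X=x, Y uniform over 1..x-1 :

```python
from fractions import Fraction

def marginal_Y(N):
    """Return {y: P(Y=y)} for X uniform on 2..N and Y|X=x uniform on 1..x-1."""
    p_x = Fraction(1, N - 1)  # P(X=x) for each x in 2..N
    probs = {}
    for y in range(1, N):
        # Only x > y contributes, because P(Y=y|X=x) = 0 when y >= x.
        probs[y] = sum(Fraction(1, x - 1) * p_x for x in range(y + 1, N + 1))
    return probs

probs = marginal_Y(7)
for y, p in probs.items():
    print(y, p)
print("total =", sum(probs.values()))
```

Using Fraction instead of floats keeps every value exact, so the total can be compared to 1 without rounding issues.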
What does this mean ?
Suppose you want to draw the joint mass function of (X, Y) (the discrete
analogue of the probability density function for continuous random variables)
with "buildings" of height H(x,y) : What should it look like?
First thing : the sum of all the heights H(x,y) must be 1.
Second thing : obviously H(x,y) = 0 for each y greater than or equal to x
Consider now the pair (X=2, Y=1) :
H(2,1) must represent the joint probability P(X=2, Y=1) = P(Y=1|X=2)*P(X=2), which is 1*(1/6) !
Consider now the pair (X=3, Y=1) :
H(3,1) must represent the joint probability P(X=3, Y=1) = P(Y=1|X=3)*P(X=3), which is (1/2)*(1/6),
because the two events (Y=1|X=3) and (Y=2|X=3) have exactly the same
probability.
And so on.
Let us count the total number C of configurations.
We have C = sum((n-1), n=2..N) = N*(N-1)/2 which, for N=7, gives C=21
It is clear that these 21 buildings do not all have the same height.
The highest has value 1/6, the smallest 1/36.
Keep this value of 1/6 in mind, I will come back to it later.
More precisely, for a given x, all the buildings H(x,y[j]) for j from 1 to x-1
have the same height, equal to (1/6)*(1/(x-1)).
This has an important implication :
sum(H(x,y[j]), j=1..x-1) = 1/6 for each x from 2 to 7
which is what is meant by saying that "the marginal distribution of X" is
discrete uniform.
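The buildings can also be tabulated directly. The following Python sketch (again Python rather than Maple, with names of my own choosing) builds H(x,y) = P(X=x)*P(Y=y|X=x) for N=7 under the definitions above, counts the buildings, and sums them over y for each fixed x :

```python
from fractions import Fraction

N = 7
# Joint mass H(x, y) = P(X=x) * P(Y=y|X=x), nonzero only for 1 <= y < x.
H = {(x, y): Fraction(1, N - 1) * Fraction(1, x - 1)
     for x in range(2, N + 1) for y in range(1, x)}

print("number of buildings :", len(H))
print("highest :", max(H.values()), " smallest :", min(H.values()))

# For each fixed x, add up the heights of the x-1 buildings above it.
sums_over_y = {x: sum(h for (xx, _), h in H.items() if xx == x)
               for x in range(2, N + 1)}
print("sum over y for each x :", sums_over_y)
print("total mass :", sum(H.values()))
```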
Now, what about this value 1/6 I mentioned before ?
Suppose you run a simulation to build the mass function of
the pair (X, Y) and you use 2^17 samples.
Then the expected height of the highest building, H(2,1), is 2^17/6, a value
close to 21845.
This is practically the value the second histogram (generated
by GenXY2) gives.
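To see this order of magnitude in a simulation, here is a small Python sketch. It is not the GenXY2 procedure from this thread (whose code I do not reproduce here) : it simply samples the model as defined above, X uniform over 2..7 and then Y uniform over 1..x-1. The function name sample_xy and the seed are arbitrary choices of mine.

```python
import random
from collections import Counter

def sample_xy(rng, N):
    """Draw one (x, y) : X uniform on 2..N, then Y uniform on 1..x-1."""
    x = rng.randint(2, N)      # randint includes both end points
    y = rng.randint(1, x - 1)
    return x, y

rng = random.Random(12345)     # fixed seed (arbitrary) for reproducibility
n_samples = 2**17
counts = Counter(sample_xy(rng, 7) for _ in range(n_samples))

print("expected count at (2,1) :", n_samples / 6)
print("simulated count at (2,1) :", counts[(2, 1)])
```

The simulated count fluctuates around 2^17/6 from one seed to another, with a standard deviation of roughly 135 for this sample size.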
In fact, intuition aside, the second histogram is the
correct one, and the first is wrong.
Maybe you wonder where the error in GenXY is ?
It comes from the instruction `if`(x<y, [x,y], [y,x])[]
Let me introduce a new random variable denoted U and defined by
for any integer u with x <= u <= N : Prob(U=u|X=x) = 1/(N-x+1)
To picture that, remember that (X, Y) lies in the "triangle"
strictly below the main diagonal : U is just the complement
of Y, and thus (X, U) lies in the triangle above, diagonal
included.
If one considers that the 'else' clause of the previous test returns
a sample of (X, Y), then the 'then' clause returns a sample of
(X, U).
But, because Y is in [1, x-1] and U in [x, N], the test just returns
a sample of a pair of iid random variables with discrete uniform
distribution over {1, .., N}x{1, .., N}.
So, all the "buildings" must here have the same height.
Carl's idea to return [x,y] or [y,x] to avoid discarding,
on average, 1 out of 2 trials was a priori a good trick ...
But it returns a wrong result.
The correct test is : if y<x then [x, y] end if
Which is exactly what John did.
So, John, do not worry, you are right