# Question:A remark about Statistics:-Sampling(..., method=[envelope...])

## Question:A remark about Statistics:-Sampling(..., method=[envelope...])

Maple

Hi,

Recently a few questions concerning the sampling of the Cauchy distribution and the sampling of a truncated Normal distribution have been posted (mainly by  @jalale).
This post is concerned by the sampling of a truncated (standard) Cauchy distribution.

In a first part the efficiency fo two methods is adressed in the case of a non-truncated Cauchy distribution:

• The "standard" Maple's command Statistics:-Sample(Cauchy(0, 1), N)
• And a general method a priori very efficient if one knows the ICDF (Inverse Cumulative Function Distribution). It happens that this ICDF is just cot(U*Pi)  where U is a Uniform RV over [0, 1].

The second part adresses the sampling of a truncatedCauchy distribution with two methods:

• The "standard" Maple's command Statistics:-Sample(Cauchy(0, 1), N, method=[envelope, range=...])
• The method based on the use of the ICDF

Results:

Test1 (non-truncated Cauchy distribution)

• "Standard" Maples sampling outperforms the ICDF based method in terms of :
• memory occupation: ICDF is twice more demanding
• cpu time: ICDF is ten times slower

Test2 (truncated Cauchy distribution)

• ICDF based method i outperforms "Standard" Maples sampling oin terms of :
• memory occupation: Maples "envelope sampling" method is twice more demanding
• cpu time: Maples "envelope sampling" method is two times slower

But, beyond these simple observations, a disturbing problem is: the "envelope sampling" method seems to not return the correct distribution (at least when used this waymethod=[envelope, range=a..b]  with a < b)
This is confirmed by the two last plot where histogram and PDF are uperimposed.

Do you think this problem can be avoided by another parameterization of the "envelope sampling" method or that it reveals some underlying problem with it?

PS: I did not investigate further for other distributions .

 >

Sampling the Cauchy distribution

Maple's default sampling method outperformes the adhoc method

 > restart
 > with(Statistics):
 > C := RandomVariable(Cauchy(0, 1))
 (1)
 > f := unapply(CDF(C, t), t);
 (2)
 > finv := unapply(-solve(f(t)=u, t), u)
 (3)
 > # "natural" way to proceed N  := 10^6: S1 := CodeTools:-Usage(Sample(C, N)):
 memory used=7.71MiB, alloc change=39.63MiB, cpu time=69.00ms, real time=69.00ms, gc time=8.72ms
 > # Let's try the sampling strategy based on the inverse of the CDF # Usually it starts from sampling a Uniform RV on [0, 1] and # next applies finv to the result. # # Smart but inefficient U  := RandomVariable(Uniform(0., 1)): S2 := CodeTools:-Usage(finv~(Sample(U, N))):
 memory used=145.02MiB, alloc change=7.63MiB, cpu time=4.94s, real time=3.04s, gc time=2.61s
 > # Much more efficient # # Given the special form of finv it's simpler to sample a Unirorm RV # on [0, Pi] and apply "cot" to the result pi := evalf(Pi): U  := RandomVariable(Uniform(0., pi)): S2 := CodeTools:-Usage(cot~(Sample(U, N))):
 memory used=15.28MiB, alloc change=0 bytes, cpu time=652.00ms, real time=237.00ms, gc time=577.78ms

Sampling a truncated Cauchy distribution

Example 1:
with(Statistics) + method=[envelope, range=-10..10]

The adhoc method outperforms Maple's default sampling method

 > S1 := CodeTools:-Usage(Sample(C, N, method=[envelope, range=-10..10])): Histogram(S1);
 memory used=8.88MiB, alloc change=-7.63MiB, cpu time=322.00ms, real time=260.00ms, gc time=93.77ms
 > p  := Probability(C < -10, numeric); q  := 1-Probability(C > +10, numeric); U  := RandomVariable(Uniform(p*pi, q*pi)): S2 := CodeTools:-Usage(cot~(Sample(U, N))): Histogram(S2);
 memory used=15.28MiB, alloc change=7.63MiB, cpu time=170.00ms, real time=113.00ms, gc time=92.11ms

Sampling a truncated Cauchy distribution

Example 2:
with(Statistics) + method=[envelope, range=-1..1]

The adhoc method outperformes Maple's default sampling method

 > S1 := CodeTools:-Usage(Sample(C, N, method=[envelope, range=-1..1])): scaling := Probability(C < +1, numeric) - Probability(C < -1, numeric); plots:-display( Histogram(S1), plot(PDF(C, t)/scaling, t=-1..1, thickness=3, color=red) );
 memory used=8.08MiB, alloc change=0 bytes, cpu time=246.00ms, real time=215.00ms, gc time=46.62ms
 > p  := Probability(C < -1, numeric); q  := 1-Probability(C > +1, numeric); U  := RandomVariable(Uniform(p*pi, q*pi)): S2 := CodeTools:-Usage(cot~(Sample(U, N))): plots:-display( Histogram(S2), plot(PDF(C, t)/scaling, t=-1..1, thickness=3, color=red) );
 memory used=15.28MiB, alloc change=7.63MiB, cpu time=163.00ms, real time=106.00ms, gc time=85.93ms
 >