## 847 Reputation

3 years, 183 days

## How to make this procedure more efficien...

Maple

Hi,

This thread is more or less related to a previous one about the Statistics:-Sample procedure.
(see https://www.mapleprimes.com/questions/228421-A-Serious-Problem-With-StatisticsSample )
I've just implemented two variant of the Box-Muller procedure to sample normal rvs.
The source is "The art of computer programming", Donald E. Knuth, 2nd edition, p117 (aka "algorithm P").

The first implementation (BoxMuller_1) is basically what D.E. Knuth writes, except that I "vectorize" some operations in order to avoid using an if.. then..else structure (as a minor consequence I generate a little bit more numbers than required).
This procedure uses the build-in Maple's procedure select (please see the link above and acer's and carl love's replies).
It appears to be relatively slow.
More of this CodeTools:-Usage(BoxMuller_1(10^6)) generates a "conenction to kernel lost" for some unknown reason.

The variant named BoxMuller_2 uses sort and ListTools:-BinaryPlace instead of select.
Ir appears to be around 3 times faster than BoxMuller_1, but remains 10 time slower than Statistics:-Sample(..., method=envelope) (here again, see the link above to understand why method=envelope is needed).

I wonder how it could be possible to speed up this procedure. In particular acer showed in one of his reply (again see the link above) how using hfloats can improve the efficiency of a procedure, but I'm very incompetent on this point.
Does anyone have any idea?

PS: this file has been written with Maple 2015.2

 > restart:
 > BoxMuller_1 := proc(N)   local V, S, T, L, X1, X2:   V  := Statistics:-Sample(Uniform(-1, 1), [ceil(2*N/3), 2]):   S  := V[..,1]^~2 +~ V[..,2]^~2;   T  := < subs(NULL=infinity, select(`<`, S, 1)) | V >;   T[.., 1]  := (-2 *~ log~(T[.., 1]) /~ T[.., 1])^~(0.5);   X1 := select[flatten](type, T[.., 1] *~  T[..,2], 'float');   X2 := select[flatten](type, T[.., 1] *~  T[..,3], 'float');   return : end proc: BoxMuller_2 := proc(N)   local V1, V2, S, W, u, r, T, X1, X2:   V1 := Statistics:-Sample(Uniform(-1, 1), ceil(2*N/3)):   V2 := Statistics:-Sample(Uniform(-1, 1), ceil(2*N/3)):   S  := V1^~2 +~ V2^~2;   W  := sort(S, output = [sorted, permutation]):   u  := ListTools:-BinaryPlace(W[1], 1);   r  := [\$1..u]:   T  := (-2 *~ log~(W[1][r]) /~ W[1][r])^~(0.5);   X1 := T *~ V1[W[2][r]];   X2 := T *~ V2[W[2][r]];   return ^+: end proc:
 > CodeTools:-Usage(Statistics:-Sample(Uniform(-1, 1), 10^5)): CodeTools:-Usage(Statistics:-Sample(Uniform(-1, 1), 10^5, method=envelope)): # CodeTools:-Usage(BoxMuller_1(10^6)): # generates a "connection to kernel lost" error msg CodeTools:-Usage(BoxMuller_2(10^6)): print(): CodeTools:-Usage(BoxMuller_1(10^5)): CodeTools:-Usage(BoxMuller_2(10^5)): print(): S1 := BoxMuller_1(10^5): S2 := BoxMuller_2(10^5):
 memory used=1.17MiB, alloc change=0 bytes, cpu time=12.00ms, real time=12.00ms, gc time=0ns memory used=3.32MiB, alloc change=32.00MiB, cpu time=64.00ms, real time=64.00ms, gc time=0ns memory used=148.17MiB, alloc change=119.93MiB, cpu time=706.00ms, real time=637.00ms, gc time=14.63ms
 memory used=67.43MiB, alloc change=186.80MiB, cpu time=946.00ms, real time=762.00ms, gc time=285.14ms memory used=14.81MiB, alloc change=0 bytes, cpu time=67.00ms, real time=62.00ms, gc time=0ns
 (1)
 > if false then DocumentTools:-Tabulate(   [     plots:-display(       Statistics:-Histogram(S1),       plot(Statistics:-PDF(Normal(0, 1), x), x=-4..4, color=red,thickness=3)     ),     plots:-display(       Statistics:-Histogram(S2),       plot(Statistics:-PDF(Normal(0, 1), x), x=-4..4, color=red,thickness=3)     )   ], width=60 ) end if:
 > # carl love's procedure with slight modifications t SampleCheck := proc(X, f, N::posint) uses St= Statistics;   local S, M, O, E:   S:= f(N):   M:= numelems(S):   O:= )>:   E:= St:-Probability~(X <~ (rhs@lhs)~(O), 'numeric') * M:    end proc:
 > interface(rtablesize=20): SampleCheck(Statistics:-RandomVariable(Normal(0,1)), 'BoxMuller_1', 10^5), SampleCheck(Statistics:-RandomVariable(Normal(0,1)), 'BoxMuller_2', 10^5);
 (2)
 >

## A serious problem with Statistics:-Sampl...

Maple

Hi,
The choice method=default (the implicit choice) gives dramatically false results as demonstrated here for the sampling of a normal r.v.
It's likely to be the case for any continuous distribution.
In a few words the tails of the distribution are not correctly sampled (far from the mean by 4 standard deviations for a normal r.v., not an extremely rare event in practical applications).

I think the attached file is sufficiently documented for you to understand the problem.

 > restart:

The problem presented here also exists with Maple 2018 and Maple 2019

 > interface(version)
 (1)
 > with(Statistics):
 > X := RandomVariable(Normal(0, 1)):

Generate a sample of X of size 10^6

Here is the R code for those who would like to verify the results and the performances

N <- 10^6

R <- 20

Q <- c(0:5)

M <- matrix(nrow=R, ncol=length(Q))

for (r in c(1:20)) {

S <- rnorm(N, mean=0, sd=1)

for (q in Q) {

M[r, q+1] <-length(S[S>q])

}

}

Expected.Number.of.Outliers <- floor(N - pnorm(Q, mean=0, sd=1) * N)

Observed.Number.of.Outliers <-  round(colMeans(M))

Expected.Number.of.Outliers

Observed.Number.of.Outliers

Absolute.Differences <- M - kronecker(t(Expected.Number.of.Outliers), rep(1, R))

boxplot(differences)

Remark the loop below takes about 30 seconds to run (to be compared to the 20 minutes
it would take am I used the build-in procedure Select, and to  the 3 seconds R demands).

 > N := 10^6: R := 20: Q  := [\$0..5]: nQ := numelems(Q): M  := Matrix(R, nQ, 0): for r from 1 to R do   S := Sample(X, N):   for q in Q do     M[r, q+1] := add(Heaviside~(S -~ q)):   end do: end do:
 > Expected_Number_Of_Outliers := Vector[row](nQ, i -> Probability(X > Q[i], numeric)*N);
 (2)
 > Observed_Number_Of_Outliers := Mean(M);
 (3)

While the values of  the Observed_Number_of_Outliers seem to be reasonably correct for q =0, 1, 2 and 3
there are highly suspicious for q = 4 and likely for q = 5 (but N is too small to be categorical)

So let's examine more closely rhe results for q = 4

 > `q=4`:= convert(M[..,5], list);  min(`q=4`)
 (4)

This means we obtained 20 estimations out of 20 of the probability that X exceeds q=4.
There are only about on chance out of 1 million to get such a result (roughly (1/2)^20 ).

Here is the result R returned for the same quantile q=4

M[,5]

[1] 24 37 23 38 30 43 23 30 25 39 25 29 29 36 39 41 35 37 36 34

(9 values out of 20 are less than the expected value of 31.67)

A very simple test based on the Binomial distribution will prove, without any doubt, that the
distribution of the 20 numbers of outliers for q = 4 implies the sampling algorithm is of
is of very poor quality.

Is it possible to obtain correct results with Maple ?

(simulations done below for the case q=4 only)

 > N := 10^6: R := 20: Menv := Vector[column](R, 0): for r from 1 to R do   S := Sample(X, N, method=envelope):   Menv[r] := add(Heaviside~(S -~ 4)): end do:
 > `q=4`:= convert(Menv, list); Mean(Menv); Variance(Menv);
 (5)

Conclusion, the "default" sampling strategy suffers strong flaw.

Remark: it's well known that one of the best method to sample normal random variables
is Box-Mueller's: why it has not been programmed by default ?

 >

## i can't find our where my error is......

Maple 2015

Hi,

Sorry to ask such a stupid question but I can't find out where my error is. Probably it's so huge it blinds me!

The double loop and the matrix product F^+ . F should give the same result, no? (it seems that F^+ . F has its rows reordered ?)

 > restart:
 > N   := 3: P   := 2: niv := [seq(Z[i], i=1..N)]; f   := Matrix(N^P, P, (i,j) -> `if`(j=P, niv[(i mod 3)+1], niv[iquo(i-1,3)+1]));
 (1)
 > ds := subs(niv =~ [\$0..N-1], f);
 (2)
 > vs := [ seq(V__||i, i=1..P)]: es := unapply( sort( [ seq( mul(vs ^~ [entries(ds[i,..], nolist)]), i=1..N^P) ] ), vs);
 (3)
 > ff := convert([ seq(es(entries(ffd[i,..], nolist)), i=1..N^P) ], Matrix); UnityRoots := [solve(z^3=1, z)]: F := simplify(subs(niv =~ UnityRoots, ff)) /~ sqrt(N^P):
 (4)

Scalar products of pairs of comumn vectors

F must be an orthogonal array

 > for i1 from 1 to N^P do   for i2 from 1 to N^P do     printf("%a ", simplify(add(F[..,i1] . F[.., i2])))   end do:   printf("\n"): end do: printf("\n");
 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1

or more simply:

 > simplify(F^+ . F)
 (5)
 >

## How to define an absolute color scale in...

Maple 2015

Hi,

I need to plot some correlation matrices C1, C2, ... and I use matrixplot for this.
I would like to use the same absolute scale (-1..+1) for all of them.
For instance is I decide to uses colorscheme=["blue", "white", "red"] I would like blue to correspond to value -1, white to value 0 and red to value 1.
Unfortunately colorscheme set to blue the cell with the mininum value (not necessarily -1) and to red the maximum one (not necessarily +1).
Here is an example

restart:
with(plots):
with(Statistics):
randomize():
N := 10:
P := 3:
A := Sample(Uniform(0, 1), [N, P]):
C := CorrelationMatrix(A):
matrixplot(
C,

heights=histogram,
axes=frame,
​​​​​​​  gap=0.25,
​​​​​​​  color=((x,y)->(C[x,y]+1)/2),
​​​​​​​  orientation=[0, 0, 0],
​​​​​​​  lightmodel=none,
​​​​​​​  tickmarks=[[seq(i+1/2=A||i, i=1..P)], [seq(i+1/2=A||i, i=1..P)], default],
​​​​​​​  labels=[("")\$3]​​​​​​​
​​​​​​​  );

​​​​​​​I also tried to use color=((x,y) -> (C[x, y]+1)/2) instead of colorscheme but here again matrixplot uses a local scale defined by the reange of the correlation matrix to plot.

I fixed this by using something like seq(seq(PLOT(POLYGONS(...), i=1..P), j=1..P) instead of matrixplot, but I think it is a shame to do so.

So my question: is it possible to force matrixplot not to use a scale defined by the matrix to plot, but a "user" scale?

PS: I'm using Maple 2015

## How to "stack" the procedures generated ...

Maple 2015

Hi,

I solve numerically an ode for different values of its parameters ( dsolve(..., numeric, parameters=[...] ) and I would like to "stack" the different solutions in a container (the container (list, vector, table) is of no matter).

Here is a notional example where I try to construct a sequence where the first element should be the solution when the parameter is equal to 1 and second one when the parameter is equal to 2.
It happens that some "premature evaluation" seems to occur which makes the two elements identical.

Please do not pay attention to the obvious simplicity of the problem: the true one is more complicated but can be illustrated by the on below.

 > restart:
 > f := dsolve({diff(x(t),t)=A*t, x(0)=0}, numeric, parameters=[A]);
 (1)
 > k := 1: for a in [1, 2] do   f(parameters=[a]):   printf(" f(1) = %a\n", f(1)):   g||k := unapply(f(t), t):   for kk from 1 to k do     printf("g%d(1) = %a\n", kk, g||kk(1)):   end do:   k := k+1:   print(): end do:
 f(1) = [t = 1., x(t) = .500000000000001] g1(1) = [t = 1., x(t) = .500000000000001]
 f(1) = [t = 1., x(t) = .999999999999999] g1(1) = [t = 1., x(t) = .999999999999999] g2(1) = [t = 1., x(t) = .999999999999999]
 (2)
 > # how must I correct this in order to prevent the # "over writting" of g1 when g2 is instanciated # and get # #  f(1) = [t = 1., x(t) = .999999999999999] # g1(1) = [t = 1., x(t) = 0.5] # g1(2) = [t = 1., x(t) = .999999999999999] #
 >