I'm trying to understand how close (or far) the following distinction comes to these ideas:
X := Statistics:-RandomVariable(Normal(0,1));
# numeric vector return, with overhead
V := Statistics:-Sample(X,5);
# or...
G := Statistics:-Sample(X);
# numeric vector return, with less overhead
V := G(5);
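A rough timing sketch of that difference (illustrative only; the exact timings will vary by machine and release):
X := Statistics:-RandomVariable(Normal(0,1)):
G := Statistics:-Sample(X):
# repeated small samples, with full dispatch on each call
st := time(): for i to 10^3 do Statistics:-Sample(X,5) od: time()-st;
# repeated small samples through the prebuilt procedure
st := time(): for i to 10^3 do G(5) od: time()-st;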
There is still some overhead in the returned procedure G. Consider this variant,
kernelopts(opaquemodules=false):
Statistics:-ExternalSupport:-Initialize();
p := Statistics:-ExternalSupport:-DefineExternal("MapleNormalRandomSample");
p(5,0,1);
But it seems that Statistics:-ExternalSupport:-DefineExternal will behave according to UseHardwareFloats and Digits. So an even leaner "trampoline" sort of reassignment might be tricky.
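For what it's worth, the sort of reassignment I mean would look something like the following hedged sketch. It silently assumes the default UseHardwareFloats and Digits settings, which is exactly what makes it tricky:
kernelopts(opaquemodules=false):
Statistics:-ExternalSupport:-Initialize():
p := Statistics:-ExternalSupport:-DefineExternal("MapleNormalRandomSample"):
G := n -> p(n, 0, 1):  # no per-call checking of the environment
G(5);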
Dave Linder
Mathematical Software, Maplesoft
Often the replies of Doug and Georgios show that they have made extra effort, both to understand what the original submitters were after and to illustrate their solutions. Good choices!
Dave L
ps. I have no part in the mentor awards selection process.
Hi Jacques,
Along with this other reply to an earlier post, you seem to be hinting that we could do much better with the dispatching in the Statistics package.
We know that we should get rid of as much use of the Library-level ProcessOptions package as possible, going instead with the newer, more efficient parameter processing now in the kernel (see ?paramprocessing). It gets a little complicated because, for example, routines like Sample can produce either numeric Vectors or procedures which generate the same. Also, Statistics is a complicated package. Even amending Sample so that it can admit a Vector argument, for in-place re-use, is a bit involved.
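As a minimal sketch of what I mean (a hypothetical routine, not the actual Statistics code): with kernel parameter processing the keyword parameter is declared, defaulted, and type-checked by the kernel, with no Library-level option parsing at all.
# Hypothetical illustration of kernel-level parameter processing.
MySample := proc( n::posint,
                  { output::identical(vector,procedure) := 'vector' } )
    if output = 'procedure' then
        return m -> Vector(m, i -> evalf(rand()/10^12), 'datatype'=float[8]);
    end if;
    Vector(n, i -> evalf(rand()/10^12), 'datatype'=float[8]);
end proc:
MySample(3);                          # a float[8] Vector of length 3
MySample(1, 'output'='procedure')(5); # a generating procedure, applied to 5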
I didn't see an example of statistical computations in the preprint that you cited here. May I ask whether your MapleMIX system already has examples related to what Maple's Statistics does?
best regards,
Dave Linder
Mathematical Software, Maplesoft
I happened to already have a code fragment for this idea. The example below does 32x32 complex Matrix^%H-Matrix multiplication 10000 times.
This only works on Linux, but it shows right away the idea of calling zgemm (NAG's f06zaf) directly. The benefits are avoiding the re-creation of Matrix rho at each iteration and avoiding an actual transposition. The speedup for this example is about a factor of 2.
> # benchmark
> kernelopts(printbytes=false):
> with(LinearAlgebra):
>
> (st,ba,bu) := time(),kernelopts(bytesalloc),kernelopts(bytesused):
> (maxiter,N,d) := 10000,5,2:
> temp := RandomMatrix(d^N,d^N,outputoptions=[datatype=complex[8]]):
> for i from 1 to maxiter do
> rho := LinearAlgebra:-LA_Main:-MatrixMatrixMultiply(
> LinearAlgebra:-LA_Main:-HermitianTranspose(temp,inplace=false,
> outputoptions=[]),
> temp,inplace=false,outputoptions=[]);
> od:
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
9.768, 4652204, 576723804
> # Using zgemm directly, in a brand new session.
> # zgemm does c <- alpha*a*b + beta*c (can transpose a or b).
> # Some bug prevents datatype=complex[8] on ARRAY parameters,
> # but this can be circumvented using ArrayTools:-ComplexAsFloat.
> zgemm := define_external( 'zgemm_',
> TRANSA::string[1],
> TRANSB::string[1],
> M::REF(integer[4]),
> N::REF(integer[4]),
> K::REF(integer[4]),
> ALPHA::REF(complex[8]),
> A::ARRAY(1..M,1..K,order=Fortran_order,datatype=float[8]),
> LDA::REF(integer[4]),
> B::ARRAY(1..K,1..N,order=Fortran_order,datatype=float[8]),
> LDB::REF(integer[4]),
> BETA::REF(complex[8]),
> C::ARRAY(1..M,1..N,order=Fortran_order,datatype=float[8]),
> LDC::REF(integer[4]),
> LIB="libatlas.so" ):
>
> kernelopts(printbytes=false):
> with(LinearAlgebra):
>
> (st,ba,bu) := time(),kernelopts(bytesalloc),kernelopts(bytesused):
> (maxiter,N,d) := 10000,5,2:
> rho := Matrix(d^N,d^N,datatype=complex[8]):
> temp := RandomMatrix(d^N,d^N,outputoptions=[datatype=complex[8]]):
> alpha := 1.0+0.0*I:
> beta := 0.0+0.0*I:
> rtemp := ArrayTools:-ComplexAsFloat(temp):
> rrho := ArrayTools:-ComplexAsFloat(rho):
> for i from 1 to maxiter do
> zgemm("C","N",d^N,d^N,d^N,alpha,rtemp,d^N,rtemp,d^N,beta,rrho,d^N);
> od:
> time()-st,kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;
5.016, 3800392, 64592136
>
> Norm(temp^%H . temp - rho);
0.
There is also some overhead saved by not calling any LinearAlgebra routine at all. Setting maxiter,N,d to 10000,0,2 respectively (so that the Matrices are 1x1) gives performance for the two methods of,
5.432, 4193536, 164662908
2.564, 3996964, 60722984
Setting maxiter,N,d to 5,10,2 gives performance for the two methods of,
31.737, 488022752, 661812396
11.404, 34006956, 34303948
This case, with the larger 1024x1024 Matrices, shows how much the memory management is struggling. Memory fragmentation may be a cause.
Dave Linder
Mathematical Software, Maplesoft
There is quite a bit of information on the topic of external calling in the help page ?OpenMaple,C,Examples and in the pages which it cross-references.
Dave Linder
Mathematical Software, Maplesoft
A reason for the relative slowness of the terser method you gave is, I believe, that it forms a list of 1 million calls to rand(0..1). Those are only evaluated after the creation of the list.
Compare with this,
> ba,bu,st:=kernelopts(bytesalloc),kernelopts(bytesused),time():
> g := rand(0..1):
> N:= 10^6:
> R := [ seq( add( g(), k=1..12 )-6, i=1..N ) ]:
> kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu,time()-st;
11401176, 176104208, 10.168
So the memory allocation difference for using a hardware datatype Vector instead of a list, for R, isn't that much here.
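For comparison, here is a hedged sketch (same computation, not timed here) which fills a hardware datatype Vector directly instead of building a list:
> g := rand(0..1):
> N := 10^6:
> # each entry is an integer in -6..6, stored as a hardware float
> R := Vector(N, i -> add(g(), k=1..12) - 6, datatype=float[8]):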
Dave Linder
Mathematical Software, Maplesoft
Thank you, Douglas,
We will be entering each issue in your Document into our database, and then examining them. I appreciate the time and effort you've taken in reporting all this.
I can say something about one of the issues. fsolve is set up generally to disallow mixing of operator form and expression form input. So, having defined,
h := x -> minimize(n*x^(1/n), n = 1 .. 3);
the subsequent call, fsolve(h-2), mixes those. Sure, 2 may act nicely as an operator, but in general such mixing could be unfortunate to allow. Now, you may wonder why the examples with,
f := x -> minimize(n*x, n = 1 .. 3);
g := x -> minimize(x^(1/n), n = 1 .. 3);
seem to work out. It's due to a behaviour of fsolve which I consider wrong in general, that it will try to apply g (or f) to some dummy local variable r. For your g and f, that application results in min(r,r^(1/3)) and min(r,3*r) respectively. For h, `minimize` does not return such a result of a single `min` call.
I'd like to improve the error message, "Can't handle expressions with typed procedures". But I'd also like to stop fsolve from that application of the functional input to dummy r. One can come up with examples where such application produces a wrong result. That would likely mean that your f and g examples would also generate an (improved) error message from fsolve. If that is worrisome, please consider the example fsolve(f-2-Pi). If allowed through, that mixed example would try to compute something like fsolve('f'(r)-2-Pi(r),r) which would be misguided.
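A small sketch of that last point: applying the operator sum f - 2 - Pi to a name applies each operand to it, so the constant 2 collapses correctly while Pi does not.
f := x -> minimize(n*x, n = 1 .. 3):
(f - 2 - Pi)(r);
# expect something like min(r,3*r) - 2 - Pi(r), where Pi(r) is
# an unevaluated application of Pi to r, not the constant Pi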
best regards,
Dave Linder
Mathematical Software, Maplesoft
Hi Jacques,
Does the vim maple-mode file have macros in it?
I'm thinking of things like this:
- escaping to a new shell which starts maple, pipes in the current buffer of maple source, and lands one at the maple prompt.
- escaping to a shell and piping the current buffer's source into a maple session, without leaving one at the prompt, but while stripping any leading # mark off lines which continue with a savelib() call. You get the picture, I am sure.
I was wondering whether such macros might be put in as commented-out suggestions, in the vim maple-mode file.
The maple sessions started by such macros might take a few nice -b options, too.
Dave Linder
Mathematical Software, Maplesoft
This is useful, Doug. By that I mean it's very useful to have straightforward examples which show what can be improved.
That routine, RealRange_union, dates from about 1993. The bits about infinity were added at some later date.
It's clear that you enjoyed analyzing it. Another thing worth examining might be whether the use of eval@subs is really necessary in RealRange_ends.
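For readers following along, the eval@subs idiom exists because subs does not re-evaluate after substituting. A tiny example of when the two differ:
e := sin(x):
subs(x = 0, e);        # returns sin(0), left unevaluated by subs
eval(subs(x = 0, e));  # returns 0, since eval forces the evaluation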
Dave Linder
Mathematical Software, Maplesoft
I'd like to mention alternative views to some of the statements being made in this thread. I may use quotes from one of DJKeenan's posts, but some of the sentiments appear to be held by some others, too.
First, there is this, "sets are unordered". It may be that sets are more useful mathematically when considered as unordered. (Even that is not set in stone. Like good notation, it may be considered correct if it leads to more insights in general.) But that is not the primary cause of why sets may appear to some people to be unordered in Maple. In fact, sets are ordered in Maple. They are ordered by the addresses in memory of the members. But the ordering of some given set is not constant across Maple sessions, as the memory addresses may differ across sessions.
A principal justification for Maple sets to be ordered by member address is that it allows for good efficiency. There is an interpretation of the situation in which the efficiency gain is the primary rationale. Now, the apparent lack of a single fixed order for any given Maple set (which was explained as due to differences in actual orderings, taken across sessions) has sometimes been further rationalized by the notion of sets as being inherently mathematically unordered. But that is really quite different from the notion that sets in Maple are unordered as a *direct* consequence of their being mathematically inherently unordered.
What are some consequences of set orderings varying, you might ask? Well, results from indets() may vary, and by vary I mean vary across Maple sessions. Here are a few examples of what I believe are consequences of that:
> restart: -3: a: s := 11 - 17*v^2*w^3*x^4 - 3*a^2*w^3*x^4 + 1: lcoeff(s);
> # The above may produce either -3 or -17, if repeated over and over.
> restart: a: normal(1/(a-b));
> restart: b: normal(1/(a-b));
> # The first above always produces 1/(a-b) .
> # And the second produces -1/(b-a), I believe.
> # This seems to mean that normal(1/(a-b)) can
> # vary, according to the address of a and b,
> # which can change according to which was
> # "used" first.
And what about the case where sets with ostensibly the same members are created locally within the same Library routine? What if garbage collection occurred between their creations? Can the members have different addresses than before? Does that mean that the set members may be accessed in a different order, which can in turn cascade to a completely different computation path (without any restart)?
It may be, then, that what appear to be several classes of effect are primarily caused by differing set orderings. There are other sources of irreproducibility in Maple, but the set issue may account for more than is generally imagined.
Now let's move on to this: "That order [of additions when `add` acts on a maple set of floats] is inherently undefined; so the add is inherently irreproducible, in principle." What's the justification for claiming that? Why could the members not be ordered in a manner reproducible across sessions? I take the "in principle" to mean that it could never be made to work in Maple. But why would that be true? How is this case different from the general set-ordering case described first above?
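As a minimal sketch of the sort of thing I mean (one possibility, not a claim about what the kernel should do internally):
> S := {1.5, 0.25, 3.75}:
> # sort into numeric order first, so that the order of the
> # additions does not depend on the members' memory addresses
> add( x, x = sort(convert(S, list)) );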
These kinds of effects can result in different computation paths being taken, as a few people have noted in this thread. We do call these ordering problems. They can lead to intermittent failure to produce any answer; sometimes they can lead to intermittently computing an incorrect result, which is far more serious. Ordering problems are not dismissed out of hand as unimportant.
Dave Linder
Mathematical Software, Maplesoft