Mr. Roman Pearce

Research Associate
Abbotsford, British Columbia, Canada

I am a research associate at Simon Fraser University and a member of the Computer Algebra Group at the CECM.

MaplePrimes Activity

These are answers submitted by roman_pearce

Maple 16 automatically parallelizes expand and divide, which are used by int, dsolve, solve, and a lot of other things.  However, most top-level commands also do a lot of other work which is not parallel, and there is still a lot of overhead in Maple, so you are unlikely to get parallel speedup from those commands in Maple 16.

The problems where you get parallel speedup now are ones that do a lot of multivariate polynomial arithmetic, e.g.:

f := randpoly([x,y,z],degree=8,dense):
g := randpoly([x,y,z],degree=8,dense):
p := expand(f*g):       # parallel multiplication
divide(p, f, 'q');      # parallel division

On my machine (Core i5 750) if I set kernelopts(numcpus=1): it takes 24.4 seconds real time on 1 core.  By default it uses up to 4 cores and takes 8.58 seconds, so the real parallel speedup of Maple 16 is 2.84x.  This is actually pretty good, given that only multiplication and division were parallelized (for sufficiently large input) and there are a lot of other overheads.

Diff normally runs in linear time, so I don't think it can be parallelized.  It's pretty hard to parallelize any operation that takes less than a millisecond, so some of the high-level algorithms such as solve and dsolve will themselves have to be parallelized.

Maple parallelizes large polynomial multiplications as of Maple 14, and divisions as of Maple 16.  The individual operations may not scale to 12 cores, however; it depends on the size of the polynomials.  The resultant algorithm itself does not appear to be parallelized as of Maple 16, and parallel speedup also depends on which resultant algorithm is used.  Could you post the polynomials or upload them to MaplePrimes?  I can take a look.

If you make a variable x[1], then x is a table and also of type polynom.  This is "last name evaluation": a table/procedure/matrix evaluates to its name, not to the data structure.  And names are of type polynom.
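A minimal illustration (the assigned value 5 is arbitrary):

```
x[1] := 5:             # assigning to an indexed name creates a table x
type(eval(x), table);  # true: the underlying structure is a table
type(x, polynom);      # true: x evaluates to the name x, and names are polynoms
```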

The Maple prompt character ">" is the same as the "greater than" operator, and it does not make sense in this context.  Try the following:

plots[animate3d](sin(theta)*cos(z*t), theta=1..3, z=1..4, t=1/4..2/7, coords=cylindrical);

This approach will not be feasible unless you supply enough clues to substantially knock down the system of polynomial equations.  Otherwise the solution space is huge, and the time to compute a Groebner basis may be exponential in the number of solutions.  If you can formulate the problem in terms of polynomials with lower degree and/or fewer variables, that can also give a big improvement.

for i from 0 to 15 do
p || i := plot(sin(x*i),x=-10..10,axes=none):
end do:
plots[display]([seq(p || i, i=0..15)]);

Declare a hardware datatype if possible.  Use elementwise operations if possible.

with(LinearAlgebra):
n := 10^6:
# hardware floats: fast elementwise division
A := RandomVector(n,datatype=float[8]):
B := RandomVector(n,datatype=float[8]):
C := CodeTools:-Usage(A /~ B):
# software integers: exact results, but much slower
A := RandomVector(n,generator=1..100,datatype=integer):
B := RandomVector(n,generator=1..100,datatype=integer):
C := CodeTools:-Usage(A /~ B):

algsubs(N=0, N+k);

It can handle more complicated equations.
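For instance, the classic example of a relation that subs cannot apply:

```
algsubs(a+b=c, a^2+2*a*b+b^2);   # rewrites the expression as c^2
```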

To change the input style, under Windows or Linux go to "Tools -> Options".  On the Mac it is under "Maple 16 -> Preferences".  Under the Display tab, change Input Display to Maple Notation and click "Apply Globally".  Under the Interface tab you can also make Maple create new worksheets (with prompts) instead of documents (free form) by default.

The number pi in Maple is written Pi; however, it is not evaluated to a floating point number by default.  This is important for symbolic math.  To evaluate it numerically, use the evalf command, i.e. evalf(Pi); will give you the default 10 digits.  You can even assign pi := evalf(Pi); and use pi everywhere instead, but you may lose some power in symbolic routines.
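For example:

```
evalf(Pi);       # 3.141592654 (default 10 digits)
evalf[20](Pi);   # 3.1415926535897932385 (20 digits)
```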

For converting to C or Matlab, you should make your code into a multi-line procedure and then convert that to a C function or whatever.  For example:

foo := proc(n) local t, i;
t := 0;
for i from 1 to n do
t := t + i;
end do;
t;   # return the result explicitly so the translated function has a return value
end proc:
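As a sketch, the CodeGeneration package can then translate the procedure (the Matlab translator works the same way):

```
with(CodeGeneration):
C(foo);        # prints a C translation of foo
Matlab(foo);   # prints a Matlab translation
```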

Here's my version. Maybe someone wants to make a command out of this.

m := 10:
R := sum(binomial(n+i*d,n),i=1..k-1);
S := binomial(n+k*d,n);
f := unapply(R/S, n, d, k);
A := Array(1..m,1..m,1..m,f,datatype=float[8]);

foo := proc(i,j,k,v)
# sketch of a body (left blank in the original): one colored point per grid cell
POINTS([i,j,k], COLOUR(RGB, v, 0, 1-v))
end proc:

PLOT3D(seq(seq(seq(foo(i,j,k,A[i,j,k]/m), i=1..m), j=1..m), k=1..m),AXES(BOXED));

The result is a volume density plot.

I forgot to answer your second question: "Are there any symbolic procedures that use the GPU in Maple (or are there any known plans for the future versions)?"

Currently there are none; however, the group at the University of Western Ontario is using GPUs for polynomial system solving and polynomial operations mod p, which underlie many algorithms in Maple.  Obviously there is a lot of interest in GPUs.  The problem is that to reach a sufficient number of customers you really want to use single precision.  Single precision floats have a 23-bit mantissa, which means you can only multiply 11-bit numbers without overflow.  Unless an alternative technique is developed, you're building up exact results 11 bits at a time on a GPU, when five lines of assembly code would let you build up 64 bits at a time on an x86 processor.  Worse, in many algorithms 11 bits is not enough and you need double precision, which is 8-24x slower and only gets you twice as many bits at once.  For many exact calculations CPUs are still hard to beat.  This may change though.
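The bit counts are easy to check: the product of two 11-bit integers fits in 22 bits, within the 23-bit mantissa, while 12-bit operands already overflow.

```
(2^11-1)^2 < 2^23;   # true: the largest 11-bit product still fits
(2^12-1)^2 < 2^23;   # false: 12-bit operands overflow single precision
```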

GPUs have much more potential in numerical code because you can use the full 23 bits, and on well-conditioned problems that is often enough.  Then you iterate to build up the precision to whatever arbitrary level you want.  For systems like Maple which *are* used for arbitrary precision, these algorithms make a lot of sense.

I wouldn't shell out the bucks for this.  Maple's CUDA support is too rudimentary.  Unless you're writing C code it's probably not worth it.  Matlab + Jacket is the best alternative to C.

For graphics cards in general, the big question is "do you need double precision?"  If you have an iterative algorithm that can run in single precision and build up a result, then buy the gamer card.  It might even come with a free game!  This is the best way to program GPUs IMHO, because single precision will always be fast and double precision will always be a lower priority in GPU design.

If you need double precision support, then it gets tricky.  NVIDIA wants to sell you a Tesla, which is the smoking supercomputer thing.  GPUs have a ratio of double precision throughput to single precision throughput, so you want a chip with a good ratio.  The original GF100 Fermi chip had 1:8, but newer chips (while faster) have a worse ratio.  Cue an hour of looking through Wikipedia to figure this out.

cat(seq(sprintf("%a ",i),i=L));

The Mac init file is .mapleinit in your home directory.  For libname you would want to add "/Users/yourusername/Mylib".
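As a sketch, the line in .mapleinit would look like this (the path is the hypothetical one above):

```
libname := "/Users/yourusername/Mylib", libname:
```

Prepending the directory makes your library take precedence over entries already on libname.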

If you take the lexdeg basis and remove all polynomials containing any of x[9], x[10], x[11], x[12], you will get a Groebner basis for the ideal intersected with Q[x[1],x[2],x[3],x[4],x[5],x[6],x[7],x[8]] in tdeg order.
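In Maple this looks roughly as follows, where F is your list of polynomials (hypothetical here):

```
with(Groebner):
X := [x[9],x[10],x[11],x[12]]:    # variables to eliminate
Y := [seq(x[i],i=1..8)]:          # variables to keep
B := Basis(F, lexdeg(X, Y)):      # block order: X ahead of Y
E := remove(has, B, X);           # basis of the elimination ideal, tdeg in Y
```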
