:

## Maple 17: High Performance Polynomials

Maple

As described on the help page ?updates,Maple17,Performance, Maple 17 uses a new data structure for polynomials with integer coefficients. Our goal was to improve the performance and parallel speedup of polynomial algorithms that underpin much of the system and create a platform for large scale polynomial computations. Shown below is the new representation for 9xy3z - 4y3z2 - 6xy2z - 8x3 - 5 in Maple 17. The monomials are encoded as machine integers to allow for fast monomial operations. On a 64-bit machine the first monomial would be 5 · 248 + 1 · 232 + 3 · 216 + 1 · 20. This encodes the total degree and the exponents of [x,y,z] using 16 bits each. We can multiply two monomials in the same variables by adding machine integers, or compare them in graded lexicographical order by comparing the integers. The degree of each variable can be extracted using a bit mask.

The terms of the polynomial are sorted to assist in writing fast algorithms. For example, to compute the degree of a polynomial we need only look at the first term. To add two polynomials in the same variables, we use a merge which is linear time. To search for the subexpression z5, as in has(f, z5), we locate the first monomial with total degree 4 using binary search and do a linear scan for z5 down to that point. Compared to Maple's generic sum-of-products data structure shown above, the new structure is highly compact. For polynomials in n variables we save approximately a factor of n+1 in space. We avoid creating separate objects for each term that would be tracked and accessed at non-contiguous memory locations. The new algorithms in the Maple kernel are also coded in a bitwise programming style, with minimal branches and loops, so as to execute quickly on modern processors. Almost all of the algorithms are linear time or better.

The first consequence of all of this optimization is shown in the screenshot below, taken on a quad core AMD Phenom II N950 notebook. Maple has used parallel algorithms for polynomial multiplication and division for several releases now, but their effect on top level commands has been diminished by data structure overhead. No longer. The factor command, which is sequential Maple code, now achieves free parallel speedup of about 2x. The computation in Maple 17 is about 3.5 times faster than in Maple 16 on 4 cores, and 75 times faster than Factor in Mathematica 9. The screenshot also shows how we do not waste memory or cpu time to gain parallelism. The sequential algorithms which are used when kernelopts(numcpus=1); have virtually identical cost. Right now Maple 17 is a state of the art computer algebra system. It can do problems that no other software can do, and it can make some very large computations look easy. The example below runs the extended Euclidean algorithm on two polynomials with 15 million terms. You will need a 64-bit machine.

```f := mul(randpoly(i,degree=31,dense),i=[w,x,y,z]):
g := mul(randpoly(i,degree=31,dense),i=[w,x,y,z]):
h := mul(randpoly(i,degree=31,dense),i=[w,x,y,z]):
A := CodeTools:-Usage(expand(f*g)): nops(A);
B := CodeTools:-Usage(expand(g*h)): nops(B);
g := CodeTools:-Usage(gcdex(A,B,x,'s','t')):
```

We are very excited by this new technology and what allows us to do, today and in the future. But we have no idea what others might think to do. We hope you will try Maple 17 and report back to us. It's a big upgrade. ﻿