roman_pearce

sdmp on GPU...

Commented: roman_pearce 1703

May 05 2010

I use threads directly, with primitives for memory barriers, compare and swap, and short sleeps. We went for maximum parallel performance and left nothing to chance. We split up the polynomials manually and multiply with all cores running at full tilt. The threads cooperate to combine the results, and use a load-balancing scheme with no communication overhead. The whole thing runs in the CPU cache. It's a true parallel algorithm designed for many-core processors. It is currently compiled for multicore cpus, which use fewer threads due to higher scheduling latency and faster processing versus communication. You probably couldn't port these algorithms to a GPU. They're sparse algorithms which inherently involve a lot of branching, like heapsort. GPUs expect fairly uniform computations over independent sets of data, so it just isn't a match. GPUs would be great for dense polynomial arithmetic though.

sdmp source...

Commented: roman_pearce 1703

May 05 2010

Thanks for your kind comment. The source is not available, and probably won't be anytime soon. It's still under heavy development, and every year I take an axe to it to simplify the code and do something new. There is no programmer interface exposed in Maple. Instead, we have tried to describe all the details in the papers. I'm happy to answer further questions or provide tips to anyone working on this. In Maple 14 we hooked the routines into expand and divide, but we did a bit more than that. We lowered the thresholds for the kernel to call out to `expand/bigprod` and `expand/bigdiv` where our code is used. And we wrote code for `expand/bigprod` so that polynomials with rational coefficients, functions such as sin(x), and even rational function coefficients would run our code. These extra steps were critical to maximize the performance, and you won't get nearly the same improvement (e.g. in factor) from simply hooking our code into Maple 13. It's a finished product. Here's another example from the Trip paper to illustrate. I can't supply times since the beta license expired and we're waiting for the official release to be installed in our lab, but the improvement from Maple 13 to 14 is massive, and Maple 14 is the fastest system out there that I know of.

f := expand((1/3*x + 3/5*y + 5/7*z + 7/11*t)^20): g := f+1:
gc(): TIMER := time[real](): p := expand(f*g): time[real]()-TIMER;
gc(): TIMER := time[real](): divide(p,f,'q'):  time[real]()-TIMER;

post import...

Commented: roman_pearce 1703

May 01 2010

Are we going to have another round of importing posts after the new site launches?

adding to lists...

Commented: roman_pearce 1703

May 01 2010

Multiplying by a constant is there for linear algebra. It's a fairly basic thing to scale a vector, and it's used a lot in algorithms. I guess nobody thought that translation was similarly important.

Algebraic:-RecursiveDensePolynomials...

Answered: roman_pearce 1703

April 29 2010

0 0

They're internal functions which you should avoid unless you need them or want to do things directly.

Algebraic:-RecursiveDensePolynomials...

Commented: roman_pearce 1703

April 29 2010

They're internal functions which you should avoid unless you need them or want to do things directly.

much easier mod p...

Answered: roman_pearce 1703

April 29 2010

0 0

Then it's a much easier problem: alias(a=RootOf(1+z^31+z^23+z^11+z^7+z^2+z^32)); Gcdex(x^32,x^32+(a^15+a^11+a^6)*x^31+(a^13+a^10+a^3)*x^23+(a^11+a^7+a^2+1)*x^11+x^7+a^13+a^10+a^3+a^1+1,x,'s','t') mod 103;

much easier mod p...

Commented: roman_pearce 1703

April 29 2010

Then it's a much easier problem: alias(a=RootOf(1+z^31+z^23+z^11+z^7+z^2+z^32)); Gcdex(x^32,x^32+(a^15+a^11+a^6)*x^31+(a^13+a^10+a^3)*x^23+(a^11+a^7+a^2+1)*x^11+x^7+a^13+a^10+a^3+a^1+1,x,'s','t') mod 103;

large sparse structured matrices...

Answered: roman_pearce 1703

April 25 2010

0 0

Your matrices have symbolic entries. That's very different from software that uses sparse structured algorithms on floating point matrices, because you are going to get large polynomials. Maple can solve 100000 x 100000 sparse structured linear systems too. Wanna see? Download and save: http://www.cecm.sfu.ca/~rpearcea/sge/largesys read "largesys": # load it in infolevel[solve] := 5: sol := SolveTools:-Linear(eqns,vars): That system is fairly easy, harder systems will take a few minutes. Eigenvalue problems are also often harder than solving.

large sparse structured matrices...

Commented: roman_pearce 1703

April 25 2010

Your matrices have symbolic entries. That's very different from software that uses sparse structured algorithms on floating point matrices, because you are going to get large polynomials. Maple can solve 100000 x 100000 sparse structured linear systems too. Wanna see? Download and save: http://www.cecm.sfu.ca/~rpearcea/sge/largesys read "largesys": # load it in infolevel[solve] := 5: sol := SolveTools:-Linear(eqns,vars): That system is fairly easy, harder systems will take a few minutes. Eigenvalue problems are also often harder than solving.

make a set, sort it...

Answered: roman_pearce 1703

April 20 2010

0 0

f := a00*x^123+a45*x^233+a02*x^123+a67*x^156+a47*x^67; C := [coeffs(f,x,'M')]: # coefficients M := [M]: # monomials S := [seq([C[i], M[i], degree(M[i])], i=1..nops(C))]: S := sort(S, (a,b)->evalb(a[3] < b[3]));

make a set, sort it...

Commented: roman_pearce 1703

April 20 2010

f := a00*x^123+a45*x^233+a02*x^123+a67*x^156+a47*x^67; C := [coeffs(f,x,'M')]: # coefficients M := [M]: # monomials S := [seq([C[i], M[i], degree(M[i])], i=1..nops(C))]: S := sort(S, (a,b)->evalb(a[3] < b[3]));

It assigned the coefficients...

Answered: roman_pearce 1703

April 16 2010

0 0

It assigned the coefficients of f to C, so C[1] is the coefficient for the monomial M[1], etc.

It assigned the coefficients...

Commented: roman_pearce 1703

April 16 2010

It assigned the coefficients of f to C, so C[1] is the coefficient for the monomial M[1], etc.

That's the maximum number of...

Answered: roman_pearce 1703

April 09 2010

0 0

That's the maximum number of digits (base 10), which is about a 100MB integer. Not bad for a 32-bit machine. On a 64-bit machine the limit is 15GB.

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

Ask a Question

Create a Post

1703 Reputation

19 Badges

MaplePrimes Activity

These are replies submitted by roman_pearce

sdmp on GPU...

sdmp source...

post import...

adding to lists...

Algebraic:-RecursiveDensePolynomials...

Algebraic:-RecursiveDensePolynomials...

much easier mod p...

much easier mod p...

large sparse structured matrices...

large sparse structured matrices...

make a set, sort it...

make a set, sort it...

It assigned the coefficients...

It assigned the coefficients...

That's the maximum number of...

Save this setting as your default sorting preference?

Ask a Question

Create a Post

Generating PDF…

Save this setting as your default sorting preference?
Note: You can change your preference any time in your account settings
Don't show this again

From:
To:

Custom Message (optional):