Mr. Roman Pearce

## 1678 Reputation

19 years, 140 days
CECM/SFU
Research Associate

I am a research associate at Simon Fraser University and a member of the Computer Algebra Group at the CECM.

## @Kamel  If you have a system of lin...

If you have a system of linear equations you can solve it directly without constructing a matrix first:

`vars := {seq(cat(x,i), i=1..100)}:`
`eqns := {seq(randpoly(vars,degree=1,terms=20),i=1..100)}:`
`sol := SolveTools:-Linear(eqns,vars):`

Actually current versions of Maple will convert your matrix to equations when you call LinearSolve.  This is for solving over the rational numbers.  I don't think Matlab does that by default, only in the symbolic toolbox.  For solving with floating point numbers Maple and Matlab should be about the same speed for dense systems.  To do that in Maple, make sure you convert the matrix and vector to datatype=float[8] as per my other post.

## @Kamel  If you have a system of lin...

If you have a system of linear equations you can solve it directly without constructing a matrix first:

`vars := {seq(cat(x,i), i=1..100)}:`
`eqns := {seq(randpoly(vars,degree=1,terms=20),i=1..100)}:`
`sol := SolveTools:-Linear(eqns,vars):`

Actually current versions of Maple will convert your matrix to equations when you call LinearSolve.  This is for solving over the rational numbers.  I don't think Matlab does that by default, only in the symbolic toolbox.  For solving with floating point numbers Maple and Matlab should be about the same speed for dense systems.  To do that in Maple, make sure you convert the matrix and vector to datatype=float[8] as per my other post.

## exact or floating point...

Maybe your matrix is exact instead of floating point.  The following commands will convert everything to hardware floating point numbers before solving:

`A := Matrix(A,datatype=float[8]);`
`b := Vector(b,datatype=float[8]);`
`LinearAlgebra:-LinearSolve(A,b);`

Without floats the solution can get extremely large, although Maple has good algorithms for computing it:

`with(LinearAlgebra):`
`A := RandomMatrix(100,100);`
`b := RandomVector(100);`
`x := LinearSolve(A,b);`
`x[1];  # big rational number`

## exact or floating point...

Maybe your matrix is exact instead of floating point.  The following commands will convert everything to hardware floating point numbers before solving:

`A := Matrix(A,datatype=float[8]);`
`b := Vector(b,datatype=float[8]);`
`LinearAlgebra:-LinearSolve(A,b);`

Without floats the solution can get extremely large, although Maple has good algorithms for computing it:

`with(LinearAlgebra):`
`A := RandomMatrix(100,100);`
`b := RandomVector(100);`
`x := LinearSolve(A,b);`
`x[1];  # big rational number`

## algorithms on gpu...

The design of GPUs is not a good fit for symbolic computation in general, however a large number of computer algebra algorithms could be accelerated on the GPU.  Here's a nice method for exact linear algebra: An algorithm to solve integer linear systems exactly using numerical methods, Zhendong Wan

A lot of other computations can use the Chinese Remainder Theorem: Fast polynomial multiplication on a GPU, Marc Moreno Maza and Wei Pan

The GPUs are already moving onto the chip so in the future I do not expect a significant overhead to "transfer data to the GPU".  One way or another that cost is going away, probably with some kind of cache.

As for floating point, unless you have a Tesla card the double precision performance is not going to be good.  It will always make sense for GPU manufacturers to cut double precision performance on consumer parts to decrease power and price and make the chip a better value for games.  That's the unmistakable conclusion from NVidia's Fermi, and AMD didn't even try.  So adaptive algorithms, which run in single precision and iterate to improve the precision are the way to go.  We should invest in those.

## algorithms on gpu...

The design of GPUs is not a good fit for symbolic computation in general, however a large number of computer algebra algorithms could be accelerated on the GPU.  Here's a nice method for exact linear algebra: An algorithm to solve integer linear systems exactly using numerical methods, Zhendong Wan

A lot of other computations can use the Chinese Remainder Theorem: Fast polynomial multiplication on a GPU, Marc Moreno Maza and Wei Pan

The GPUs are already moving onto the chip so in the future I do not expect a significant overhead to "transfer data to the GPU".  One way or another that cost is going away, probably with some kind of cache.

As for floating point, unless you have a Tesla card the double precision performance is not going to be good.  It will always make sense for GPU manufacturers to cut double precision performance on consumer parts to decrease power and price and make the chip a better value for games.  That's the unmistakable conclusion from NVidia's Fermi, and AMD didn't even try.  So adaptive algorithms, which run in single precision and iterate to improve the precision are the way to go.  We should invest in those.

## post worksheet...

Can you upload a worksheet or something that we can run?

## It would be nice to do this automaticall...

It would be nice to do this automatically, not just for Pi but for any case with non-numeric endpoints.

## junk DNA...

We should run this on a genome.  I wonder if it would detect "junk DNA".

## Most people don't like lots of updates -...

Most people don't like lots of updates - it's a hassle.  They also don't like to buy 2 year old software products.

## heapsort...

It's hard to beat a good heapsort on small data that fits in L1.

@marvin I did file a bug on Threads:-Add efficiency, with a simple example where linear speedup should be achieved.  I also harassed Darin by email.  Until this example works with nearly linear speedup, you have very little chance of being able to parallelize your code.

```N := 10000:
L := [seq(rand() mod 101,i=1..N)]:
bar := proc(L) local foo, j;
foo := proc(L,j) local i;
end proc:

end proc:
bar(L):```

I get 2.75x on a Core i5 750, 3x on a Core i7 920, and 2.72x on a Core2 Q6600.  Note that turbo boost messes these numbers up because the i5/i7 runs faster with only one thread.

For the i5 750, it's reported speed is 2.66 GHz, but it can actually run four threads at 2.8 GHz and one thread at up to 3.2 GHz, thermal envelope permitting.  So "fair speedup" is 1/(1-(3.2-2.8)/2.8) = 16.67% higher, or 3.2x if you measure from a cold start.

For the i7 920, it's reported speed is 2.66 GHz, but it can run four threads at 2.8 GHz and one thread at up to 2.93 GHz.  So "fair speedup" is 1/(1-(2.93-2.8)/2.8) = 4.87% higher, or about 3.14x if your cpu is cold.

You can see how easily this turns into a guessing game.  The best option is to disable turbo boost in the bios to test the efficiency of parallel programs, keeping in mind that "what the user sees" is what matters, fair or not.

Anyways, that example should run with linear speedup.  Until it does, it will be hard to parallelize anything using Threads:-Add.

One option is this: for N elements, create K = N^(1/3) tasks.  Give each task K strips of K elements, cycling between the tasks.  For example, for N=10^6, K=100 and we use 100 tasks.  The 1st task gets 1..100, the 2nd task gets 101..200, and so on, in a pattern that repeats every 10000 elements.  This effectively assumes (for all N) that parallel speedup should be achieved and creates a modest number of tasks so that overhead is low.  A really expensive function with N=8 can be parallelized.  It should also evenly distribute the work, even when that grows quadratically as in the example above.

## get mouse clicks in a program...

Is there a way to set up an interactive plot and capture mouse actions in a program?  I'm thinking of a Maple command which would allow the user to draw and manipulate graphs (i.e.: graph theory).

## get mouse clicks in a program...

Is there a way to set up an interactive plot and capture mouse actions in a program?  I'm thinking of a Maple command which would allow the user to draw and manipulate graphs (i.e.: graph theory).

 First 6 7 8 9 10 11 12 Last Page 8 of 39
﻿