acer

MaplePrimes Activity


These are replies submitted by acer

Could you post the code here, by uploading it in a file?

If you aren't at liberty to show the code in public then maybe you could send it to Maplesoft's technical support.

acer

@marvin Presumably you are talking about hardware-precision float Matrices and Vectors (float[8] or complex[8] datatypes), since you mention alternatively doing it on CUDA.

Are you sure that the external MKL (Intel Math Kernel Library) is not already making use of all or most of the multiple cores when doing each of the individual Matrix-Vector multiplications? If that were the case, then not much more speedup might even be possible. (You might check it most simply by using the Task Manager, to see the total load.)
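
If it helps, a rough reading can be obtained from within Maple itself. This is only a sketch, with made-up sizes: a CPU time well above the elapsed real time for a single hardware-float product suggests the accelerated BLAS is already spreading the work over several cores (and the Task Manager's total load should confirm it).

restart:
with(LinearAlgebra):
# Illustrative sizes; float[8] data so the external BLAS (MKL) does the work.
A := RandomMatrix(3000, 3000, generator = 0.0 .. 1.0,
                  outputoptions = [datatype = float[8]]):
B := RandomMatrix(3000, 3000, generator = 0.0 .. 1.0,
                  outputoptions = [datatype = float[8]]):
st, str := time(), time[real]():
C := A . B:
# cpu time much greater than real time hints that several cores were busy
'cpu' = time() - st, 'real' = time[real]() - str;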

Investigation with compiler utilities reveals that dgemm and dgemv are not always called directly from Maple. Instead, an external library linalg.dll is called from Maple. (Just how seems to depend on whether one uses `.` or MatrixMatrixMultiply.) And linalg.dll is linked against nag.dll. And the Library-side define_external may not have the option to enable multiple Threads to concurrently access that external library. There may be thread-safety issues here, related to MKL.

But if the MKL is already using multiple cores, then perhaps things are already near optimal for your example. As a rough general rule, the lower-level the threadedness, the better -- especially for array calculations where cache might be shared. Accelerated BLAS functions gain as much by virtue of being highly cache-tuned as they do by being threaded for 4-8 cores. (This touches a bit on another excellent post by Darin.)

I realize that you mention that example because it seems like a good test: easily parallelized across the list of Vectors. But in principle it would be suboptimal not to store all the Vectors in a single Matrix, so as to perform one Matrix-Matrix multiplication instead. If nothing else, that would avoid the overhead cost of `n` Maple function calls, which are relatively expensive compared to C function calls. It would also allow a single call to dgemm to be more cleverly cache-tuned than the overall task of `n` dgemv calls. Apologies if you considered that -- as mentioned, yours is likely intended as a test of the Threads functionality. But it may not be a clean example, on practical grounds.
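
For comparison, here is a rough sketch of that idea (the sizes m and n are made up for illustration): the same data treated as n separate Vectors versus as the columns of one Matrix, so that a single Matrix-Matrix product replaces n Matrix-Vector products.

restart:
with(LinearAlgebra):
m, n := 2000, 200:   # illustrative sizes only
A := RandomMatrix(m, m, generator = 0.0 .. 1.0,
                  outputoptions = [datatype = float[8]]):
B := RandomMatrix(m, n, generator = 0.0 .. 1.0,
                  outputoptions = [datatype = float[8]]):
V := [seq(B[1..-1, j], j = 1 .. n)]:   # the same data, as a list of n Vectors

st := time[real]():
R1 := [seq(A . v, v in V)]:            # n separate Matrix-Vector (dgemv-style) calls
t1 := time[real]() - st:

st := time[real]():
R2 := A . B:                           # one Matrix-Matrix (dgemm-style) call
t2 := time[real]() - st:

t1, t2;   # compare elapsed times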

In summary, I see at least these considerations:

  • MKL may already be using multiple cores for each Matrix-Vector multiplication
  • multiple concurrent calls into linalg.dll may not be allowed (thread-safety of the higher-level access to MKL?)
  • `n` separate Matrix-Vector calls add Maple function-call overhead
  • `n` dgemv calls subverts cache-tuning specific to dgemm
  • `n` dgemv calls may subvert any Strassen or other superior dgemm algorithm

Changing topic slightly, things are really going to get interesting (I think) when machines commonly have hundreds of cores. Consider the case of solving a (non-modular) integer linear system using modular techniques. Sure, for each prime modulus a particular external call might use lots of cores, though perhaps not with linear speedup. With a huge number of cores available, at some point it could be desirable to perform the modular linear-algebra operations in parallel, even though each one uses many cores: there would be enough cores that one would actually want to apportion them out across many modular subtasks run concurrently. In such a state of affairs it would be desirable to "fix" or work around the second bullet point above.

Let's remain calm, as things are as they were yesterday and the day before that...

There are two situations to be careful about. The first is computing with Digits<5. The solution is simple: don't do it. Don't assign Digits<5. Similarly, don't do evalf[d](...) or evalf(...,d) for d<5. This is for all sorts of expressions, both compound and simple.

Now for the second situation. For compound expressions, yes, one must be careful of roundoff error. That is true for any system with a working precision that is fixed (constant, or at best fixed per computation), and number of guard digits that is limited (constant, or at best limited according to the working precision). It is as true for a great deal of Maple as it is for, oh say, most of Matlab and compiled C/Fortran, etc.

Here is an example.

> restart:

> S:=eval(exp(abs(tan(x)-x)),x=1/10^4)-1;
                  exp(tan(1/10000) - 1/10000) - 1

> seq(evalf[d](S),d=10..15); # mostly all wrong, and only partly slightly correct
                0., 0., 0., 0., 3*10^(-13), 3.3*10^(-13)

> Chris2222evalf:=proc(a,b)
>   evalf(evalf(a,b+2),b)
> end proc:

> seq(Chris2222evalf(S,i),i=10..15); # just as bad, really (seriously)
       0., 0., 3*10^(-13), 3.3*10^(-13), 3.33*10^(-13), 3.333*10^(-13)

> evalf[23](S): evalf[10](%); # thank you very much...
                        3.333333347*10^(-13)

The extra tough question is: how high do you have to raise the working precision X (including allowing a set number of guard digits Y) in order to certifiably attain a requested number Z of correct digits? For arbitrary expressions involving transcendental functions this is a very hard question. How could one ascertain in advance and without doing as much work as the actual computation that at least Digits=23 is required in order to produce 10 correct digits in the result? What about for arbitrary compound expressions, with arbitrarily large exact coefficients thrown in (for fun)?
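
One common workaround is only a heuristic, not a certificate: keep doubling the working precision until results at two successive precisions agree to the requested number of digits (essentially the Digits-doubling mentioned below for `fsolve/polyill`). A sketch; the name guarded_evalf and the cap at Digits=1000 are mine, purely for illustration, and adversarial examples can defeat the agreement test.

guarded_evalf := proc(S, target::posint)
  local d, prev, cur;
  d := max(target, 5);
  prev := evalf[d](S);
  while d < 1000 do
    d := 2*d;
    cur := evalf[d](S);
    # accept once two successive precisions agree to `target` digits
    if cur = prev or abs(cur - prev) <= abs(cur)*Float(5, -target) then
      return evalf[target](cur);
    end if;
    prev := cur;
  end do;
  error "requested accuracy not attained below Digits=%1", d;
end proc:

guarded_evalf( eval(exp(abs(tan(x) - x)), x = 1/10^4) - 1, 10 );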

A very small part of Maple will try to raise Digits arbitrarily high in order to compute a floating-point solution to a problem. `signum` and `is` will generally not do that, and they often rely on `evalf` to gauge relative magnitudes, sort, or ascertain sign. The routine `fsolve/polyill` will do it, by Digits-doubling until it gets as many roots as it expects, but there are examples known to make it run until hitting the Digits upper bound. Even `evalr` and `shake` do not really do it.

Mathematica's behaviour is different. It appears to strive to achieve a supplied accuracy, even by raising the internal working precision as high as it needs(?). But it is a closed system, and the internals and the workings of its numeric scheme are not public. UC Berkeley Professor Richard Fateman has several times taken Mathematica to task for their numeric model. A lot of it is on sci.math.symbolic for public consumption. (1 and 2 are in two such threads)

Another link, for your amusement/edification, by Dave Rusin: calc_errors

With no intended offense, I would suggest that even in High School the basics of roundoff error could be introduced in any course which did floating-point computations. It needn't be advanced, as a simple example can at least demonstrate that the sand underfoot is shifting.

restart:
Digits := 15:   # something similar for the students' calculators should work
a := 1.0:  b := 3.0*10^(-21):
(a + b) - a;
                               0.
(a - a) + b;
                      3.00000000000000*10^(-21)

acer

@hirnyk I'm not sure that I understand. My construction is very similar to your own with procedure f. The only significant difference is that I inlined the proc right into the Matrix() call, while you assigned it first outside that call. And I used an operator (which is just a simple form of a procedure) while you used a general procedure.

The ?Matrix help-page describes that as `init` in its Calling Sequences, mentioning that it may be a procedure (as opposed to a list of lists or another valid initializer). The ?Matrix help-page has this example in its Examples section (again, differing only by pre-assignment of operator f),

f:= (i,j) -> x^(i+j-1):
Matrix(2,f);
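
For comparison, the inlined-operator form amounts to the same thing (this is just a sketch of the equivalent call):

Matrix(2, (i,j) -> x^(i+j-1));   # operator passed directly as the initializer
#  [  x    x^2  ]
#  [ x^2   x^3  ]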

@Christopher2222 Isn't the point made above that, if this is 2D Math input, then only the restart is happening in the single-line case? If nothing else on that single line is executed (the assignment to a, or the gc call) then that could explain why it doesn't affect the memory use at that time -- things which don't happen have little effect.

@Alejandro Jakubi Thank you, sir. :)

@Christopher2222 As I mentioned above in a note,

    NB. Regarding your earlier talk about parse, the idea of parsing strings containing spaces,
    to rely on 2D implicit multiplication, looks dubious at best. What about "this 1is 2how $I like it, eh"
    or "some are 1-1 and  some are many-1".

It looks so much safer to use string inspection tools to look into strings, instead of parsing them and hoping for valid products of (valid Maple) names. If you buy into that idea, then you wouldn't be trying to parse the strings at all, and would not run afoul of stray characters messing up the parsing.
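
A small sketch of what I mean, using the sample string from the note above (Search and Split being just two of the available inspection tools):

s := "this 1is 2how $I like it, eh":

# parse() requires valid Maple syntax, so free-form text makes it throw an error
try
  parse(s);
catch:
  lastexception;
end try;

# direct string inspection has no such problem
StringTools:-Search("how", s);   # position of the substring, or 0 if absent
StringTools:-Split(s);           # the whitespace-separated words, as a list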

StringTools:-Has doesn't work like you are expecting: it matches individual characters, not substrings. As mentioned above, use Search, or Split it, or something else easier.

If you are planning on using select then that replaces one of your map calls. Don't waste time mapping a predicate just to get true/false entries; use the predicate directly in the select call!

Don't waste time mapping every entry to a string; it's unnecessary. Just wrap the predicate into a conditional like t -> `if`(type(t,string), ..one thing.., ..other thing..). Or get creative and act over indets(t,string). Or whatever. But don't change everything to a string.
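
To make that concrete, here is a rough sketch on a made-up mixed list (the list L and the substring "cat" are only for illustration):

L := [ "concatenate", 42, "dog", x + 1, "cat" ]:

strs := select(type, L, string);                        # keep just the string entries
select(t -> StringTools:-Search("cat", t) > 0, strs);   # those containing "cat"

# act on the strings and leave everything else alone, via the `if` operator
map(t -> `if`(type(t, string), StringTools:-UpperCase(t), t), L);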

acer

@maple How are you storing the solutions of each fsolve call? (That was also asked above.) Is your code set up so that Maple must hold on to them all? Can you upload the code, so that we may see it?

If the numbers for the 2-Dimensional Badge are correct, then to date (25/07/2010) only three members have used 2D Math in five or more posts since the new version 2 of Mapleprimes went online in May. Doesn't that mean that the regular responders are avoiding it in Answers and Comments?

My biggest problems with the worksheet-upload method of posting 2D Math (as opposed to using the Editor button) are that it takes too much effort, and that the resulting images don't contain 1D equivalents in alt-tags, so one cannot copy and paste them directly (without having to launch the GUI... and wait... and then use the mouse some more...). Grabbing some inlined, posted 2D Math and then putting it back into a reply, prettily rendered, is a lot of work.

2D Math in Mapleprimes posts could be useful if it made the math easier to see (for which it would need to be much better rendered, to be sure) and if it were easy to post, grab, and repost without using the Maple Standard GUI.

acer

Why waste cycles converting all of a to strings? As shown above, it is not a necessary step in order to get all the original string entries to be parsed.

And besides, how does your followup question make sense? What do you expect to happen, when "know how" is parsed? Do you expect no error, and if so, why?

The problem doesn't appear to be with the suggested parsing command, but with your expectation that anything should be able to parse it in a "usual" sense.

Do you have some special sense, unstated as yet, of what "know how" should be converted to?

acer
