marvin


MaplePrimes Activity


These are replies submitted by marvin

My real need for CUDA alongside Maple is to do double precision matrix manipulations.  What I want to do is speed up the construction of large matrices of type float[8].  For that I need to know how you get CUDA to do double-precision and how to interface Maple to that sort of thing.  I would love a Windows example that shows me how to do something like

1.  foo:=Matrix(1700,1700,(i,j)->myfunc(i,j,stuff....),datatype=float[8], shape=hermitian (or symmetric))

where myfunc is an external routine that I have compiled.

OR

2. M:=Matrix(1700,1700,datatype=float[8]);
    foo:=MakeMatrix(args,M);

where MakeMatrix is an external CUDA-based function that fills the matrix M in place.

It would be really helpful to know how to implement something real world like this.
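
To make option 2 concrete, here is a sketch of only the Maple side of what I am imagining.  The names MakeMatrix and mycuda.dll and the argument list are hypothetical; the CUDA kernel itself would have to be compiled separately into that DLL.

# Hook up a hypothetical CUDA-backed routine that fills a float[8] matrix in place.
MakeMatrix := define_external(
    'MakeMatrix',                        # exported symbol in the DLL
    'n'::integer[4],                     # matrix dimension
    'M'::ARRAY(datatype = float[8]),     # float[8] storage, filled in place
    'sig'::float[8],                     # extra scalar parameter
    LIB = "mycuda.dll"
);

M := Matrix(1700, 1700, datatype = float[8]);
MakeMatrix(1700, M, 0.5);                # external code fills M in place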

 

Best,

Marvin

Since I am planning on actually using this soon on my new machine, I would love a Windows version.

So far as I can see, the toolbox is good if you want to run a bunch of trivially parallelizable processes.  If you want to do what a multi-threaded program does, i.e. transparently access the same data (for something like my matrix problem), it looks like that takes work.  It also looks like you have to pass around a bunch of stuff.  I could be wrong about this, I haven't really played with it yet, but I think I more or less have the correct idea.

I am even less familiar with whether it can be used to speed up MapleSim iterations.  But given that it isn't so easy to share the same data, I am not sure it would be a win.

I think I understand.  For my case then the little bit of overhead would be trivial.

Thanks for your initial comments.

What I did was set infolevel[Compiler]:=4;

Then I compiled something trivial.  That got me the compile, link, and postlink strings I needed.  Suitably modifying them, I got the compiler to generate a .dll of my choosing.  From there I can start playing with the parallel code I want to test and break Maple.  Of course, these crashes won't be bug reports, just my learning experience.  If I can get the small pieces I care about working, I will be happy indeed.  It will make a **big** impact on my current project.
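
For anyone reading along, the throwaway compile I mean is nothing more than this (the procedure itself is just a placeholder):

# Raise the Compiler's infolevel so Compile prints the compile, link, and
# postlink command lines it runs; any trivial typed procedure will do.
infolevel[Compiler] := 4;

trivial := proc(x::float[8])::float[8];
    2.0 * x;
end proc;

ctrivial := Compiler:-Compile(trivial);   # the toolchain command strings are printed here
ctrivial(3.0);                            # returns 6.0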

Thanks again to Darrin and Acer for their helpful comments.

 

Marvin

A Matrix or a Vector is, as I understand it, an rtable.  Thus, suppose I pass each of several tasks its own non-overlapping sub-matrix of the original matrix.  I would assume that the reading of the original matrix occurs at the time of the function call and not inside the routine.  Does this mean I am okay, that multiple threads can then run without blocking?
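
Here is a small sketch of the pattern I have in mind, using the Task model; the function blockSum, the threshold, and the block sizes are all made up.  Each leaf task only reads its own row range of the shared Matrix, so no task touches data another task writes.

# Sum the entries of A by splitting the row range into non-overlapping halves
# until the pieces are small, then adding each piece serially.
blockSum := proc(A::Matrix, lo::integer, hi::integer, ncols::integer)
    local mid, i, j;
    if hi - lo < 128 then
        return add(add(A[i, j], j = 1 .. ncols), i = lo .. hi);
    end if;
    mid := iquo(lo + hi, 2);
    # each child task reads only its own row block of A
    Threads:-Task:-Continue(`+`,
        Task = [blockSum, A, lo, mid, ncols],
        Task = [blockSum, A, mid + 1, hi, ncols]);
end proc;

A := Matrix(1000, 1000, (i, j) -> evalf(1/(i + j)), datatype = float[8]);
Threads:-Task:-Start(blockSum, A, 1, 1000, 1000);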

Thanks for the example.  I will give it some study.

Actually, I thought you were right about the function call overhead, and passing in the empty matrix to populate is what I did originally.  However, when I tested both versions, one with the whole thing done inside the compiled function and one with the per-element function call overhead, there was no appreciable difference even for very large matrices.  The second version is easier to call and deal with, so I took the tiny hit.

 

Right now I would be interested in playing with compiling my simple piece of code outside of Maple and then using define_external with 'THREAD_SAFE' to call it from Maple.  My problem is that the Visual C++ compiler and all of its switches are unfamiliar to me.  So what I would love is a small example of compiling code with the compiler and linker shipped with Maple (the C++ redistributable compiler) to create a small .dll, together with the Maple code to access it.

After all, what is the worst thing that can happen?  If the small part of the run-time library that I am using is not thread safe, then Maple will crash and I will know I can't use this trick.  If it works, I will be one happy puppy ;^)
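
In the meantime, here is roughly what I am picturing for the Maple side.  The export name myfunc and the DLL name are made up, and I have not verified the exact Visual C++ switches; something like "cl /LD myfunc.c" in a Visual Studio command prompt is the usual way to turn a single C file into myfunc.dll.

# Hook up a hand-compiled DLL; the THREAD_SAFE option tells Maple the routine
# may be called from several threads at once.
myfunc := define_external(
    'myfunc',                 # exported symbol in the hand-built DLL
    'THREAD_SAFE',
    'i'::integer[4],
    'j'::integer[4],
    'sig'::float[8],
    RETURN::float[8],
    LIB = "myfunc.dll"
);

myfunc(1, 2, 0.5);            # called like any other Maple procedure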

 

I posted a very simple example of what I am doing.  Basically the only piece that needs to be compiled looks like

 

foo := proc(i::integer, j::integer, X::Matrix(datatype = float[8]),
            sig::float[8], ncols::integer)
local k::integer, temp::float[8];
    temp := add((X[i, k] - X[j, k])^2, k = 1 .. ncols);
    temp := exp(-temp/2/sig);
    return temp;
end proc;

Pretty simple, but the sum over k can be big and this is going into

M:=Matrix(xdim,xdim,(i,j)->foo(i,j,X,sig,ncols),shape=symmetric,datatype=float[8]);

The Maple Matrix constructor runs fast, so this is as good as doing the whole thing compiled.  The trouble is that i and j run from 1 to 15000 and k from 1 to 3000, so the compiled part is important.
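
For concreteness, the compiled part I am describing is just this (assuming X, sig, ncols, and xdim are already defined as above):

# Compile foo once, then let the fast Matrix constructor drive it; only the
# inner sum over k runs as native code.
cfoo := Compiler:-Compile(foo);
M := Matrix(xdim, xdim, (i, j) -> cfoo(i, j, X, sig, ncols),
            shape = symmetric, datatype = float[8]);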

I am a theoretical physicist at the SLAC Linear Accelerator Laboratory in Menlo Park.  I have been using Maple for quite some time to do numerical computations in condensed matter physics that (effectively) involve finding eigenvectors of matrices that are 2^20 x 2^20 (clearly there are tricks here).  These are big calculations and they heavily use the LinearAlgebra package and ArrayTools.  The point I want to make is that, for a recent computation, my student and I did the same calculation: he used MATLAB and I used Maple 12.  I made use of the Compile command in judiciously chosen sections of code; he used what MATLAB had available.  We compared run times, and in general my Maple code ran at the same rate as or better than the MATLAB code.

Prior to using Maple for these computations I used C++.  The Maple code development is infinitely simpler, and the performance is quite comparable to the compiled C++ code.  I have switched over to doing all of this in Maple and have not looked back.  Of course, I really want to be able to speed up some new stuff using multi-threaded code, and Maple's current limitations hurt :>(

 

Marvin

I have really simple parallel stuff that I compile, but only at the lowest possible level.  All it does is an add() or mul() on local variables, or on variables passed to it.  It doesn't modify anything and is really very thread safe.  Is there a way to compile this sort of thing outside of Compiler:-Compile and link it to Maple, so that I get the benefit of multi-threading without having Maple automatically force it to execute in a single thread?

I really didn't understand your private answer to me about this.  If possible I would love to see a complete example of how to do this.
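
Something along these lines is what I am picturing.  Here cAdd stands in for my little add()-only routine, already compiled outside Maple and hooked up with define_external and the THREAD_SAFE option; the name and arguments are hypothetical.

# With a THREAD_SAFE external routine, a threaded map should be free to call
# it from several threads at once.
data := [seq(1.0 * n, n = 1 .. 8)];
results := Threads:-Map(x -> cAdd(x, 1000), data);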

Marvin

 

 

Could you please post an example of how you compiled and called the external CUDA API?  Every time I try to do that kind of thing, I run into all kinds of problems.

 

Marvin

Could you give us an idea of when the compiler will become thread safe?  This is, after all, very important for numerical programming.  I have several big projects that would really benefit from this.

Also, how does Maple multi-threading mesh with or conflict with the Maple Grid Toolbox?

How much experience do you guys have with CUDA in house?  I am really interested in getting one of the nVidia cards for big computations, and I would love to be able to use Maple to access that environment.

 

 

 

 
