Items tagged with parallelization

Feed

I have a nested for loop that iterates through a range of values for x and y coordinates to create a 3d surface for illustration of my research. after the x loop there is a y loop, and inside of that y loop is a series of commands to find some eigenvalues of a matrix (which become the z coordinates) and sort them into already open files. This isn't bad when the precision i require is more than .02, but some of my matrices require up to 0.005 or less. The latter precision costs hours of computation time on just one processor. However my laptop has an i7, so I want to see if i can get the for loop to send its next iteration to the next processor in line while it has the previous ones still calculating. Have any tips?

I use maple 12 on my dual core laptop and i plan to buy maple 2016 and an 8 core system to get advandage of multiple processors. 

Does maple 2016 functions get advandage of multiprocessor systems or it will be the same as having one processor?

Hello

I have a procedure that builds an ideal from a specific set of polynomials and then calls the Groebner basis package to eliminate some of the variables.  Even though the procedure is running on a machine with 2 processors, 24 cores and 72 GB of ram, only one core has been used (and is always on a 100% usage).  Would the Grid Computing Toolbox be of some hope in this case?  If so,  how to insert the Grid commands so that Maple sends the calculations to the other cores (I find the document rather confusing)?   If I am talking non sense,  please let me know.

Many thanks

Ed

 

 

 

HI,

Can anyone suggest the tutorial or good examples for parallel computing in maple.

Thanks in advance.

Hi everyone,

For my first question, I am looking for some help about the following. I have the opportunity to run a worksheet in parallel on a cluster of sixteen workstations, each one endowed with twelve CPUs, through the GRID Computing Toolbox. However, I have troubles concerning how to do that.

I join the worksheet at issue: abmm_ma_1.mw The aim is to run large-scale numerical simulations of a dynamic system, depending on the values given to the initial conditions and to the parameters. The worksheet is organized in four execution groups:

  1. The required packages (combinat and LinearAlgebra).
  2. Calibration of the parameters and initial conditions.
  3. The system, which is embedded into a procedure called SIM.
  4. The activation of SIM, whose outputs are nine .mla files, each one being made of a real-number matrix.

The truth is, I do not clearly see how to modify the worksheet with some elements of the GRID package. Besides, the cluster operates under HTCondor so that running the worksheet requires beforehand the creation of a .sub file. This should be done in consistency with the aforesaid modification.

Any help is welcome, thanks a lot.

This post is about the relationship between the number of processors used in parallel processing with the Threads package and the resultant real times and cpu times for a computation.

In the worksheet below, I perform the same computation using each possible number of processors on my machine, one thru eight. The computation is adding a list of 32 million pre-selected random integers. The real times and cpu times are collected from each run, and these are analyzed with a variety of metrics that I devised. Note that garbage-collection (gc) time is not an issue in the timings; as you can see below, the gc times are zero throughout.

My conclusion is that there are severely diminishing returns as the number of processors increases. There is a major benefit in going from one processor to two; there is a not-as-great-but-still-substantial benefit in going from two processors to four. But the real-time reduction in going from four processors to eight is very small compared to the substantial increase in resource consumption.

Please discuss the relevance of my six metrics, the soundness of my test technique, and how the presentation could be better. If you have a computer capable of running more than eight threads, please modify and run my worksheet on it.

Diminishing Returns from Parallel Processing: Is it worth using more than four processors with Threads?

Author: Carl J Love, 2016-July-30 

Set up tests

 

restart:

currentdir(kernelopts(homedir)):
if kernelopts(numcpus) <> 8 then
     error "This worksheet needs to be adjusted for your number of CPUs."
end if:
try fremove("ThreadsData.m") catch: end try:
try fremove("ThreadsTestData.m") catch: end try:
try fremove("ThreadsTest.mpl") catch: end try:

#Create and save random test data
L:= RandomTools:-Generate(list(integer, 2^25)):
save L, "ThreadsTestData.m":

#Create code file to be read for the tests.
fd:= FileTools:-Text:-Open("ThreadsTest.mpl", create):
fprintf(
     fd,
     "gc();\n"
     "read \"ThreadsTestData.m\":\n"
     "CodeTools:-Usage(Threads:-Add(x, x= L)):\n"
     "fd:= FileTools:-Text:-Open(\"ThreadsData.m\", create, append):\n"
     "fprintf(\n"
     "     fd, \"%%m%%m%%m\\n\",\n"
     "     kernelopts(numcpus),\n"
     "     CodeTools:-Usage(\n"
     "          Threads:-Add(x, x= L),\n"
     "          iterations= 8,\n"
     "          output= [realtime, cputime]\n"
     "     )\n"
     "):\n"
     "fclose(fd):"
):
fclose(fd):

#Code review
fd:= FileTools:-Text:-Open("ThreadsTest.mpl"):
while not feof(fd) do
     printf("%s\n", FileTools:-Text:-ReadLine(fd))
end do:

fclose(fd):

gc();
read "ThreadsTestData.m":
CodeTools:-Usage(Threads:-Add(x, x= L)):
fd:= FileTools:-Text:-Open("ThreadsData.m", create, append):
fprintf(
     fd, "%m%m%m\n",
     kernelopts(numcpus),
     CodeTools:-Usage(
          Threads:-Add(x, x= L),
          iterations= 8,
          output= [realtime, cputime]
     )
):
fclose(fd):

 

Run the tests

restart:

kernelopts(numcpus= 1):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=0 bytes, cpu time=2.66s, real time=2.66s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.26s, real time=2.26s, gc time=0ns

 

Repeat above test using numcpus= 2..8.

 

restart:

kernelopts(numcpus= 2):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=2.19MiB, cpu time=2.73s, real time=1.65s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.37s, real time=1.28s, gc time=0ns

 

restart:

kernelopts(numcpus= 3):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=4.38MiB, cpu time=2.98s, real time=1.38s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.75s, real time=1.05s, gc time=0ns

 

restart:

kernelopts(numcpus= 4):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.80MiB, alloc change=6.56MiB, cpu time=3.76s, real time=1.38s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=3.26s, real time=959.75ms, gc time=0ns

 

restart:

kernelopts(numcpus= 5):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.80MiB, alloc change=8.75MiB, cpu time=4.12s, real time=1.30s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=3.74s, real time=910.88ms, gc time=0ns

 

restart:

kernelopts(numcpus= 6):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.81MiB, alloc change=10.94MiB, cpu time=4.59s, real time=1.26s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.29s, real time=894.00ms, gc time=0ns

 

restart:

kernelopts(numcpus= 7):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.81MiB, alloc change=13.12MiB, cpu time=5.08s, real time=1.26s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.63s, real time=879.00ms, gc time=0ns

 

restart:

kernelopts(numcpus= 8):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.82MiB, alloc change=15.31MiB, cpu time=5.08s, real time=1.25s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.69s, real time=845.75ms, gc time=0ns

 

Analyze the data

restart:

currentdir(kernelopts(homedir)):

(R,C):= 'Vector(kernelopts(numcpus))' $ 2:
N:= Vector(kernelopts(numcpus), i-> i):

fd:= FileTools:-Text:-Open("ThreadsData.m"):
while not feof(fd) do
     (n,Tr,Tc):= fscanf(fd, "%m%m%m\n")[];
     (R[n],C[n]):= (Tr,Tc)
end do:

fclose(fd):

plot(
     (V-> <N | 100*~V>)~([R /~ max(R), C /~ max(C)]),
     title= "Raw timing data (normalized)",
     legend= ["real", "CPU"],
     labels= [`number of processors\n`, `%  of  max`],
     labeldirections= [HORIZONTAL,VERTICAL],
     view= [DEFAULT, 0..100]
);

The metrics:

 

R[1] /~ R /~ N:          Gain: The gain from parallelism expressed as a percentage of the theoretical maximum gain given the number of processors

C /~ R /~ N:               Evenness: How evenly the task is distributed among the processors

1 -~ C[1] /~ C:           Overhead: The percentage of extra resource consumption due to parallelism

R /~ R[1]:                   Reduction: The percentage reduction in real time

1 -~ R[2..] /~ R[..-2]:  Marginal Reduction: Percentage reduction in real time by using one more processor

C[2..] /~ C[..-2] -~ 1:  Marginal Consumption: Percentage increase in resource consumption by using one more processor

 

plot(
     [
          (V-> <N | 100*~V>)~([
               R[1]/~R/~N,             #gain from parallelism
               C/~R/~N,                #how evenly distributed
               1 -~ C[1]/~C,           #overhead
               R/~R[1]                 #reduction
          ])[],
          (V-> <N[2..] -~ .5 | 100*~V>)~([
               1 -~ R[2..]/~R[..-2],   #marginal reduction rate
               C[2..]/~C[..-2] -~ 1    #marginal consumption rate        
          ])[]
     ],
     legend= typeset~([
          'r[1]/r/n',
          'c/r/n',
          '1 - c[1]/c',
          'r/r[1]',
          '1 - `Delta__%`(r)',
          '`Delta__%`(c) - 1'       
     ]),
     linestyle= ["solid"$4, "dash"$2], thickness= 2,
     title= "Efficiency metrics\n", titlefont= [HELVETICA,BOLD,16],
     labels= [`number of processors\n`, `% change`], labelfont= [TIMES,ITALIC,14],
     labeldirections= [HORIZONTAL,VERTICAL],
     caption= "\nr = real time,  c = CPU time,  n = # of processors",
     size= combinat:-fibonacci~([16,15]),
     gridlines
);

 

 

Download Threads_dim_ret.mw

Hello guys, i would like to do parallel computation in my code written in the Maple18. The question that can help me is:

Given a procedure that compute an function g, where g = f1+f2+f3+f4+f5+f6+f7+f8, i would like to compute all fi at same time.
Now, i´m using " grid:-seq('f[i]',[i=1,2,3,4,5,6,7,8])" and it works very well. However, i think that for my case an better solution should be;
Calculate the f1 in core 1, f2 in core 2, f3 in core 3 ... f8 in core 8 at same time, and after this, to sum all results(f1+f2+f3+..+f8). How i can do this?

Att,

Griffith.

Hi everyone.

I wrote simple program with several functions. I would like to obtain parallel computing of my functions. I can't understand how to use nodes.

My programm.

Simple_example.mw

Hi,

 

   I have a set of linear equations in terms of Ax+B=0, where A and B are matrices.

  I used linsolve or LinearSolve to solve the equations.

   Is there any simple way to run linsolve/LinearSolve parallelly? suppose I already have matrices A and B.

 

Thank you very much

  

 

I wanna solve a collection of linear systems Ax=b_i, i=1..n, with A an upper triangle matrix, parallelly with Maple. Is there anyone can give me some suggestions? Thanks lot.

I tried to use the parallelized Map function in Maple 16.

The following code worked as expected:

with(Threads);
Map(`^`,[1,2],2);

But then I tried:
Map(`op`,[[a,b]]);

and the Maple kernel crashed. This happens every time.

Any ideas what can be a reason of this?

Hi, i'm interested about parallel programming in maple but i cant find a good tutorial. is there anyone know a good refrence for this topic?

Hi,

I want to parallel the following code.it is good that part1 solved by one thread and part2 solved by other thread.

please help me.it must be noted that part1 and part 2 are independent "at each" iteration.

 

dtheta[m](x):= theta[m-1](x)+phi[m-1](x);
dphi[m](x):= theta[m-1](x)-2*phi[m-1](x);

for i from 2 by 1 to 10 do;

# part1

theta[i](x):=subs(m=i,dtheta[m](x)):
theta[i](x):=value(%):
eval(-theta[i](x),x=0);

hi, i want to write the following algorithm in maple with two threads, please help me. for m from 1 by 1 to 100 if thread1 then result1:= some equation if thread2 then result2:= some equation end if end do:

Page 1 of 1