This post is about the relationship between the number of processors used in parallel processing with the Threads package and the resultant real times and cpu times for a computation.

In the worksheet below, I perform the same computation using each possible number of processors on my machine, one thru eight. The computation is adding a list of 32 million pre-selected random integers. The real times and cpu times are collected from each run, and these are analyzed with a variety of metrics that I devised. Note that garbage-collection (gc) time is not an issue in the timings; as you can see below, the gc times are zero throughout.

My conclusion is that there are severely diminishing returns as the number of processors increases. There is a major benefit in going from one processor to two; there is a not-as-great-but-still-substantial benefit in going from two processors to four. But the real-time reduction in going from four processors to eight is very small compared to the substantial increase in resource consumption.

Please discuss the relevance of my six metrics, the soundness of my test technique, and how the presentation could be better. If you have a computer capable of running more than eight threads, please modify and run my worksheet on it.

Diminishing Returns from Parallel Processing: Is it worth using more than four processors with Threads?

Author: Carl J Love, 2016-July-30 

Set up tests

 

restart:

currentdir(kernelopts(homedir)):
if kernelopts(numcpus) <> 8 then
     error "This worksheet needs to be adjusted for your number of CPUs."
end if:
try fremove("ThreadsData.m") catch: end try:
try fremove("ThreadsTestData.m") catch: end try:
try fremove("ThreadsTest.mpl") catch: end try:

#Create and save random test data
L:= RandomTools:-Generate(list(integer, 2^25)):
save L, "ThreadsTestData.m":

#Create code file to be read for the tests.
fd:= FileTools:-Text:-Open("ThreadsTest.mpl", create):
fprintf(
     fd,
     "gc();\n"
     "read \"ThreadsTestData.m\":\n"
     "CodeTools:-Usage(Threads:-Add(x, x= L)):\n"
     "fd:= FileTools:-Text:-Open(\"ThreadsData.m\", create, append):\n"
     "fprintf(\n"
     "     fd, \"%%m%%m%%m\\n\",\n"
     "     kernelopts(numcpus),\n"
     "     CodeTools:-Usage(\n"
     "          Threads:-Add(x, x= L),\n"
     "          iterations= 8,\n"
     "          output= [realtime, cputime]\n"
     "     )\n"
     "):\n"
     "fclose(fd):"
):
fclose(fd):

#Code review
fd:= FileTools:-Text:-Open("ThreadsTest.mpl"):
while not feof(fd) do
     printf("%s\n", FileTools:-Text:-ReadLine(fd))
end do:

fclose(fd):

gc();
read "ThreadsTestData.m":
CodeTools:-Usage(Threads:-Add(x, x= L)):
fd:= FileTools:-Text:-Open("ThreadsData.m", create, append):
fprintf(
     fd, "%m%m%m\n",
     kernelopts(numcpus),
     CodeTools:-Usage(
          Threads:-Add(x, x= L),
          iterations= 8,
          output= [realtime, cputime]
     )
):
fclose(fd):

 

Run the tests

restart:

kernelopts(numcpus= 1):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=0 bytes, cpu time=2.66s, real time=2.66s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.26s, real time=2.26s, gc time=0ns

 

Repeat above test using numcpus= 2..8.

 

restart:

kernelopts(numcpus= 2):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=2.19MiB, cpu time=2.73s, real time=1.65s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.37s, real time=1.28s, gc time=0ns

 

restart:

kernelopts(numcpus= 3):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=4.38MiB, cpu time=2.98s, real time=1.38s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.75s, real time=1.05s, gc time=0ns

 

restart:

kernelopts(numcpus= 4):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.80MiB, alloc change=6.56MiB, cpu time=3.76s, real time=1.38s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=3.26s, real time=959.75ms, gc time=0ns

 

restart:

kernelopts(numcpus= 5):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.80MiB, alloc change=8.75MiB, cpu time=4.12s, real time=1.30s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=3.74s, real time=910.88ms, gc time=0ns

 

restart:

kernelopts(numcpus= 6):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.81MiB, alloc change=10.94MiB, cpu time=4.59s, real time=1.26s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.29s, real time=894.00ms, gc time=0ns

 

restart:

kernelopts(numcpus= 7):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.81MiB, alloc change=13.12MiB, cpu time=5.08s, real time=1.26s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.63s, real time=879.00ms, gc time=0ns

 

restart:

kernelopts(numcpus= 8):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.82MiB, alloc change=15.31MiB, cpu time=5.08s, real time=1.25s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.69s, real time=845.75ms, gc time=0ns

 

Analyze the data

restart:

currentdir(kernelopts(homedir)):

(R,C):= 'Vector(kernelopts(numcpus))' $ 2:
N:= Vector(kernelopts(numcpus), i-> i):

fd:= FileTools:-Text:-Open("ThreadsData.m"):
while not feof(fd) do
     (n,Tr,Tc):= fscanf(fd, "%m%m%m\n")[];
     (R[n],C[n]):= (Tr,Tc)
end do:

fclose(fd):

plot(
     (V-> <N | 100*~V>)~([R /~ max(R), C /~ max(C)]),
     title= "Raw timing data (normalized)",
     legend= ["real", "CPU"],
     labels= [`number of processors\n`, `%  of  max`],
     labeldirections= [HORIZONTAL,VERTICAL],
     view= [DEFAULT, 0..100]
);

The metrics:

 

R[1] /~ R /~ N:          Gain: The gain from parallelism expressed as a percentage of the theoretical maximum gain given the number of processors

C /~ R /~ N:               Evenness: How evenly the task is distributed among the processors

1 -~ C[1] /~ C:           Overhead: The percentage of extra resource consumption due to parallelism

R /~ R[1]:                   Reduction: The percentage reduction in real time

1 -~ R[2..] /~ R[..-2]:  Marginal Reduction: Percentage reduction in real time by using one more processor

C[2..] /~ C[..-2] -~ 1:  Marginal Consumption: Percentage increase in resource consumption by using one more processor

 

plot(
     [
          (V-> <N | 100*~V>)~([
               R[1]/~R/~N,             #gain from parallelism
               C/~R/~N,                #how evenly distributed
               1 -~ C[1]/~C,           #overhead
               R/~R[1]                 #reduction
          ])[],
          (V-> <N[2..] -~ .5 | 100*~V>)~([
               1 -~ R[2..]/~R[..-2],   #marginal reduction rate
               C[2..]/~C[..-2] -~ 1    #marginal consumption rate        
          ])[]
     ],
     legend= typeset~([
          'r[1]/r/n',
          'c/r/n',
          '1 - c[1]/c',
          'r/r[1]',
          '1 - `Delta__%`(r)',
          '`Delta__%`(c) - 1'       
     ]),
     linestyle= ["solid"$4, "dash"$2], thickness= 2,
     title= "Efficiency metrics\n", titlefont= [HELVETICA,BOLD,16],
     labels= [`number of processors\n`, `% change`], labelfont= [TIMES,ITALIC,14],
     labeldirections= [HORIZONTAL,VERTICAL],
     caption= "\nr = real time,  c = CPU time,  n = # of processors",
     size= combinat:-fibonacci~([16,15]),
     gridlines
);

 

 

Download Threads_dim_ret.mw


Please Wait...