acer

32722 Reputation

29 Badges

20 years, 86 days
Ontario, Canada

Social Networks and Content at Maplesoft.com

MaplePrimes Activity


These are replies submitted by acer

In Maple 2015.0 the Task multithreaded compiled code I gave immediately above runs about twice as fast as in Maple 18.02 on my 64bit Windows 7, for that 600x800 example.

@Carl Love You should check your results for correctness, when using `Compile` on a procedure containing calls to `add` over float values. It's a "known bug" that it often get confused about the types in play, in such a mix. As you originally have it the code intended to be compiled will not produce correct results, I suspect. The workaround of replacing `add` calls with explicit loops may slow down under evalhf. So one may need distinct versions for running compiled versus running under evalhf.

The following uses explicit loops instead of `add`, for the Compile version. The temporary accumulator for the sum is a scalar variable, since using `newaa[i,j]` instead as accumulator can (MS-Windows, LLVM compiler) incur a big performance hit due to unnecessary referencing into array memory.

Also, as mentioned before, I also obtain another performance gain by computing i-k-1+m outside of the n-loop (stored in `mindex`).

And lastly, by having the procedure `Convolve_C` accept the lower and upper index values for the i-loop as arguments, the Threads:-Task model can be applied to parallelize the computation.

 

restart:

kernelopts(version);
kernelopts(numcpus);

          Maple 18.02, X86 64 WINDOWS, Oct 20 2014, Build ID 991181
                                      8

k := 3:

filelocation := cat(kernelopts(datadir),"/ship.jpg"):
zimage := ImageTools:-Read(filelocation):
zwidth := ImageTools:-Width(zimage):
zHeight := ImageTools:-Height(zimage):
#zimage := ImageTools:-Scale(zimage, 1..2*zHeight, 1..2*zwidth):

kernell := 2*k+1:
kerneld := Matrix(kernell, kernell, fill=1/kernell^2,
                  datatype=float[8], order=C_order):

st := time[real]():

new1zumage := ImageTools:-Convolution(zimage, kerneld):

Time[Convolution] := time[real]()-st;

                         Time[Convolution] := 0.112

aa := zimage(1..,1..,1):
bb := zimage(1..,1..,2):
cc := zimage(1..,1..,3):

newaa := copy(aa):
newaa1 := copy(aa):
newaa2 := copy(aa):
newaa3 := copy(aa):

R := LinearAlgebra:-RowDimension(aa);
C := LinearAlgebra:-ColumnDimension(aa);

                                  R := 400
                                  C := 600

st := time[real]():
for i from k+1 to R-k-1 do
     for j from k+1 to C-k-1 do
          newaa[i,j] := add(x, x = aa(i-k..i+k, j-k..j+k) *~ kerneld);
     end do;
end do;
Time[NativeMaple] := time[real]()-st;

                         Time[NativeMaple] := 16.392

Convolve := proc(aa::Array(datatype=float[8],order=C_order),

                 newaa::Array(datatype=float[8],order=C_order),

                 kerneld::Array(datatype=float[8],order=C_order),

                 k::posint,R::posint,C::posint)
  local i, j, m, n, mindex,
        `k+1` := k+1, `C-k-1` := C-`k+1`, `R-k-1` := R-`k+1`,
        kernell := 2*k+1, `i-k-1`, `j-k-1`;
  for i from `k+1` to `R-k-1` do
    `i-k-1` := i-`k+1`;
    for j from `k+1` to `C-k-1` do
      `j-k-1` := j-`k+1`;
      newaa[i,j] := add(add(aa[`i-k-1`+m,`j-k-1`+n]*kerneld[m,n],
                            n=1..kernell),m=1..kernell);
    end do;
  end do;
end proc:

st := time[real]():
evalhf(Convolve(aa, newaa1, kerneld, k, R, C)):
Time[evalhf] := time[real]()-st;
LinearAlgebra:-Norm( Matrix(newaa)-Matrix(newaa1) );

                            Time[evalhf] := 7.612
                                                -14
                          9.69779812010074238 10   

ConvolveC := proc(aa::Array(datatype=float[8],order=C_order),
                  newaa::Array(datatype=float[8],order=C_order),
                  kerneld::Array(datatype=float[8],order=C_order),
                  k::posint,R::posint,C::posint,
                  rlo::posint,rhi::posint)
  option threadsafe;
  local i::integer[4], j::integer[4], m::integer[4], n::integer[4],
        mindex::integer[4], kp1::integer[4], Cmkp1::integer[4],
        kernell::integer[4], imkp1::integer[4], jmkp1::integer[4],
        temp1::float[8];
  kp1 := k+1;
  kernell := k+kp1;
  Cmkp1 := C-kp1;
  for i from rlo to rhi do
    imkp1 := i-kp1;
    for j from kp1 to Cmkp1 do
      jmkp1 := j-kp1;
      temp1 := 0.0;
      for m from 1 to kernell do
        mindex := imkp1+m;
        for n from 1 to kernell do
          temp1 := temp1 + aa[mindex,jmkp1+n] * kerneld[m,n];
        end do;
      end do;
      newaa[i,j] := temp1;
    end do;
  end do;
end proc:

Convolve_C := Compiler:-Compile(ConvolveC):

st := time[real]():
Convolve_C(aa, newaa2, kerneld, k, R, C, k+1, R-(k+1)):
Time[compiled] := time[real]()-st;
LinearAlgebra:-Norm( Matrix(newaa)-Matrix(newaa2) );

                           Time[compiled] := 0.509

                                                -14
                          8.53206394424432801 10   

Convolve_Task := proc(aa, newaa, kerneld, k, R, C, rlo, rhi)
local rmid;
  if 11 < rhi - rlo then
    rmid := floor(1/2*rhi-1/2*rlo) + rlo;
    Threads:-Task:-Continue(':-null',
                            ':-Task'=[Convolve_Task,aa,newaa,kerneld,k,R,C,rlo,rmid],
                            ':-Task'=[Convolve_Task,aa,newaa,kerneld,k,R,C,rmid+1,rhi])
  else
    Convolve_C(aa,newaa,kerneld,k,R,C,rlo,rhi);
  end if;
end proc:

st := time[real]():
Threads:-Task:-Start(Convolve_Task, aa, newaa3, kerneld, k, R, C, k+1, R-(k+1));
Time[Compiled_parallel] := time[real]()-st;
LinearAlgebra:-Norm( Matrix(newaa)-Matrix(newaa3) );

                      Time[Compiled_parallel] := 0.232
                                                -14
                          8.53206394424432801 10   

Times = Vector(sort(op(eval(Time)),(a,b)->rhs(a)>rhs(b)));

                             [  NativeMaple = 16.392   ]
                             [                         ]
                             [     evalhf = 7.612      ]
                             [                         ]
                     Times = [    compiled = 0.509     ]
                             [                         ]
                             [Compiled_parallel = 0.232]
                             [                         ]
                             [   Convolution = 0.112   ]

 

 

Download convolve_compiled3.mw

@Carl Love Yes, that top-level code isn't the same as calling `Convolve` without evalhf. Rather, it is an elementwise (vectorized) use of kernel builtin `*` on float[8] rtables which it knows how to treat well. So, yes, it's already parallelized in another way, and is fast despite the fact that it repeatedly creates `subimage` as a collectible garbage rtable.

Attached is a sheet with a few ideas on speedup. This does not include what I'd expect to be best: Compiled and multithreaded with the Task mode.

The top level code can be optimized at its second stage, by rolling the two nested `add` calls into just one call.

Yes, I believe that using order=C_order on `subimage` and `kerneld` can make it a bit faster -- as long as loops or the nested `add` calls walk along rows at their innermost layer (so as to walk adjacent entries in memory).

More saving can be had by constucting part of the `subimage` indicies, for re-use, at the start of each nested i and j loop. (See `ipart`, `jpart`, and `mindex`.)

And then one can notice that `subimage` is not really needed in such a proc.

restart:

kernelopts(version);

          Maple 18.02, X86 64 WINDOWS, Oct 20 2014, Build ID 991181

k := 3:

filelocation := cat(kernelopts(datadir),"/ship.jpg"):

zimage := ImageTools:-Read(filelocation):

zwidth := ImageTools:-Width(zimage):

zHeight := ImageTools:-Height(zimage):
zimage := ImageTools:-Scale(zimage, 1..2*zHeight, 1..2*zwidth):

kernell := 2*k+1:

kerneld := Matrix(kernell, kernell, fill=1/kernell^2, datatype=float[8], order=C_order):

st:= time():

new1zumage:= ImageTools:-Convolution(zimage,kerneld):

time()-st;

                                    0.203

#ImageTools:-View(new1zumage);

aa:= zimage(1..,1..,1):

bb:= zimage(1..,1..,2):

cc:= zimage(1..,1..,3):

subimage := Matrix(kernell, kernell, datatype=float[8], order=C_order):

newaa := copy(aa):
newaa1 := copy(aa):
newaa2 := copy(aa):
newaa3 := copy(aa):

R := LinearAlgebra:-RowDimension(aa);

C := LinearAlgebra:-ColumnDimension(aa);

                                  R := 800
                                  C := 1200

st := time[real]():

for i from k+1 to R-k-1 do

     for j from k+1 to C-k-1 do

          subimage := aa(i-k..i+k, j-k..j+k) *~ kerneld;
          newaa[i,j] := add(s, s=subimage);

     end do

end do;

T1s := time[real]()-st;

                                T1s := 40.814

st := time[real]():

for i from k+1 to R-k-1 do

     for j from k+1 to C-k-1 do

          subimage := aa(i-k..i+k,j-k..j+k) *~ kerneld;

          newaa[i,j] := add(add(subimage[m,n], m=1..kernell), n=1..kernell)

     end do

end do;

T1:= time[real]()-st;

                                T1 := 54.961

Convolve:= proc(aa, newaa, kerneld, subimage, k, R, C)

local i, j, m, n, `k+1` := k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1;     

     for i from `k+1` to R-k-1 do

          for j from `k+1` to `C-k-1` do

               for m to kernell do

                    for n to kernell do

                         subimage[m,n] := aa[i-k+m-1, j-k+n-1] * kerneld[m,n]

                    end do

               end do;

               newaa[i,j] := add(add(subimage[m,n], m=1..kernell), n=1..kernell)

          end do

     end do

end proc:

st := time[real]():

evalhf(Convolve(aa, newaa1, kerneld, subimage, k, R, C)):

T2 := time[real]()-st;

                                T2 := 28.373

Convolve2:= proc(aa, newaa, kerneld, subimage, k, R, C)

local i, j, m, n, `k+1`:= k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1,
      ipart, jpart, mindex;     

     for i from `k+1` to R-k-1 do
          ipart := i-k-1;

          for j from `k+1` to `C-k-1` do
               jpart := j-k-1;

               for m to kernell do
                    mindex := ipart+m;

                    for n to kernell do

                         subimage[m,n] := aa[mindex, jpart+n] * kerneld[m,n]

                    end do

               end do;

               newaa[i,j] := add(add(subimage[m,n], n=1..kernell), m=1..kernell)

          end do

     end do

end proc:

st := time[real]():

evalhf(Convolve2(aa, newaa2, kerneld, subimage, k, R, C)):

T3 := time[real]()-st;

                                T3 := 24.766

Convolve3:= proc(aa, newaa, kerneld, k, R, C)

local i, j, m, n, `k+1`:= k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1,
      ipart, jpart;     

      for i from `k+1` to R-k-1 do
          ipart := i-k-1;

          for j from `k+1` to `C-k-1` do
               jpart := j-k-1;

               newaa[i,j] := add(add(aa[ipart+m, jpart+n] * kerneld[m,n],
                                     n=1..kernell), m=1..kernell)

          end do

     end do

end proc:

st := time[real]():

evalhf(Convolve3(aa, newaa3, kerneld, k, R, C)):

T4 := time[real]()-st;

                                T4 := 19.051

LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa1));
LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa2));
LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa3));

                                     0.
                                                -14
                          6.30606677987088915 10   
                                                -14
                          6.30606677987088915 10   

 


Download convolve_evalhf.mw

@rollermonkey Your claims about my suggestions appear to be entirely false, and appear to depend on your continued mistyping of key syntax, or misspelling (you now appear to have done with(Plots) not with(plots) as I suggested), or perhaps in part due to a faulty installation. Carl's Answer covers it in detail, I see.

@Carl Love Did you intend,

  subimage[m,n]:= aa[i-k+m-1, j-k+n-1] * kerneld[m,n]

instead of that,

  subimage[m,n]:= aa[i-k+m, j-k+m] * kerneld[m,n]

within that innermost loop of `Convolve`?

You have several basic syntax mistakes which are not version specific.

To assign a plot to name F you need to use the syntax,

F := ...

and not,

F = ...

The plots package is loaded by issuing the command,

with(plots)

and not by (what you have),

withPlots

If you want to call display without loading the package then it would be done like either,

plots[display]( ... )

or,

plots:-display( ... )

and not (as you have it) as,

plots[display][ ... ]

acer

@Carl Love If one of the goals is to avoid potentially quadratic memory use (ie. O(k^2)) then it might be a good idea to explicitly unassign the names which get the indexed assignments, before the loop begins.

That would add safeguard against the user re-running the code without doing a restart, and incurring such a cost.

It could also help the user avoid a runtime error which trying to assign to a "long list", if re-run with restart, in the case that k>100.

It might also be more clear to use different names: say tableA for the indexed assignments and listA for the final conversion.

@kegj You have ignored my query: where are the statement terminators between the statements in the for-do loop?

More specifically, why is there no statement terminator at the end of this statement?

Y[j]:=Y[j-1]+dY

 

@maple2015 Did you try print(NCP) after first doing your plotsetup call? Try it.

@nm Your screenshots each show a single execution group containing the first three commands.

@nm I was talking about `add` in particular, because you mentioned it specifically. I have a suspicion that, apart from a general note, the only things that usually get in such Compatibility sections are mention of new or changed options.

FWIW, there are details on changes to `cat` and `rand` on the ?updates,Maple2015,Language help page.

Of course, this too does not cover your Question in full. I don't know of any uniform way to find details on all changes.

 

@nm Make sure that the three first commands are in separate execution groups or document blocks.

There is a long-standing problem in that the Std GUI implementation of the interface command may not take effect if it follows a restart in the same execution group.

You'll know it has worked because the printout will be in plaintext mode, not typeset (or italic).

@Preben Alsholm My ~/.mapleinit initialization file contains code to read a .mapleinit file from currentdir(), if found. Sometimes I use this mechanism to customize libname, etc, according to directory.

Basically, I often have multiple projects on the go, and instead of having to change working directory and pass the -b option to the TTY interface I can just set up the details once in a local directory-specific initialization file.

...and I have such local initialization files print something to the terminal, to remind me whether or not they are active in the current session. I don't want that output pasted in my responses here.

Sure, I could just use the -s option to prevent the initialization file from running. But I also use the TTY interface history feature quite a bit, and I usually prefer not to quit the session since I can lose the connection between that project and that command history.

There are quite a few words that I habitually misspell. Fortunately, restart doesn't seem to be one of them.

@Markiyan Hirnyk I think that it's more automatic to not have to supply v>=3, which is why I wrote that Kitonum's answer is better. I realize that English may not be not your first language, but please try and pay closer attention.

@Markiyan Hirnyk Your suggestion is not better, since it is not more automatic. It still depends on the user utilizing special knowledge of the structure; in this case the position of the value.

Kitonum's answer is more automatic, and it is better.

First 345 346 347 348 349 350 351 Last Page 347 of 599