acer

2015...

Answered: acer 32722

March 22 2015

0 0

In Maple 2015.0 the Task multithreaded compiled code I gave immediately above runs about twice as fast as in Maple 18.02 on my 64bit Windows 7, for that 600x800 example.

add, Compile, and Task parallelization...

Answered: acer 32722

March 21 2015

0 0

@Carl Love You should check your results for correctness, when using `Compile` on a procedure containing calls to `add` over float values. It's a "known bug" that it often get confused about the types in play, in such a mix. As you originally have it the code intended to be compiled will not produce correct results, I suspect. The workaround of replacing `add` calls with explicit loops may slow down under evalhf. So one may need distinct versions for running compiled versus running under evalhf.

The following uses explicit loops instead of `add`, for the Compile version. The temporary accumulator for the sum is a scalar variable, since using `newaa[i,j]` instead as accumulator can (MS-Windows, LLVM compiler) incur a big performance hit due to unnecessary referencing into array memory.

Also, as mentioned before, I also obtain another performance gain by computing i-k-1+m outside of the n-loop (stored in `mindex`).

And lastly, by having the procedure `Convolve_C` accept the lower and upper index values for the i-loop as arguments, the Threads:-Task model can be applied to parallelize the computation.

>

restart:

>	kernelopts(version); kernelopts(numcpus);

Maple 18.02, X86 64 WINDOWS, Oct 20 2014, Build ID 991181
8

>

k := 3:

>	filelocation := cat(kernelopts(datadir),"/ship.jpg"): zimage := ImageTools:-Read(filelocation): zwidth := ImageTools:-Width(zimage): zHeight := ImageTools:-Height(zimage): #zimage := ImageTools:-Scale(zimage, 1..2zHeight, 1..2zwidth):

>	kernell := 2*k+1: kerneld := Matrix(kernell, kernell, fill=1/kernell^2, datatype=float[8], order=C_order):

>	st := time[real]():

>	new1zumage := ImageTools:-Convolution(zimage, kerneld):

>	Time[Convolution] := time[real]()-st;

Time[Convolution] := 0.112

>	aa := zimage(1..,1..,1): bb := zimage(1..,1..,2): cc := zimage(1..,1..,3):

>	newaa := copy(aa): newaa1 := copy(aa): newaa2 := copy(aa): newaa3 := copy(aa):

>	R := LinearAlgebra:-RowDimension(aa); C := LinearAlgebra:-ColumnDimension(aa);

R := 400
C := 600

>	st := time[real](): for i from k+1 to R-k-1 do for j from k+1 to C-k-1 do newaa[i,j] := add(x, x = aa(i-k..i+k, j-k..j+k) *~ kerneld); end do; end do; Time[NativeMaple] := time[real]()-st;

Time[NativeMaple] := 16.392

>	Convolve := proc(aa::Array(datatype=float[8],order=C_order),

>	newaa::Array(datatype=float[8],order=C_order),

>	kerneld::Array(datatype=float[8],order=C_order),

>

                 k::posint,R::posint,C::posint)
  local i, j, m, n, mindex,
        `k+1` := k+1, `C-k-1` := C-`k+1`, `R-k-1` := R-`k+1`,
        kernell := 2*k+1, `i-k-1`, `j-k-1`;
  for i from `k+1` to `R-k-1` do
    `i-k-1` := i-`k+1`;
    for j from `k+1` to `C-k-1` do
      `j-k-1` := j-`k+1`;
      newaa[i,j] := add(add(aa[`i-k-1`+m,`j-k-1`+n]*kerneld[m,n],
                            n=1..kernell),m=1..kernell);
    end do;
  end do;
end proc:

>	st := time[real](): evalhf(Convolve(aa, newaa1, kerneld, k, R, C)): Time[evalhf] := time[real]()-st; LinearAlgebra:-Norm( Matrix(newaa)-Matrix(newaa1) );

                            Time[evalhf] := 7.612
                                                -14
                          9.69779812010074238 10

>

ConvolveC := proc(aa::Array(datatype=float[8],order=C_order),
                  newaa::Array(datatype=float[8],order=C_order),
                  kerneld::Array(datatype=float[8],order=C_order),
                  k::posint,R::posint,C::posint,
                  rlo::posint,rhi::posint)
  option threadsafe;
  local i::integer[4], j::integer[4], m::integer[4], n::integer[4],
        mindex::integer[4], kp1::integer[4], Cmkp1::integer[4],
        kernell::integer[4], imkp1::integer[4], jmkp1::integer[4],
        temp1::float[8];
  kp1 := k+1;
  kernell := k+kp1;
  Cmkp1 := C-kp1;
  for i from rlo to rhi do
    imkp1 := i-kp1;
    for j from kp1 to Cmkp1 do
      jmkp1 := j-kp1;
      temp1 := 0.0;
      for m from 1 to kernell do
        mindex := imkp1+m;
        for n from 1 to kernell do
          temp1 := temp1 + aa[mindex,jmkp1+n] * kerneld[m,n];
        end do;
      end do;
      newaa[i,j] := temp1;
    end do;
  end do;
end proc:

>	Convolve_C := Compiler:-Compile(ConvolveC):

>	st := time[real](): Convolve_C(aa, newaa2, kerneld, k, R, C, k+1, R-(k+1)): Time[compiled] := time[real]()-st; LinearAlgebra:-Norm( Matrix(newaa)-Matrix(newaa2) );

Time[compiled] := 0.509

-14
8.53206394424432801 10

>

Convolve_Task := proc(aa, newaa, kerneld, k, R, C, rlo, rhi)
local rmid;
  if 11 < rhi - rlo then
    rmid := floor(1/2*rhi-1/2*rlo) + rlo;
    Threads:-Task:-Continue(':-null',
                            ':-Task'=[Convolve_Task,aa,newaa,kerneld,k,R,C,rlo,rmid],
                            ':-Task'=[Convolve_Task,aa,newaa,kerneld,k,R,C,rmid+1,rhi])
  else
    Convolve_C(aa,newaa,kerneld,k,R,C,rlo,rhi);
  end if;
end proc:

>	st := time[real](): Threads:-Task:-Start(Convolve_Task, aa, newaa3, kerneld, k, R, C, k+1, R-(k+1)); Time[Compiled_parallel] := time[real]()-st; LinearAlgebra:-Norm( Matrix(newaa)-Matrix(newaa3) );

                      Time[Compiled_parallel] := 0.232
                                                -14
                          8.53206394424432801 10

>	Times = Vector(sort(op(eval(Time)),(a,b)->rhs(a)>rhs(b)));

                             [ NativeMaple = 16.392   ]
                             [                         ]
                             [     evalhf = 7.612      ]
                             [                         ]
                     Times = [    compiled = 0.509     ]
                             [                         ]
                             [Compiled_parallel = 0.232]
                             [                         ]
                             [   Convolution = 0.112   ]

>

Download convolve_compiled3.mw

yes...

Answered: acer 32722

March 20 2015

0 0

@Carl Love Yes, that top-level code isn't the same as calling `Convolve` without evalhf. Rather, it is an elementwise (vectorized) use of kernel builtin `*` on float[8] rtables which it knows how to treat well. So, yes, it's already parallelized in another way, and is fast despite the fact that it repeatedly creates `subimage` as a collectible garbage rtable.

Attached is a sheet with a few ideas on speedup. This does not include what I'd expect to be best: Compiled and multithreaded with the Task mode.

The top level code can be optimized at its second stage, by rolling the two nested `add` calls into just one call.

Yes, I believe that using order=C_order on `subimage` and `kerneld` can make it a bit faster -- as long as loops or the nested `add` calls walk along rows at their innermost layer (so as to walk adjacent entries in memory).

More saving can be had by constucting part of the `subimage` indicies, for re-use, at the start of each nested i and j loop. (See `ipart`, `jpart`, and `mindex`.)

And then one can notice that `subimage` is not really needed in such a proc.

>

restart:

>	kernelopts(version);

Maple 18.02, X86 64 WINDOWS, Oct 20 2014, Build ID 991181

>

k := 3:

>	filelocation := cat(kernelopts(datadir),"/ship.jpg"):

>	zimage := ImageTools:-Read(filelocation):

>	zwidth := ImageTools:-Width(zimage):

>	zHeight := ImageTools:-Height(zimage): zimage := ImageTools:-Scale(zimage, 1..2zHeight, 1..2zwidth):

>	kernell := 2*k+1:

>	kerneld := Matrix(kernell, kernell, fill=1/kernell^2, datatype=float[8], order=C_order):

>	st:= time():

>	new1zumage:= ImageTools:-Convolution(zimage,kerneld):

>	time()-st;

0.203

>	#ImageTools:-View(new1zumage);

>	aa:= zimage(1..,1..,1):

>	bb:= zimage(1..,1..,2):

>	cc:= zimage(1..,1..,3):

>	subimage := Matrix(kernell, kernell, datatype=float[8], order=C_order):

>	newaa := copy(aa): newaa1 := copy(aa): newaa2 := copy(aa): newaa3 := copy(aa):

>	R := LinearAlgebra:-RowDimension(aa);

>	C := LinearAlgebra:-ColumnDimension(aa);

R := 800
C := 1200

>	st := time[real]():

>	for i from k+1 to R-k-1 do

>	for j from k+1 to C-k-1 do

>	subimage := aa(i-k..i+k, j-k..j+k) *~ kerneld; newaa[i,j] := add(s, s=subimage);

>

end do

>

end do;

>	T1s := time[real]()-st;

T1s := 40.814

>	st := time[real]():

>	for i from k+1 to R-k-1 do

>	for j from k+1 to C-k-1 do

>	subimage := aa(i-k..i+k,j-k..j+k) *~ kerneld;

>	newaa[i,j] := add(add(subimage[m,n], m=1..kernell), n=1..kernell)

>

end do

>

end do;

>	T1:= time[real]()-st;

T1 := 54.961

>	Convolve:= proc(aa, newaa, kerneld, subimage, k, R, C)

>	local i, j, m, n, `k+1` := k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1;

>	for i from `k+1` to R-k-1 do

>	for j from `k+1` to `C-k-1` do

>	for m to kernell do

>	for n to kernell do

>	subimage[m,n] := aa[i-k+m-1, j-k+n-1] * kerneld[m,n]

>

end do

>

end do;

>	newaa[i,j] := add(add(subimage[m,n], m=1..kernell), n=1..kernell)

>

end do

>

end do

>

end proc:

>	st := time[real]():

>	evalhf(Convolve(aa, newaa1, kerneld, subimage, k, R, C)):

>	T2 := time[real]()-st;

T2 := 28.373

>	Convolve2:= proc(aa, newaa, kerneld, subimage, k, R, C)

>	local i, j, m, n, `k+1`:= k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1, ipart, jpart, mindex;

>	for i from `k+1` to R-k-1 do ipart := i-k-1;

>	for j from `k+1` to `C-k-1` do jpart := j-k-1;

>	for m to kernell do mindex := ipart+m;

>	for n to kernell do

>	subimage[m,n] := aa[mindex, jpart+n] * kerneld[m,n]

>

end do

>

end do;

>	newaa[i,j] := add(add(subimage[m,n], n=1..kernell), m=1..kernell)

>

end do

>

end do

>

end proc:

>	st := time[real]():

>	evalhf(Convolve2(aa, newaa2, kerneld, subimage, k, R, C)):

>	T3 := time[real]()-st;

T3 := 24.766

>	Convolve3:= proc(aa, newaa, kerneld, k, R, C)

>	local i, j, m, n, `k+1`:= k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1, ipart, jpart;

>	for i from `k+1` to R-k-1 do ipart := i-k-1;

>	for j from `k+1` to `C-k-1` do jpart := j-k-1;

>	newaa[i,j] := add(add(aa[ipart+m, jpart+n] * kerneld[m,n], n=1..kernell), m=1..kernell)

>

end do

>

end do

>

end proc:

>	st := time[real]():

>	evalhf(Convolve3(aa, newaa3, kerneld, k, R, C)):

>	T4 := time[real]()-st;

T4 := 19.051

>	LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa1)); LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa2)); LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa3));

                                     0.
                                                -14
                          6.30606677987088915 10
                                                -14
                          6.30606677987088915 10

>

Download convolve_evalhf.mw

false...

Answered: acer 32722

March 20 2015

0 0

@rollermonkey Your claims about my suggestions appear to be entirely false, and appear to depend on your continued mistyping of key syntax, or misspelling (you now appear to have done with(Plots) not with(plots) as I suggested), or perhaps in part due to a faulty installation. Carl's Answer covers it in detail, I see.

subimage[m,n]...

Answered: acer 32722

March 20 2015

0 0

@Carl Love Did you intend,

  subimage[m,n]:= aa[i-k+m-1, j-k+n-1] * kerneld[m,n]

instead of that,

  subimage[m,n]:= aa[i-k+m, j-k+m] * kerneld[m,n]

within that innermost loop of `Convolve`?

various mistakes...

Answered: acer 32722

March 20 2015

0 0

You have several basic syntax mistakes which are not version specific.

To assign a plot to name F you need to use the syntax,

F := ...

and not,

F = ...

The plots package is loaded by issuing the command,

with(plots)

and not by (what you have),

withPlots

If you want to call display without loading the package then it would be done like either,

plots[display]( ... )

or,

plots:-display( ... )

and not (as you have it) as,

plots[display][ ... ]

acer

re-run...

Answered: acer 32722

March 15 2015

0 0

@Carl Love If one of the goals is to avoid potentially quadratic memory use (ie. O(k^2)) then it might be a good idea to explicitly unassign the names which get the indexed assignments, before the loop begins.

That would add safeguard against the user re-running the code without doing a restart, and incurring such a cost.

It could also help the user avoid a runtime error which trying to assign to a "long list", if re-run with restart, in the case that k>100.

It might also be more clear to use different names: say tableA for the indexed assignments and listA for the final conversion.

why ignore?...

Answered: acer 32722

March 15 2015

0 0

@kegj You have ignored my query: where are the statement terminators between the statements in the for-do loop?

More specifically, why is there no statement terminator at the end of this statement?

Y[j]:=Y[j-1]+dY

try it...

Answered: acer 32722

March 14 2015

0 0

@maple2015 Did you try print(NCP) after first doing your plotsetup call? Try it.

execution group...

Answered: acer 32722

March 12 2015

0 0

@nm Your screenshots each show a single execution group containing the first three commands.

add...

Answered: acer 32722

March 12 2015

0 0

@nm I was talking about `add` in particular, because you mentioned it specifically. I have a suspicion that, apart from a general note, the only things that usually get in such Compatibility sections are mention of new or changed options.

FWIW, there are details on changes to `cat` and `rand` on the ?updates,Maple2015,Language help page.

Of course, this too does not cover your Question in full. I don't know of any uniform way to find details on all changes.

separate execution groups...

Answered: acer 32722

March 11 2015

0 0

@nm Make sure that the three first commands are in separate execution groups or document blocks.

There is a long-standing problem in that the Std GUI implementation of the interface command may not take effect if it follows a restart in the same execution group.

You'll know it has worked because the printout will be in plaintext mode, not typeset (or italic).

.mapleinit...

Answered: acer 32722

March 11 2015

0 0

@Preben Alsholm My ~/.mapleinit initialization file contains code to read a .mapleinit file from currentdir(), if found. Sometimes I use this mechanism to customize libname, etc, according to directory.

Basically, I often have multiple projects on the go, and instead of having to change working directory and pass the -b option to the TTY interface I can just set up the details once in a local directory-specific initialization file.

...and I have such local initialization files print something to the terminal, to remind me whether or not they are active in the current session. I don't want that output pasted in my responses here.

Sure, I could just use the -s option to prevent the initialization file from running. But I also use the TTY interface history feature quite a bit, and I usually prefer not to quit the session since I can lose the connection between that project and that command history.

There are quite a few words that I habitually misspell. Fortunately, restart doesn't seem to be one of them.

read...

Answered: acer 32722

March 11 2015

0 0

@Markiyan Hirnyk I think that it's more automatic to not have to supply v>=3, which is why I wrote that Kitonum's answer is better. I realize that English may not be not your first language, but please try and pay closer attention.

not really...

Answered: acer 32722

March 11 2015

0 0

@Markiyan Hirnyk Your suggestion is not better, since it is not more automatic. It still depends on the user utilizing special knowledge of the structure; in this case the position of the value.

Kitonum's answer is more automatic, and it is better.

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

Ask a Question

Create a Post

32722 Reputation

29 Badges

Social Networks and Content at Maplesoft.com

MaplePrimes Activity

These are replies submitted by acer

2015...

add, Compile, and Task parallelization...

yes...

false...

subimage[m,n]...

various mistakes...

re-run...

why ignore?...

try it...

execution group...

add...

separate execution groups...

.mapleinit...

read...

not really...

Save this setting as your default sorting preference?

Ask a Question

Create a Post

Generating PDF…

Save this setting as your default sorting preference?
Note: You can change your preference any time in your account settings
Don't show this again