acer

32363 Reputation

29 Badges

19 years, 332 days
Ontario, Canada

Social Networks and Content at Maplesoft.com

MaplePrimes Activity


These are replies submitted by acer

@kegj If you are using 2D Math input mode (the default for new users) then print (A) with a space before the bracket will get interpreted as the implicit multiplication print*A rather than a call to the print command.

@Preben Alsholm I have submitted a bug report.

What you've shown in your followup Matlab figure is sometimes referred to as a colorbar.

I have some code for Maple 2015 which automates construction and placement of such a thing, from a given 3D plot with HUE/HSV coloring.

But I wish to make some adjustments to make it more robust (to handle both GRID and MESH structures for 3D plots) and customizable (orientation, position, tickmarks), and am also still fiddling with the automated sizing. I hope to be able to branch off a post for it towards the middle of next week.

acer

 

With some triple bypass surgery the procedure InlinedNewton below gets the bodies of the procedures prccons, prcconsJ, and s3 inlined into it. And it is then singly Compiled.

 

The singly Compiled InlinedNewtonc runs somewhat faster than did the previous Compiled `Newtonc` (which called compiled versions of those three other procedures) on a single iteration loop from a single initial point, with tolerance target 2e-6 and problem size N=50. And `Newtonc` was faster than uncompiled `Newton` even when that too called the three compiled worker procedures.

 

It's appropriate now to find examples which need multiple initial points in order to converge.

 

There are also the "extra parameters" yet to introduce, which must be done somewhat carefully as some of the ToInert/FromInert inlining is hand-edited.


restart:

Digits := 15:

N := 50;

50

f := array([seq(evalf(-2*x[i]+x[i]^2+0.99/i^2),i=1..N)]):

CodeTools:-Usage(
  fsolve({seq(f[i],i=1..N)},{seq(x[i]=0.5,i=1..N)})  ):
type(eval(%,1),set(name=numeric));

memory used=62.57MiB, alloc change=96.00MiB, cpu time=1.98s, real time=1.96s, gc time=62.40ms

true

# s3 is not recreated for each new problem, so does not contribute
# to problem specific timing. Even compilation to s3c would be done
# at most once, upon package load say.
#
s3 := proc(n::posint,
           A::Array( datatype = float[8] ),
           ip::Array( datatype = integer[4] ),
           b::Array( datatype = float[8] ) )
   local i::integer, j::integer, k::integer, m::integer, t::float;
      ip[n] := 1;
      for k to n-1 do
        m := k;
        for i from k+1 to n do
          if abs(A[m,k]) < abs(A[i,k]) then
            m := i
           end if
         end do;
        ip[k] := m;
        if m <> k then
          ip[n] := -ip[n]
         end if;
       t := A[m,k];
       A[m,k] := A[k,k];
       A[k,k] := t;
       if t = 0 then
         ip[n] := 0;
         #return ip
         end if;
       for i from k+1 to n do
         A[i,k] := -A[i,k]/t
         end do;
       for j from k+1 to n do
         t := A[m,j];
         A[m,j] := A[k,j];
         A[k,j] := t;
         if t <> 0 then
           for i from k+1 to n do
             A[i,j] := A[i,j]+A[i,k]*t
             end do
           end if
         end do
       end do;
     if A[n,n] = 0 then
       ip[n] := 0
       end if;
     #ip[n]
      if ip[n] = 0 then
        error "The matrix A is numerically singular"
       end if;
      if 1 < n then
        for k to n-1 do
          m := ip[k];
          t := b[m];
          b[m] := b[k];
          b[k] := t;
          for i from k+1 to n do
           b[i] := b[i]+A[i,k]*t
           end do
         end do;
       for k from n by -1 to 2 do
         b[k] := b[k]/A[k,k];
         t := -b[k];
        for i to k-1 do
           b[i] := b[i]+A[i,k]*t
           end do
         end do
       end if;
     b[1] := b[1]/A[1,1];
     #return 0.0; # Better handling of explicit returns needed when inlining
end proc:

# Newtondummy is not recreated for each new problem. But its
# instantiation is done for each problem, and that is accounted
# for in the base timing below.
#
Newtondummy := proc(x::Array(datatype=float[8]),
               f0::Array(datatype=float[8]),
               j0::Array(datatype=float[8]),
               N::integer,
               ip::Array(datatype=integer[4]),
               maxiter::integer[4],
               tol::float[8])
   local ferr::float[8], temp::float[8],
         i::integer, k::integer, iter::integer;
      ferr := 10.0*tol;
      iter := 0;
      while ferr > tol and iter < maxiter do
        _DUMMY1; # f3(x, f0);
        _DUMMY2; # J3(x, j0);
        j0[1,1] := j0[1,1]; # else j0 not recognized as 2-dim?
        _DUMMY3; # s3c(N, j0, ip, f0);
        for i from 1 to N do
           x[i] := x[i] - f0[i];
        end do;
        ferr := abs(f0[1]);
        for i from 2 to N do
           temp := abs(f0[i]);
           if temp > ferr then
              ferr := temp;
           end if;
        end do;
        ferr := ferr/N;
        iter := iter + 1;
     end do:
     return ferr;
end proc:

# The following are all the problem specific assembly and compilation
# tasks, timed together. (They could be part of an appliable module,
# eventually).
#
(st,str) := time(),time[real]():

eqsA := [seq(F[i]=f[i],i=1..N)]:
irform2 := StatSeq(seq(Assign(op(i)),i=eqsA)):
prccons:= codegen[intrep2maple](Proc(Parameters(x::Array,F::Array),irform2)):
eqsJ := remove(type,[seq(seq(Jac[i,j]=evalf(diff(f[i],x[j])),j=1..N),i=1..N)],name=0.0):
irformJ := StatSeq(seq(Assign(op(i)),i=eqsJ)):
prcconsJ:= codegen[intrep2maple](Proc(Parameters(x::Array,Jac::Array),irformJ)):

(basetime,basetimer) := time()-st,time[real]()-str;

0.16e-1, 0.18e-1

# I need to document what I've done here.
# (I ought to consider writing a more general purpose "function-call Inliner".)
#
(st,str) := time(),time[real]():

UNewtondummy := ToInert(eval(Newtondummy)):
UNdlocals := [op([2,..],UNewtondummy)]:
Uprccons := ToInert(eval(prccons)):
UprcconsJ := ToInert(eval(prcconsJ)):
Us3 := ToInert(eval(s3)):
Us3locals := [op([2,..],Us3)]:
newlocals := [seq(`if`(t::'specfunc(anything,_Inert_DCOLON)',
                  _Inert_DCOLON(_Inert_NAME(cat(op([1,1],t),"_n1")),op([2..],t)),NULL),
                  t=Us3locals)]:
Ls3 := [seq(_Inert_LOCAL(i)=_Inert_LOCAL(i+nops(op(2,UNewtondummy))),i=1..nops(Us3locals))]:
T1 := subs(_Inert_NAME("_DUMMY1")=op([5,..],Uprccons), # param sequence 1st 2nd entries match
           _Inert_NAME("_DUMMY2")=op(subs([_Inert_PARAM(2)=_Inert_PARAM(3)],op([5],UprcconsJ))),
           _Inert_NAME("_DUMMY3")=op(subs([_Inert_PARAM(1)=_Inert_PARAM(4),
                                           _Inert_PARAM(2)=_Inert_PARAM(3),
                                           _Inert_PARAM(3)=_Inert_PARAM(5),
                                           _Inert_PARAM(4)=_Inert_PARAM(2),
                                           op(Ls3)],op([5],Us3))),
           subsop(2=_Inert_LOCALSEQ(op([2,..],UNewtondummy),op(newlocals)),UNewtondummy)):

# This is like `Newton`, but with the code from prccons,prcconsJ,s3 inlined.
InlinedNewton := FromInert(T1):

InlinedNewtonc := Compiler:-Compile(InlinedNewton):

(fusedcompiletime,fusedcompiletimer) := time()-st,time[real]()-str;

1.170, 1.159

# And now, essentially what was done previously. Ie, all of J3, f3, and s3c are
# Compiled. And so is `Newton`, except now it is accessing those three lexically
# while before they were explicitly declared global in `Newton`.
#
(st,str) := time(),time[real]():

f3 := Compiler:-Compile(prccons):
J3 := Compiler:-Compile(prcconsJ):
s3c := Compiler:-Compile(s3):
Newton := subs([_DUMMY1='f3(x, f0)',
                _DUMMY2='J3(x, j0)',
                _DUMMY3='s3c(N, j0, ip, f0)'],eval(Newtondummy)):
Newtonc := Compiler:-Compile(Newton):

(separatedcompiletime,separatedcompiletimer) := time()-st,time[real]()-str;

Warning, the function names {J3, f3, s3c} are not recognized in the target language

1.201, 1.157

# These need only be created for the first problem, and could be
# efficiently grown (but not replaced) for new problems with larger size N.
#
x0 := Array(1..N,datatype=float[8]):
f0 := Array(1..N,datatype=float[8]):
j0 := Array(1..N,1..N,datatype=float[8]):
ip := Array(1..N,datatype=integer[4]):

repeats := 1000:

ArrayTools:-Fill(0.0,f0):
ArrayTools:-Fill(1,ip):
(t11,t11r) := time(),time[real]():
for ii from 1 to repeats do
  ArrayTools:-Fill(0.5,x0):
  ArrayTools:-Fill(0.0,j0):
  InlinedNewtonc(x0,f0,j0,N,ip,100,HFloat(2e-6)); # singly compiled
end do:
(tottime,totrtime) := time()-t11,time[real]()-t11r:
evalf[4](tottime/repeats)*'seconds',evalf[4](totrtime/repeats)*'seconds[real]';

ArrayTools:-Fill(0.5,x0):
ArrayTools:-Fill(0.0,j0):
InlinedNewtonc(x0,f0,j0,N,ip,100,HFloat(2e-6)); # singly compiled

0.1092e-2*seconds, 0.1096e-2*seconds[real]

0.927128008641316376e-8

ArrayTools:-Fill(0.0,f0):
ArrayTools:-Fill(1,ip):
(t11,t11r) := time(),time[real]():
for ii from 1 to repeats do
  ArrayTools:-Fill(0.5,x0):
  ArrayTools:-Fill(0.0,j0):
  Newtonc(x0,f0,j0,N,ip,100,HFloat(2e-6)); # separetly compiled
end do:
(tottime,totrtime) := time()-t11,time[real]()-t11r:
evalf[4](tottime/repeats)*'seconds',evalf[4](totrtime/repeats)*'seconds[real]';

ArrayTools:-Fill(0.5,x0):
ArrayTools:-Fill(0.0,j0):
Newtonc(x0,f0,j0,N,ip,100,HFloat(2e-6)); # separately compiled

0.1529e-2*seconds, 0.1538e-2*seconds[real]

0.927128008641316376e-8

 


Download newtoncompiled5.mw

 

@Kitonum Thanks. It doesn't surprise me that I'd miss the simpler rotation!

That's very nice.

I thought it might be simplified a little: allowing negative H1 and H2, similar construction for pieces A and C, using one less piece, and slightly simpler bounds.

(Sorry if I make a mistake with coding it...)

 

restart;

Pic := proc (R::positive, H1::numeric, H2::numeric)

 

local A, B, C, E, F;

 

if R^2 <= H1^2+H2^2 then error "Should be H1^(2)+H2^(2)<R^(2)" end if;

 

A:=plot3d([R*sin(theta)*cos(phi), R*sin(theta)*sin(phi), R*cos(theta)],
           phi = -Pi .. Pi, theta = arccos(H2/R) .. Pi, color = green);
A:=plots:-display(plottools:-rotate(plottools:-transform((x,y,z)->[y,z,x])(A),0,0,Pi/2));

C := plot3d([R*sin(theta)*cos(phi), R*sin(theta)*sin(phi), R*cos(theta)],
             phi = -Pi .. Pi, theta = arccos(H1/R) .. Pi, color = green);

E := plot3d([r*cos(phi), r*sin(phi), H1], phi = -Pi .. Pi,
             r = 0 .. sqrt(R^2-H1^2), color = blue);

 

F := plot3d([H2, r*cos(phi), r*sin(phi)], phi = -Pi .. Pi,
             r = 0 .. sqrt(R^2-H2^2), color = gold);

 

plots[display](A,C,E,F, view = [-1.5 .. 1.5, -1.5 .. 1.5, -1.5 .. 1.5],
               axes = none, scaling = constrained, style=surface,
               lightmodel = light4, orientation = [60, 80, 0]);

 

end proc:

 

Pic(1,  0.50,  0.3);

Pic(1,  -0.50,  -0.3);

 

 

Download cutoutmodif.mw

 

And, a side view with negative H1 and H2,

plots:-display(Pic(1,  -0.70,  -0.30),view=[(-1..1)$3],axes=box,orientation=[180,90,90]);

acer

Yes, I expect that a single compiled procedure will be faster. One of my goals is to accomplish this robustly.

I'm just getting started with this topic, and will post more as I get along with it.

The attached worksheet has some revisions.

I noticed that the time to Compile prcconsJ (the Jacobian) was taking a long time for larger N (16sec for N=50 and 4min for N=100). So I wrapped the computation sequence eqsJ in a call to `remove` to get rid of the statements where the partials were identically 0.0. This improved the compilation time considerably for such a sparse example.

I regrouped the work a bit. Now the pieces which are problem dependent are timed together. These will be done once per new problem, and so it's helpful to gauge their cost. Other procs (Newton, s3, and their compiled versions) would be created as part of a module package and are also grouped together though they would not incur cost for each new problem. I intend to write a front-end to all this, with reusable/growable workspace Arrays as additional module members.

I have not yet introduced "extra parameters", but that is a central goal and likely a next step for me.

The cost for iterating from a single initial point (Vector) to a solution is very low. For 8 iterations the supplied sample problem converges to forward-error ferr<tol=1e-15 in about 8 iterations. I chose such a strict tolerance because that is roughly what fsolve is striving for when Digits=15. But I made the tolerance `tol` a parameter of procedure `Newton`, so it is customizable. I did the same for the maximum iteration limit.

If the cost of the Compiler for dense problems with larger size N becomes an issue then I might also look for alternatives in such cases such as `evalhf` or even another compiler which I suspect lurks in the dungeons of the dsolve/numeric palace.

There is lots to do, but I am on a root-finding kick and am interested in what is possible here. This is probably a suitable time to make some general remarks. This is a purely real scheme, and unlike fsolve cannot handle complex results. I understand that the dsolve/numeric motivation means that a dedicated real solver is desirable. If all this pans out then having a dedicated purely real as well as a complex solver of this type may be generally useful. By this I mean solvers dedicated to using hardware double precision. For making a general purpose solver there is also the matter of additional overhead to manage what should happen when the f[i] are not Compilable, or when higher working precision is needed.

I revised `Newton` to get rid of unnecessary loops (eg. x[i] can be updated in the "usual" way by subtracting f0[i], so no need to negate the j0[i,k] terms individually). The relative cost of running `Newton` vs `Newtonc` is not so significant, which is perhaps understandable since all of J3, f3, and s3c are Compiled. What is key is that they run fast, for a single iteration run from a given initial point. So there is leeway, for the coming case where multiple initial points might be required in order to get convergence to a root.

Some care wil be needed to allow multiple initial points to be tried, where they are generated efficiently; sample randomly from a box of bounds on the x[i], say. Statistics:-Sample allows for Vector container re-use, which might help. It's important to watch the performance costs. I'll try. It might even be possible to allow even more specialization for the case that multiple initial points can be attempted in parallel. (This is where my own interest lies.)

Also, it'll be necessary to allow for multiple sets of "extra parameter" values be supplied efficienctly. This should include the special case of also allowing the solution x[i] for the previous set of parameter values to be used as one special initial point for the next set of extra parameter values. It may turn out that the paradigm of dsolve/numeric, where a problem specific "solution procedure/module" is emitted by the user-facing entry-point, is useful here too. I mean that the final product might be a solution procedure factory: given a particular problem f[i] it might emit a runtime solving procedure that can be subsequently and repeaedtly supplied with arbitrary values for the extra parameters.

Here's the snapshot (run last in Maple 2015.0 on 64bit Linux).

newtoncompiled2.mw

It might be that providing us with a representative example of such problematic code would help in diagnosis of the problem.

Without seeing such code even more details might help.

For example, you write "script" but perhaps you are running the code in the GUI? Are these 3D plots? (There are known memory leaks in the Standard GUI when making it render a large number of 3D plots. By that I mean java memory leak, not mserver kernel leak, and `restart` does not help with it; it is only cleared by quitting the GUI entirely...)

acer

I had the loop to negate J for no good reason except that I hadn't bothered to optimise yet. Surely when comparing with fsolve the cost of codegen and the Compiler must be taken into account. (Its because of the wish to use it with a varying parameter set that I suspect it can still shine...)

I'll do more on this later/soonish.

ps. I should mention that I'm quite aware that the 2nd through 10000th iterations don't do the significant work, since they are using the solution found at the 1st attempt as the initial point. I was just trying to demonstrate that the compiled Newtonc works and is generally faster. True repeated timings would re-initialize to x0 to a non-solution point, naturally.

Now, if "extra parameters" to the system were also implemented then in practice one might wish to choose the previous solution as one possible initial point, if the parameters were only varied slightly. But that's another story.

Would I be right in thinking that more generally you would also want f3 and J3 to admit some additional parameters which could be passed as further arguments par and m?

These might consist of parameters of the DE system. They might be of type par::Vector(m,...) and m::integer.

So all of N, and f[i],i=1..N, and m, and par[j],j=1..m would be problem specific?

acer

Rather than tacking this idea on to an existing thread (you queried about it in a comment on another thread too, I believe) it might be more fruitful if you were to branch it off instead as an entirely new Question. I say that because it is a complicated and involved topic.

I'll try to list here a few suggestions for details that would help get somewhere with your question. I believe that these illustrate how complicated it can become in general.

Let's define terms, to avoid confusion. (Hopefully any new Question would do the same, or at least provide a very clear concrete motivational example.) Let's suppose you wish to find a root of all of the members of {F[1],F[2],..,F[n]} where each F[i] was a mathematical function of {y[1],[y[2],...,y[n]}.

Are you always starting with a (Vector-valued) procedure for F, whose procedure body contains the crucial formulae? Or would you sometimes begin with user-supplied multivariate expressions for all the F[i]?

Do you expect function evaluation of F to be by far the heaviest burden, compared to the cost of linear-solving (for a single Newton step)? If so then why is conjoined compilation so critical? Or do you expect the cost to be more even, where the performance difficulty is in performing a great many steps from a great many initial points, due say to high dimensionality? If you believe that the cost of dense linear solving is a bottleneck then are you sure that it is the cost of hardware double-precision lapack dgetrf/dgetrs or might it rather be the overhead of calling those from LinearSolve in the Maple Library?

What about the case where the F[i] are not compilable, nor even evalhf'able, or are susceptible to roundoff error at double precision? Do you want a mechanism that falls back to software float evaluation of the F[i] while retaining double-precision linear solving to Newton steps?

The issue of combining compilation of multiple procedures (or even of single procedures which are supposed to contain problem-dependent user-supplied formulae/expressions) is not simple. Sometimes multiple procedures can be compiled "together" is they are within a module. Or, I recall that worked at some date... Simple expressions can sometimes be substituted in for dummy placeholders in template procedures. (That's kind of what what goes on with module local Fractals:-EscapeTime:-setNewton). For simple F[i] one can sometimes get by using option inline although that too can require use of global names. There are plotting internal mechanisms which use FromInert/ToInert in order to try and shoehorn expressions into looping procedures.

...anyway, it's a very interesting topic. A concrete small example would be a decent place to start, in a new Question thread. (You could use the Branch button to create it from this Comment, perhaps.)

I am curious, why split the float[8] Matrix L0 into two columns, at all? Why not just pass L0 inself (instead of those first two arguments) as the first argument to plot?

acer

@bfathi Your followup code attempts to assign to the protected names sin and cos.

I don't understand why you can't make headway by yourself towards understanding the error messages. For example, you could try a help query of the keyword protected. You could also click through on the error message, which is a hyperlink to here.

I suggest that you just use some other names, like psin and pcos or whatever, instead of trying to assign to sin and cos. (The difficulties you've had with the very basics indicates that declaring local instances of those names would just lead you into greater difficulties.)

You also continue to use dsnumsort, which is incredibly poor, despite suggestions by myself and others.

I don't see that you're making significant effort on your own towards the purported university project. You don't even bother to describe the project to us in proper detail (...we're not all going to go buy that text book, you know).

First 334 335 336 337 338 339 340 Last Page 336 of 592