
## codes without i+(j-1)*N...

If you can create code without using the i+(j-1)*N trick, please share it; I want to know whether your version will be faster.

thanks

## Sorry no...

Thanks.

Using i+(j-1)*N is tailored to this model and is not useful for general problems. For this specific model one can write F and Jac explicitly, and that version also runs well in parallel. See the code below. When j11 is passed with storage=sparse, datatype=float[8] there is no issue. As @Carl Love mentioned, the issue seems to be j11 being passed as a Matrix/Array with storage=sparse only.

 > restart:
 > Digits:=15;

 >
 > N:=40;h:=1.0/N:

 > Fp := subs("N"=N, proc(n0, nf, uu::Vector, uu_old::Vector, deltA::float, tnew::float, F1::Vector)
   local h::float, left::float, right::float, top::float, bot::float, i::integer, j::integer, ind::integer;
     h := 1.0/"N":
     for i from n0 to nf do
       for j from 1 to "N" do
         ind := i + (j-1)*"N":
         if i = 1 then left := 0:
         else left := (0.5*uu[ind] + uu_old[ind] - 0.5*uu[ind-1] - uu_old[ind-1])/h: end:
         if i = "N" then right := -0.1:
         else right := (0.5*uu[ind+1] + uu_old[ind+1] - 0.5*uu[ind] - uu_old[ind])/h: end:
         if j = 1 then bot := 0:
         else bot := (0.5*uu[ind] + uu_old[ind] - 0.5*uu[ind-"N"] - uu_old[ind-"N"])/h: end:
         if j = "N" then top := -0.1:
         else top := (0.5*uu[ind+"N"] + uu_old[ind+"N"] - 0.5*uu[ind] - uu_old[ind])/h: end:
         F1[ind] := uu[ind] - deltA*(right + top - left - bot)/h + deltA*(0.5*uu[ind] + uu_old[ind])^2;
         # print(F1);
       od:
     od:
   end proc);

 >
 > F:=subs("N"=N, proc(uu::Vector, uu_old::Vector, deltA::float, tnew::float, F1::Vector) local h::float,left::float,right::float,top::float,bot::float,i::integer,j::integer,ind::integer; Fp(1,"N",uu,uu_old,deltA,tnew,F1);   end proc);

 >
 > FpDistribute := proc(n0, nf, uu::Vector, uu_old::Vector, deltA::float, tnew::float, F1::Vector)
   local nmid;
     if 20 < nf - n0 then
       nmid := iquo(n0+nf, 2);
       Continue(null, Task = [FpDistribute, n0, nmid, uu, uu_old, deltA, tnew, F1],
                      Task = [FpDistribute, nmid, nf, uu, uu_old, deltA, tnew, F1]);
     else
       Fp(n0, nf, uu, uu_old, deltA, tnew, F1);
     end if;
   end proc;

 > Fparallel:=subs("N"=N, proc(uu::Vector, uu_old::Vector, deltA::float, tnew::float, F1::Vector) local h::float,left::float,right::float,top::float,bot::float,i::integer,j::integer,ind::integer; Start(FpDistribute,1,"N",uu,uu_old,deltA,tnew,F1);   end proc);

 > interface(prettyprint=3);

 >
 > Jp := subs("N"=N, proc(n0, nf, uu::Vector, uu_old::Vector, deltA::float, tnew::float, Nt::integer, J::Matrix(storage=sparse,datatype=float[8]))
   local h::float, left::float, right::float, top::float, bot::float, i::integer, j::integer, ind::integer;
     h := 1.0/"N";
     for i from n0 to nf do
       for j from 1 to "N" do
         ind := i + (j-1)*"N":
         J[ind,ind] := 1.0 + deltA*(0.5*uu[ind] + uu_old[ind]):
         if i > 1 then J[ind,ind] := J[ind,ind] + deltA*0.5/h^2: J[ind,ind-1] := -deltA*0.5/h^2: end:
         if i < "N" then J[ind,ind] := J[ind,ind] + deltA*0.5/h^2: J[ind,ind+1] := -deltA*0.5/h^2: end:
         if j > 1 then J[ind,ind] := J[ind,ind] + deltA*0.5/h^2: J[ind,ind-"N"] := -deltA*0.5/h^2: end:
         if j < "N" then J[ind,ind] := J[ind,ind] + deltA*0.5/h^2: J[ind,ind+"N"] := -deltA*0.5/h^2: end:
       od:
     od:
   end proc);

 > Jac := subs("N"=N, proc(uu::Vector, uu_old::Vector, deltA::float, tnew::float, Nt::integer, J::Matrix(storage=sparse,datatype=float[8]))
   local h::float, left::float, right::float, top::float, bot::float, i::integer, j::integer, ind::integer;
     Jp(1, "N", uu, uu_old, deltA, tnew, Nt, J);
   end proc);

 > JpDistribute := proc(n0, nf, uu::Vector, uu_old::Vector, deltA::float, tnew::float, Nt::integer, J::Matrix(storage=sparse,datatype=float[8]))
   local nmid;
     if 100 < nf - n0 then
       nmid := iquo(n0+nf, 2);
       Continue(null, Task = [JpDistribute, n0, nmid, uu, uu_old, deltA, tnew, Nt, J],
                      Task = [JpDistribute, nmid, nf, uu, uu_old, deltA, tnew, Nt, J]);
     else
       Jp(n0, nf, uu, uu_old, deltA, tnew, Nt, J);
     end if;
   end proc;

 > Jacparallel:=subs("N"=N, proc(uu::Vector, uu_old::Vector, deltA::float, tnew::float, Nt::integer,J::Matrix(storage=sparse,datatype=float[8])) local h::float,left::float,right::float,top::float,bot::float,i::integer,j::integer,ind::integer; Start(JpDistribute,1,"N",uu,uu_old,deltA,tnew,Nt,J);   end proc);
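For context, the i+(j-1)*N "trick" is just a column-by-column flattening of the N×N grid into one vector of length N², so the x-neighbours of a node sit at ind±1 and the y-neighbours at ind±N — exactly the offsets F and Jac above index into. A quick sanity check of the mapping (in Python, for illustration only):

```python
N = 4

def ind(i, j, N):
    # 1-based grid point (i, j) -> 1-based position in the flattened vector,
    # matching the Maple code's ind := i + (j-1)*"N"
    return i + (j - 1) * N

# every grid point maps to a unique index in 1 .. N*N
seen = {ind(i, j, N) for j in range(1, N + 1) for i in range(1, N + 1)}
assert seen == set(range(1, N * N + 1))

# neighbours: +/-1 in the x-direction, +/-N in the y-direction
assert ind(2, 3, N) - ind(1, 3, N) == 1
assert ind(2, 4, N) - ind(2, 3, N) == N
```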

 >

## From the article,...

"The Maple to C optimized code proved to be a winning combination. Stefan pioneered a similar hybrid development approach when he began writing a Modelica compiler for MapleSim in 2007. The initial implementation was written entirely in Maple. According to Stefan, the Maple language is much faster to write code in. It is much higher-level, allowing for rapid prototypes, without worrying about things like memory management and putting data-structures together. Two codebases are still maintained for the MapleSim Modelica compiler. All new features are developed in the Maple codebase, vetted and proven, before being translated to C. This approach has been adopted by Stefan's successors, and they also agree that in the end this actually saves time and produces a better result."

This has been my observation as well. I hope to share some examples, in particular for solving DAEs and PDEs, that demonstrate similar gains from a Maple + C hybrid approach.

## DAE solver difference...

Note that Maple does not have a direct DAE solver; it converts index-1 DAEs of the form

dy/dt = f(y,z)    (1)

0 = g(y,z)    (2)

to

dy/dt = f(y,z)

dz/dt = ...

by differentiating (2) to find dz/dt.

MapleSim has direct DAE solvers that can solve (1) and (2) as given.
Recently, we developed and published a direct sparse DAE solver in Maple.
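For completeness, the dz/dt equation comes from one time-differentiation of the constraint (the standard index-1 reduction):

```latex
0 = \frac{d}{dt}\,g(y,z)
  = \frac{\partial g}{\partial y}\frac{dy}{dt}
  + \frac{\partial g}{\partial z}\frac{dz}{dt}
\quad\Longrightarrow\quad
\frac{dz}{dt} = -\left(\frac{\partial g}{\partial z}\right)^{-1}
                 \frac{\partial g}{\partial y}\,f(y,z).
```

Invertibility of ∂g/∂z is exactly the index-1 condition; higher-index DAEs would need repeated differentiation.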

## Resolved - at least partially...

Erik Postma @epostma had helped me in the past to call the MKL solver, so I am just guessing the libraries and trying it. It seems to work, but it may be buggy.

Intel MKL PARDISO can outperform UMFPACK for very large sparse systems, as shown earlier by our group in the paper and codes at https://sites.utexas.edu/maple/phase-field-models-for-li-metal-batteries/

 > restart:
 > Digits:=15;
 > A:=Matrix(4,4,[[-1,2,0,0],[2,-1,2,0],[0,2,-1,2],[0,0,1,-1]],datatype=float[8],storage=sparse);
 > b:=Vector(4,[1,0.,0,0],datatype=float[8]):
 >
 > sol:=LinearAlgebra:-LinearSolve(A,b,method=SparseDirectMKL);
 > s1:=LinearAlgebra:-LUDecomposition(A,method=SparseDirectMKL,output=handle);
 > s1();
 > pSolve := define_external("hw_PardisoSolve", ':-MAPLE', ':-LIB' = "linalg"):
 > x:=Vector(4,datatype=float[8]):
 > pSolve(b,x,s1);
 > x;
 >
 > pFactor := define_external("hw_PardisoFactor", ':-MAPLE', ':-LIB' = "linalg"):
 > pCSF := define_external("hw_SpCompressedSparseForm", ':-MAPLE', ':-LIB' = "linalg"):
 >

One can also use pFactor, pCSF, and pSolve to get CB, R, X and then proceed.

 > CB,R,X:=pCSF(A,"r",1,1,false,true,false);
 > mtype := 11; handle1 := pFactor(CB, R, X, mtype);
 > x:=Vector(4,datatype=float[8]): pSolve(b,x,handle1): x;
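The point of the handle returned by pFactor / LUDecomposition is factor-once, solve-many: the expensive factorization is reused for every right-hand side. A minimal pure-Python sketch of that pattern on the same 4×4 matrix (Doolittle LU without pivoting, which happens to be safe for this particular matrix; PARDISO itself is far more sophisticated):

```python
def lu_factor(A):
    # Doolittle LU (no pivoting) on a dense matrix; L and U packed in one matrix
    n = len(A)
    M = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            M[i][k] /= M[k][k]
            for j in range(k + 1, n):
                M[i][j] -= M[i][k] * M[k][j]
    return M

def lu_solve(M, b):
    # forward/back substitution with the packed factors; reusable for many rhs
    n = len(M)
    y = b[:]
    for i in range(n):
        for j in range(i):
            y[i] -= M[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= M[i][j] * x[j]
        x[i] /= M[i][i]
    return x

# same test matrix and rhs as the worksheet above
A = [[-1, 2, 0, 0], [2, -1, 2, 0], [0, 2, -1, 2], [0, 0, 1, -1]]
b = [1.0, 0.0, 0.0, 0.0]
M = lu_factor(A)    # factor once (cf. pFactor / the LUDecomposition handle)
x = lu_solve(M, b)  # solve, reusing the factors (cf. pSolve)
residual = max(abs(sum(A[i][j] * x[j] for j in range(4)) - b[i]) for i in range(4))
assert residual < 1e-12
```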

## well done...

Such a great thread, thanks all!

## Thanks...

@Carl Love
Not sure I know the difference. I am just looking for something that will call the external integrator without using the evalf command.

See a MATLAB call from Maple and the small amount of RAM used (after MATLAB opens the first time); this calls the vdp1 model. The expected wrapper or procedure should call the external integrator like this for a given n (number of ODEs), f (procedure to find the slopes), initial condition vector (y0), initial time (t0), and final time (tf), with default settings for all other parameters.

@AllanWittkopf is trying to see if he can help; I will update this page. My rough test on a sample code was able to run a model with 50 KB RAM consumption using Maple's dsolve/numeric external integrator; this involved guessing the entries for the table creation/modification.

 >
 > restart:
 > Digits:=15;

 > currentdir();

 > with(Matlab);

 > ss:=proc(t0,tf,ic0,T,Y) T,Y := ode45("vdp1", 0 ..tf, ic0); end proc;

 > ic0:=[1,0];

 > CodeTools:-Usage((ss(0,1,ic0,T,Y)));

memory used=0.79MiB, alloc change=0 bytes, cpu time=0ns, real time=4.29s, gc time=0ns

 > T[57];Y[57];

 >

## Partial answer (not sure) and a potentia...

Please see the code attached. If anyone can explain the arguments passed, then one could write an evalhf-able procedure that takes n (number of ODEs), f (procedure for the slopes), the initial time, and the final time, does all of this with the external integrator, and provides the results in evalhf form.
If anyone can show and store the misc and xf called by `dsolve/numeric/SC/IVPrun`, this can be done. The goal is to reduce the number of non-evalhf-able loops/calls.

(This code was written in Maple 2023.)

 > restart:
 > Digits:=15:
 > eqs:=diff(ca(t),t)=-(u+u^2/2)*ca(t),diff(cb(t),t)=u*ca(t)-0.1*cb(t);

 > u:=0.1;

 > infolevel[all]:=10;

 > #soln:=dsolve({eqs,ca(0)=1.0,cb(0)=0.0},type=numeric):
 > #soln(1);
 > #showstat(`dsolve/numeric/SC/IVPrun`);
 > showstat(`dsolve/numeric/SC/hw_int`);

`dsolve/numeric/SC/hw_int` := proc()
local lib;
global `dsolve/numeric/SC/hw_int`;
1   try
2       lib := ExternalCalling:-ExternalLibraryName("ode2",'HWFloat');
3       `dsolve/numeric/SC/hw_int` := define_external('hw_integrate',':-MAPLE',(':-LIB') = lib,':-THREAD_SAFE')
catch :
4       error
"'hw_integrate' in external library %1 could not be found/used",
lib
end try;
5   `dsolve/numeric/SC/hw_int`(_passed[1 .. _npassed])
end proc

 > `dsolve/numeric/SC/hw_int` := proc()
   local lib;
   global `dsolve/numeric/SC/hw_int`;
     try
       lib := ExternalCalling:-ExternalLibraryName("ode2",'HWFloat');
       `dsolve/numeric/SC/hw_int` := define_external('hw_integrate',
         ':-MAPLE', (':-LIB') = lib, ':-THREAD_SAFE')
     catch :
       error "'hw_integrate' in external library %1 could not be found/used", lib
     end try;
     print(_passed);
     `dsolve/numeric/SC/hw_int`(_passed[1 .. _npassed]);
     print("here"); print(_passed); print(_passed[11]);
   end proc;

 > soln:=dsolve({eqs,ca(0)=1.0,cb(0)=0.0},type=numeric,output=Array([seq(i/1.*1,i=0..1)])):

dsolve/numeric: entering dsolve/numeric

DEtools/convertsys: Naming is {ca(t) = Y[1] cb(t) = Y[2] diff(ca(t) t) = YP[1] diff(cb(t) t) = YP[2]}

DEtools/convertsys: converted to first-order system Y'(x) = f(x,Y(x))    namely (with Y' represented by YP)

DEtools/convertsys: correspondence between Y[i] names and original functions:

dsolve/numeric/DAE/rhsproc: Using previously generated rhs proc

dsolve/numeric: the procedure F(x,Y,YP) for computing Y'(x)=f(x,Y(x)) is: proc (N, X, Y, YP) option [Y[1] = ca(t), Y[2] = cb(t)]; YP[1] := -.105000000000000*Y[1]; YP[2] := .1*Y[1]-.1*Y[2]; 0 end proc

dsolve/numeric: initial conditions: x0=0., y0=[1.0, 0.]

dsolve/numeric/SC/preproc: no compilation, _Env_ivpnumeric_ToExternal not set

dsolve/numeric/SC/firststep: Checking ODE for hardware floating point computation

dsolve/numeric/SC/firststep: Initial point OK for hardware floating point computation

dsolve/numeric/SC/firststep: Initial step set to: .0504765875584155

dsolve/numeric/SC/IVPrun: First call, start from initial

dsolve/numeric/SC/IVPrun: Calling external hardware nonstiff integrator with evalhf= true

dsolve/numeric/SC/IVPrun: Completed external hardware nonstiff integrator

dsolve/numeric/SC/IVPrun: Time for integration= 0.

dsolve/numeric/SC/IVPrun: Time for integration= 0.

 >
 >

What we need to do is write a procedure that takes as inputs the number of ODEs n, the procedure f, the initial conditions for the Y vector, the initial time, and the final time, updates the arguments, and provides the final Y value from the arguments in evalhf form by directly calling `dsolve/numeric/SC/hw_int`(_passed[1 .. _npassed]).
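As a rough model of that call pattern (not Maple's actual hw_integrate interface, which passes many more arguments), a single-call fixed-step driver looks like this in Python:

```python
import math

def integrate(n, f, y0, t0, tf, steps=1000):
    # single-call fixed-step RK4 driver: takes n, the slope procedure f,
    # the initial state, and the time window, and returns only the final state
    h = (tf - t0) / steps
    y = list(y0)
    t = t0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h/2, [y[i] + h/2*k1[i] for i in range(n)])
        k3 = f(t + h/2, [y[i] + h/2*k2[i] for i in range(n)])
        k4 = f(t + h,   [y[i] + h*k3[i]   for i in range(n)])
        y = [y[i] + h/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(n)]
        t += h
    return y

# the worksheet's system with u = 0.1: ca' = -0.105*ca, cb' = 0.1*ca - 0.1*cb
f = lambda t, y: [-0.105*y[0], 0.1*y[0] - 0.1*y[1]]
ca1, cb1 = integrate(2, f, [1.0, 0.0], 0.0, 1.0)
assert abs(ca1 - math.exp(-0.105)) < 1e-9
```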

## bump up...

Trying to bump up this thread to get the attention of @acer, @tomleslie, and others.
Hopefully, this example convinces everyone that merely choosing a different optimization package (global optimization, stochastic, genetic algorithms, etc.) may not achieve the optimal results if the numerical discretization (in this case nvar, the number of time horizons) does not provide sufficient precision. We would need nvar >= 256 for this problem and approach.

If anyone has access to old Maple versions, calling the external dsolve/numeric (without the print statements) and getting an output array, as shown in the link below and in the example towards the end for the NAG subroutine, should help.

https://www.maplesoft.com/support/helpJP/Maple/view.aspx?path=NAG/d02cjc

Though Maple no longer supports the NAG toolbox, if anyone has access to Maple 10 or earlier and is able to run the example on the NAG help page, please let me know.

Thanks

## GlobalSolve...

This is based on my limited testing of combining Maple's code with its Global Optimization Toolbox.

Currently, this requires a separate license.

The current toolbox is offered by Nimbus, which provides genetic algorithms.
A decade or so ago, Maple provided the Global Optimization Toolbox from Pintér. That had method=reducedgradient, which was very robust and fast.
Hope this helps.

## Thanks and a general comment as a profes...

I like DirectSearch and I have nothing against it or genetic algorithms (GA) which are heavily used in the literature.

This discussion and analysis (from @acer and the solution from him) is the best and most efficient way to solve this model.

The next biggest step can be wrapping dsolve/numeric (with or without compile=true) and making the NLPSolve work with the same in evalhf form for computational efficiency.

The last 5-10 years have seen an exponential increase in open-access/open-source codes in Python, Julia, etc. Many times I have wondered why students/researchers used GA for well-defined (convex) models. Now I partially understand the reason: wrappers are often written for optimizers or different algorithms and then called from user-friendly software, but the wrapper is poorly written, so the gradient information is not properly passed and not enough options (tolerances, in this case) are exposed.

PS: note that only gradient-based methods have theoretical backing for locally optimal results in optimization, and I am wary of using methods that don't have a good theoretical basis, even though they might work very well (but they will always be slower on problems that gradient-based methods can solve).

## Thanks...

I think any optimizer or nonlinear solver should take the required tolerance as input and return the current best objective and iterate.
@acer, I can confirm that this approach works with nonlinear constraints + dsolve/numeric/parameters + sqp. Thanks again.

## Thanks, this is the best answer...

There is no point in doing optimization to 1e-16 accuracy when we are trying to increase nvar (the number of stages) while solving the ODEs/DAEs only to 1e-6 accuracy.

Note that, for some odd reason, stiff=true in the call to dsolve/numeric works for this problem without tweaking anything else in the dsolve command or the optimization command.

Knowing where it fails would help. For example, ideally NLPSolve should report that "the objective was solved to 1e-6 precision, but not beyond that," instead of just returning "initial point satisfies first-order optimality condition."

fsolve has the same issue as well, and if I am not wrong, we can't force it to solve to a prescribed precision.
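A toy illustration of the interaction: if the objective is only reproducible to the inner solver's tolerance, a finite-difference step below that noise floor sees a flat function. Here the solver noise is modeled by rounding the objective to TOL (a hypothetical stand-in, chosen just to make the effect deterministic):

```python
TOL = 1e-6  # accuracy to which the "inner" ODE solve returns the objective

def f(x):
    # model of an objective computed through dsolve/numeric: the smooth value
    # x^2 is only reproducible to the integration tolerance
    return round(x*x / TOL) * TOL

def fdiff(x, h):
    # central finite difference, as an optimizer's gradient routine would use
    return (f(x + h) - f(x - h)) / (2*h)

# step well above the noise floor: the true gradient (2 at x=1) is recovered
assert abs(fdiff(1.0, 1e-2) - 2.0) < 1e-6
# step below the noise floor: the objective looks flat, which is why the
# optimizer reports "initial point satisfies first-order optimality condition"
assert fdiff(1.0, 1e-8) == 0.0
```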

@acer, once again, thanks a lot. I believe you can fix almost any issue!

## Found a solution for now- still iffy...

Adding abstol=1e-12, reltol=1e-12 to dsolve/numeric calls solves the problem for this model. (Thanks to Allan Wittkopf)

He also suggested setting Digits:=18 inside ss (I am not a big fan of this).

With the default tolerances, using maxstep = 0.05 (assuming that your maximum time horizon is 0.1 with nvar = 10) seems to work as well.

fdiff or the procedures used by NLPSolve should be made more robust.

## Thanks...

Your code is not using dsolve/numeric in parametric form. I will be posting the use of the parametric form working with NLPSolve (thanks to Allan Wittkopf).

As I said, we don't want analytical solutions.

Also, one-parameter optimization is different from multi-parameter optimization (scalar vs. vector).

To be clear, I am not looking for an answer for the model. My goal was to use NLPSolve with dsolve/numeric/parameters with sqp.
