## 70 Reputation

8 years, 143 days

## create linear system A*phi = s...

Ok, I think the way to solve this is to create somehow create a system of equation A*phi = s, and then use A^-1 to create a system of equation that only depend on s (and implicitly dependent on phi).
Not entirely sure how to proceed wit this knowledge though.

## @dharr Awesome, that worked! I wond...

@dharr Awesome, that worked! I wonder where the 2 factor comes from. I saw it in the matlab example too but in most mathematical pages on the subject the factor 2 isn't mentioned. Or maybe I missed it.

## @acer Is it possible to compile several ...

@acer Is it possible to compile several Maple procedures into one C code? for example

g := proc(test)
...
end:
f := proc(test)
...
end

gf := proc(test)

f(test);
g(test);

end
cgf := compile(gf)

## @acer Right of course, I just thought th...

@acer Right of course, I just thought there might be a general issue with Maple-to-C conversion that simply cannot be fixed. Obviously details matter when optimizing. The code is actually quite simple with loops/summation/multiplication of arrays. The specific code I'm working on is below. I have to call this code like a bazillion times.

```Product_3DC := proc(A :: Array(datatype=float[8]), B :: Array(datatype=float[8]), C :: Array(datatype=float[8]),
T :: integer, X :: integer, Y :: integer, XX :: integer, YY :: integer)
local i::integer, j::integer, p::integer;
local t::integer, x::integer, y::integer;
local sum :: float[8];

for t from 0 to T do
for x from 0 to X-XX do
for y from 0 to Y-YY do
sum := 0;
for p from 0 to t do
for j from 0 to x do
for i from 0 to y do
sum := sum + B[t-p+1,x-j+1,y-i+1]*A[p+1,j+1,i+1];
end do;
for i from 1 to Y-y do
sum := sum + B[t-p+1,x-j+1,y+i+1]*A[p+1,j+1,i+1] + B[t-p+1,x-j+1,i+1]*A[p+1,j+1,y+i+1];
end do;
end do;
for j from 1 to X-x do
for i from 0 to y do
sum := sum + B[t-p+1,x+j+1,y-i+1]*A[p+1,j+1,i+1] + B[t-p+1,j+1,y-i+1]*A[p+1,x+j+1,i+1];
end do;
for i from 1 to Y-y do
sum := sum + B[t-p+1,x+j+1,y+i+1]*A[p+1,j+1,i+1] + B[t-p+1,x+j+1,i+1]*A[p+1,j+1,y+i+1]
+ B[t-p+1,j+1,y+i+1]*A[p+1,x+j+1,i+1] + B[t-p+1,j+1,i+1]*A[p+1,x+j+1,y+i+1];
end do;
end do;
end do;

for p from 1 to T-t do
for j from 0 to x do
for i from 0 to y do
sum := sum + B[t+p+1,x-j+1,y-i+1]*A[p+1,j+1,i+1] + B[p+1,x-j+1,y-i+1]*A[t+p+1,j+1,i+1];
end do;
for i from 1 to Y-y do
sum := sum + B[t+p+1,x-j+1,y+i+1]*A[p+1,j+1,i+1] + B[t+p+1,x-j+1,i+1]*A[p+1,j+1,y+i+1]
+ B[p+1,x-j+1,y+i+1]*A[t+p+1,j+1,i+1] + B[p+1,x-j+1,i+1]*A[t+p+1,j+1,y+i+1];
end do;
end do;
for j from 1 to X-x do
for i from 0 to y do
sum := sum + B[t+p+1,x+j+1,y-i+1]*A[p+1,j+1,i+1] + B[t+p+1,j+1,y-i+1]*A[p+1,x+j+1,i+1]
+ B[p+1,x+j+1,y-i+1]*A[t+p+1,j+1,i+1] + B[p+1,j+1,y-i+1]*A[t+p+1,x+j+1,i+1];
end do;
for i from 1 to Y-y do
sum := sum + B[t+p+1,x+j+1,y+i+1]*A[p+1,j+1,i+1] + B[t+p+1,x+j+1,i+1]*A[p+1,j+1,y+i+1]
+ B[t+p+1,j+1,y+i+1]*A[p+1,x+j+1,i+1] + B[t+p+1,j+1,i+1]*A[p+1,x+j+1,y+i+1]
+ B[p+1,x+j+1,y+i+1]*A[t+p+1,j+1,i+1] + B[p+1,x+j+1,i+1]*A[t+p+1,j+1,y+i+1]
+ B[p+1,j+1,y+i+1]*A[t+p+1,x+j+1,i+1] + B[p+1,j+1,i+1]*A[t+p+1,x+j+1,y+i+1];
end do;
end do;
end do;
C[t+1,x+1,y+1] := (1/8)*sum;
end do;
end do;
end do;

end:
cProduct_3DC:=Compiler:-Compile(Product_3DC, optimize, inmem=false):
```

## @acer Thank you! That works, however it ...

@acer Thank you! That works, however it wasn't much of an upgrade in performance for me unfortunately. I didn't expect Maple to have the -02 flag already. I'm curious though why the difference in execution time is so different from the Maple-compiled code and my c-native code. I expected the performance to be better in c-native but not by too much, however they are not even close.

## @acer That would be great! Yea I...

@acer That would be great! Yea I'm using Maple 2019 and Ubuntu 19.10 (I use Windows 10 at work).

## good idea...

@acer Right, ok. I'm trying to write a procedure that multiplies arrays and does transpose etc. I would like to compile this code so it goes quickly.

## Yes I just reread it....

@Carl Love misunderstanding. I edited my Comment.

## Yes and yes...

@Carl Love exactly. So I need to receive the Intt arrays for example. The way I have done it now is turn the Intt rtable into a vector which I then Grid:-Run(node, task,...., assigto' =Intt[sx,sy]). But this is a problem because if I need to do this several times I get a left hand side error. So I have to unassign Intt and assign every iteration.

## Grid Run Return Arrays/rtables...

@Carl Love Thanks! I will try that. Maybe you can asnwer another question. My procedure creates several arrays that need to be combined into a long vector. The code that works is

```Threads:-Task:-Start( null, seq(seq(Task=[task,phi_EQ,X0,RMLt,q_IC[i,j],i,j],i=1..Nx),j=1..Ny) ):
```

The phi_EQ is the "long" final vector. The task procedure computes sections, or arrays, in parallel and enters it into phi_EQ in the right place. So the task procedure does not conflict with subdomains. Doen below is a sample code. This of course will not run but you see what I'm trying to do. The equations are arbitrary and the cDerivative function just inputs an array and outputs another. Same with the cIntegral. I have tried so many different syntax, or ways, to do this with Grid:-Run but I am struggling to assign the arrays to the final Vector.

```task:=proc(phix,x,RMLt,q_IC,sx,sy)
global kappa,tM,xM,yM,NU,Neqs;
local m,p,n,eq1,eq2Intt,v,neq,ind,
local ux,vx:

ind := (1..tM+1, 1..xM+1, 1..yM+1);
for neq from 1 to Neqs do
v[neq] := Array(ind, (m,p,n)-> x[((sx-1)*Nx+(sy-1))*Neqs*NU+(neq-1)*NU+(m-1)*(xM+1)*(yM+1)+(p-1)*(yM+1)+(n-1)+1], datatype=float[8]);
od:

ux  := Array(ind, datatype=float[8]); cDerivative_3XC(v[1],ux,BMAx[sx],tM,xM,yM):
vx  := Array(ind, datatype=float[8]); cDerivative_3XC(v[2],vx,BMAx[sx],tM,xM,yM):

eq1[ind] := -ux[ind]-vx[ind]:
eq2[ind] := -2.0*ux[ind]+vx[ind]:

Intt[1] := Array(ind, datatype=float[8]); cIntegral_tC(eq1,Intt[1],RMLt,tM,xM,yM):
Intt[2] := Array(ind, datatype=float[8]); cIntegral_tC(eq2,Intt[2],RMLt,tM,xM,yM):

for neq from 1 to Neqs do
for m   from 0 to tM   do
for p   from 0 to xM   do
for n   from 0 to yM   do
if m=0 and p <= xM-2 and n <= yM-2 then
phix[((sx-1)*Nx+(sy-1))*Neqs*NU+(neq-1)*NU+m*(xM+1)*(yM+1)+p*(yM+1)+n+1]:=
Intt[neq][1,p+1,n+1]+2*q_IC[neq,0,p,n]:

elif p > xM-2 or n > yM-BCy2 then
phix[((sx-1)*Nx+(sy-1))*Neqs*NU+(neq-1)*NU+m*(xM+1)*(yM+1)+p*(yM+1)+n+1]:= 0:

else
phix[((sx-1)*Nx+(sy-1))*Neqs*NU+(neq-1)*NU+m*(xM+1)*(yM+1)+p*(yM+1)+n+1]:=
Intt[neq][m+1,p+1,n+1]:
fi:

od: od: od: od:

end:
```

## @Carl Love Yes, that works. I'm...

Yes, that works. I'm using Maple 2018. Any reason for the compound line other than aesthetic?

I must have made a mistake before because I ran the different alternatives again and you are correct, Alt1 and Alt3 are the same. To be honest I have no idea why it was showing different result before. It was a simple copy paste and print result.

## @kfli This seems to be correct also...

@kfli
This seems to be correct also

R := norm(df,2);
Q := df/~R;
gamma := Transpose(Q).fval/~R;

## @Joe Riel That's really great!...

@Joe Riel

That's really great! I will be using that in all my codes from now on :D I was really missing the Matlab profiling function in Maple.

## Great stuff!...

@Carl Love

Hah that's why I need people like you. I was ripping my hair out over that and it wasn't even the issue.

Well the code does 'work' in the sense that it gives the right solution to the set of equations. There was only that weird issue where the produre just stopped for no apparant reason. No errors or anything, and I thought it had to do with the procedure issue but I was mistaken. I changed a few things as you see. I changed QR deomposition to 'NAG' output and I check the condition number of Q instead of R (because R is now a vector). I am not sure if that is the same thing mathematically so it could be working slightly different than original, but it atleast finishes the procedure.

I might have been confused on the line of the code you comment. I was trying to implement MTM[ldivide]. I tried all three alternatives.

Alt.1
R := norm(df,2);
Q := R/~df;
gamma := R/~Transpose(Q).fval;

Alt. 2
R := norm(df,2);
Q := MTM[ldivide](R,df);
gamma := MTM[ldivide](R,Transpose(Q).fval);

Alt. 3
gamma := Transpose(df).fval;

All give different results (ldivide is corrent). I thought that Alt.1 and Alt.2 were equivalent but I was wrong. Alt.2 is the from the original Matlab code. I thought lvdivide was the same as '/~'. The idea is to solve df.gamma=fval with QR decomp.

df.gamma=fval
QR.gamma=fval
gamma=R^-1.Q' . fval

Side note: If we can make a procedure that alters prexisting Q R instead of making QR from scratch everytime we change df than this code would be working smooth as eggs.

## I would have never guessed.....

@Carl Love
That's crazy. Thanks for the info. Where did you read this? As of right now I have changed some things so that the code works for all types of equations. It looks like this if you are interested.

restart:
Digits:=12:
Runtime:=time():
with(LinearAlgebra):
with(plots):
with(orthopoly):
with(ArrayTools):

N := 2:
phix := Vector(N):
X    := Vector(N):
phix[1] := cos(x[2]):
phix[2] := 3*cos(x[1]):
X[1] := 1.0:
X[2] := 2.0:

#Code AndersonAcceleration.
AndersonAcceleration:=proc(N,phi,X)
global x, xS, here1, here2;
local mMax, itmax, atol, rtol, droptol, beta, AAstart, res_hist, df, DGg, gamma, X0;
local DG, mAA, iter, gval,fval,res_norm, condDF, tol, f_old, g_old, y, i, k, Q, R, QRg, dfT, DGT;

(*
%------------------------------------------------------------------------
% Function xS = AndersonAcceleration(N,phi,x0).
%
% Fixed-point iteration with or without Anderson acceleration.
% 'phi' is the fixed point iteration map and 'xS' is the
% solution, so that xS = phi(xS).
%
% Input arguments:
%    X0 = Initial value solution. Note: In this function this variable
%        must be a column vector.
%       1. 'mMax' = maximum number of stored residuals (non-negative integer).
%       NOTE: 'mMax' = 0 => no acceleration. default=1000
%       2. 'itmax' = maximum allowable number of iterations. default=1000
%       3. 'atol' = absolute error tolerance. default=1.0e-6
%       4. 'rtol' = relative error tolerance. default=1.0e-3
%       5. 'droptol' = tolerance for dropping stored residual vectors to
%       improve conditioning: If 'droptol' > 0, drop residuals if the
%       condition number exceeds droptol; if droptol <= 0,
%       do not drop residuals.
%       6. 'beta' = damping factor: If 'beta' > 0 (and beta ~= 1), then
%       the step is damped by beta; otherwise, the step is not damped.
%       NOTE: 'beta' can be a function handle; form beta(iter), where iter
%       is the iteration number and 0 < beta(iter) <= 1.
%       7. 'AAstart' = acceleration delay factor: If 'AAstart' > 0, start
%       acceleration when iter = AAstart.
%
% Output:
% xS = Solution vector.
%
% The notation used is that of H.F. Walker: Anderson Acceleration:
% Algorithms and implementation
%------------------------------------------------------------------------
*)

mMax    := 100:
itmax   := 100:
atol    := 1.0e-8:
rtol    := 1.0e-6:
droptol := 1.0e12:
beta    := 1.0:
AAstart := 0:

# Initialize storage arrays and number of stored residuals.
DG := Matrix():
df := Matrix():
DGg := Vector(N);
QRg := Vector(N);
mAA := 0:

X0 := X;
for iter from 0 to itmax do

x:=X0:
gval := Vector(phi):
fval := gval - X0:
res_norm := norm(fval,2):

# Set the residual tolerance on the initial iteration.
if iter = 0 then
tol := max(atol,rtol*res_norm):
fi:

# Convergence test, if converged the loop stops.
if res_norm <= tol then
print('iter',iter,res_norm);
break;   # Breaks for-loop
fi:

# If resnorm is larger than 1e8 at iter > 5, problem stops
if res_norm >1e8 and iter > 5 then
print('no_convergence',res_norm);
break; # Breaks for-loop, diverged
fi:

# Fixed point iteration without acceleration, if mMax == 0.
if mMax = 0 or iter < AAstart then
# We update E <- g(E) to obtain the next approximate solution.
X0 := gval:
else
# With Anderson acceleration.
# Update the df vector and the DG array.
if iter > AAstart then
if mAA < mMax or Size(df,2) = 1 then
df := Concatenate(2,df,fval-f_old):
DG := Concatenate(2,DG,gval-g_old):
else
df := Concatenate(2,df[..,-1],fval-f_old):
DG := Concatenate(2,DG[..,-1],gval-g_old):
fi:
mAA := mAA + 1:
fi:   # iter
# We define the old g and f values for the next iteration
f_old := fval;
g_old := gval;

if mAA = 0 then
# Initialization
# If mAA == 0, update X <- g(X) to obtain themah next approximate
# solution. No least-squares problem is solved for iter = 0
X0 := gval;
else
if mAA > 1 then
Q,R := QRDecomposition(df,datatype=float,output='NAG');
if type(Q, 'Matrix'(square)) then
condDF := ConditionNumber(Q);
else
condDF := 1;
fi:
while condDF > droptol and mAA > 1 do
if mAA = 2 then
df := convert(df[..,2..-1],Vector);
DG := convert(DG[..,2..-1],Vector);
else
df := df[..,2..-1];
DG := DG[..,2..-1];
fi:
Q,R := QRDecomposition(df,datatype=float,output='NAG');
mAA := mAA - 1;
if type(Q, 'Matrix'(square)) then
condDF := ConditionNumber(Q);
else
condDF := 1;
fi:
od:
if Size(df,2) > 1 then
gamma := LeastSquares([Q,R],fval);
else
R := norm(df,2);
Q := R/~df;
gamma := R/~Transpose(Q).fval;
fi:

else
R := norm(df,2);
Q := R/~df;
gamma := R/~Transpose(Q).fval;
fi:

if Size(gamma,1) > 1 then
DGg:=DG.gamma:
else
DGg:=DG*gamma;
fi:

# Update the approximate solution.
X0 := gval - DGg;

# Damping for non-zero beta
if beta > 0 and beta <> 1 then
if mAA = 1 then
QRg := Q*R*gamma;
else
QRg := df.gamma;
fi:
X0 := X0 - (1-beta)*(fval - QRg);
fi:# isa(beta ...
fi: # mAA = 0
fi:# mMax == 0

od:
X[1..N] := X0[1..N];
end:

AndersonAcceleration(N,phix,X):

 1 2 Page 1 of 2
﻿