quantum

MaplePrimes Activity


These are replies submitted by quantum

By using your code (including the above msm and ExtMSM tweaks) I ran the whole Parametrize_SU_Euler procedure again, which now uses the optimized code. Unfortunately, even "only" 1000 8x8 matrices take 44 sec (where Matlab needs 11.76 sec for 10,000(!) 8x8 matrices). The profiling below shows that the main chunk is of course taken by the newly optimized matrix_exp() function, which was called 63,000 times. So I guess the (relatively) poor performance is due to the many procedure calls and I cannot do anything further about that, right?

Parametrize_SU_Euler := proc(N::posint, params)
local lambda, used_lambdas, temp, i, j, m, l, k, A, alpha, X, param_list, param_ranges, U, a, temp1, temp2, s, RESMAT, workuaf, onevec, PROD, mmm, msm, old_hardware_floats;
     |Calls Seconds     Words|
PROC | 1000  44.001 297621258|
   1 | 1000   0.000      1836| if params::list then
   2 |    0   0.000         0|   param_ranges := evalf(Feynman_parameters("SU","Euler angles",N));
   3 |    0   0.000         0|   if not nops(param_ranges) = nops(params) then
   4 |    0   0.000         0|     error `: incorrect number of parameters! Expected a list of`, nops(param_ranges), ` parameters.` end if;
   5 |    0   0.000         0|   alpha := params elif params = "random" then
   6 | 1000   0.921    681478|   X := Statistics[RandomVariable](Uniform(0,1000));
   7 | 1000   0.609   5121203|   alpha := convert(Statistics[Sample](X,N^2-1),list) else
   8 |    0   0.000         0|   error `: Either a list of numerical parameters is expected as second argument or the keyword "random".` end if;
   9 | 1000   0.078    303927| lambda := Feynman_hermitian_basis(N,"float");
  10 | 1000   0.000     31963| j := m -> (N+m-1)*(N-m);
  11 | 1000   0.031    749337| used_lambdas := Vector(N^2-1,datatype = integer);
  12 | 1000   0.000         0| i := 1;
  13 | 1000   0.000         0| for m from N by -1 to 2 do
  14 | 7000   0.032         0|   for k from 2 to m do
  15 |28000   0.080    232101|     used_lambdas[i] := 3;
  16 |28000   0.062    242550|     used_lambdas[i+1] := (k-1)^2+1;
  17 |28000   0.000         0|     i := i+2 end do end do;
  18 | 1000   0.000         0| for m from 2 to N do
  19 | 7000   0.000    113343|   used_lambdas[i] := m^2-1;
  20 | 7000   0.000         0|   i := i+1 end do;
  21 | 1000   0.047   1020696| a := Matrix(N,N,datatype = ('complex')('float'));
  22 | 1000   0.079    821553| temp1 := Matrix(N,N,datatype = ('complex')('float'));
  23 | 1000   0.031    820575| temp2 := Matrix(N,N,datatype = ('complex')('float'));
  24 | 1000   0.000    822536| s := Matrix(N,N,datatype = ('complex')('float'));
  25 | 1000   0.062    818949| RESMAT := Matrix(N,N,datatype = ('complex')('float'));
  26 | 1000   0.062    402396| workuaf := Vector[row](N,('datatype') = ('float'));
  27 | 1000   0.079    513260| onevec := Vector(N,('fill') = 1.0,('datatype') = ('complex')('float'));
  28 | 1000   0.047    820535| PROD := Matrix(N,N,datatype = ('complex')('float'));
  29 | 1000   0.406   3058690| U := LinearAlgebra:-IdentityMatrix(N,N,compact = false,outputoptions = [datatype = complex[8]]);
  30 | 1000   0.030    862553| temp := Matrix(N,N,datatype = complex[8]);
  31 | 1000   0.015      2097| kernelopts(opaquemodules = false);
  32 | 1000   0.032      4258| mmm := LinearAlgebra:-LA_Main:-LA_External:-MatrixMatrixMultiply;
  33 | 1000   0.000      1300| msm := LinearAlgebra:-LA_Main:-LA_External:-MatrixScalarMultiply;
  34 | 1000   0.000         0| old_hardware_floats := UseHardwareFloats;
  35 | 1000   0.000     14923| UseHardwareFloats := true;
  36 | 1000   0.157        54| for k to op(1,used_lambdas) do
  37 |63000   0.278    981181|   ArrayTools:-Copy(N*N,lambda[used_lambdas[k]],0,1,temp,0,1);
  38 |63000   2.325   8443036|   msm(temp,I*alpha[k]);
  39 |63000  36.270 248603941|   matrix_exp(temp,a,temp1,temp2,s,workuaf,onevec,RESMAT);
  40 |63000   2.096  21058532|   mmm(U,RESMAT,PROD,('inplace') = false);
  41 |63000   0.172   1072455|   ArrayTools:-Copy(N*N,PROD,0,1,U,0,1) end do;
  42 | 1000   0.000         0| UseHardwareFloats := old_hardware_floats;
  43 | 1000   0.000         0| return U end proc
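As a sanity check, something like the following micro-benchmark (my own sketch; the 8x8 matrix contents and the loop count are arbitrary choices) could separate per-call overhead from the actual LAPACK work, by timing many calls of the stock MatrixExponential on one fixed matrix:

# Sketch only: time repeated calls of the stock MatrixExponential on one fixed
# anti-Hermitian 8x8 matrix; the matrix entries and loop count are arbitrary.
M := Matrix(8, 8, (i, j) -> evalf(I*(i + j)/100), datatype = complex[8]):
st := time():
for ii to 1000 do
    LinearAlgebra:-MatrixExponential(M):
end do:
time() - st;   # per-call cost = total/1000; compare with matrix_exp above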
Wow, this was (once more) an impressive lesson. A big thank you! A short test of your code shows the enormous speed-up. I will still have to do my homework and go through the details, but the impact of the main idea is clear. One question though: for your test code above (with oldmethod, newmethod := false, true), with n=64 and N=1000 my machine needs 8.891 sec. However, for N=5000 it already needs 212 sec. It's not obvious to me why it scales so strangely...!? I hope I will be able to transfer this approach to other bottlenecks in my program. I really appreciate the time and effort you put into solving my problem! As with many of your posts here in the forum, this has been very instructive for me. Thank you!
Thanks for these ideas. I'll have a look at the corresponding help pages to better understand what you mean. Mostly, I'm working under WinXP (32bit), which makes external calls a bit more complicated. Meanwhile, I tried to compile a BLAS.DLL library file that might be used under Windows (although I have never done that before). But I also have an Ubuntu installation available. In principle, if external calls are really necessary, I would like to support Windows and Linux platforms if possible. Shouldn't that work if one delivers both a DLL library for Windows and an SO library for Linux?

UPDATE: By setting infolevel[LinearAlgebra]:=2 I checked again and found that NAG hardware precision routines are already used. Moreover, this showed that the MatrixFunction command does some (probably avoidable?) copying: "MatrixFunction: copying, to recover complex(float) datatype". Is this the overhead you were referring to?

BTW: How can one find all those undocumented low-level procedures? I just learned about kernelopts(opaquemodules=false); showstat( ... ); But typically I don't know the names of these procedures. So where can I find them?
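In case it helps other readers, here is the minimal browsing recipe I have pieced together so far (only standard commands, so treat it as a sketch; exports() lists a module's exported names, which is how I found the LA_Main/LA_External routines above):

# Sketch: browsing otherwise-hidden module internals with standard commands.
kernelopts(opaquemodules = false):                  # make module internals accessible
exports(LinearAlgebra:-LA_Main);                    # list what LA_Main provides
exports(LinearAlgebra:-LA_Main:-LA_External);       # and its external-call layer
showstat(LinearAlgebra:-LA_Main:-MatrixFunction);   # print the source of one routine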
Hi acer, yes, the lambda[i] matrices are created with datatype=complex[8]. Basically equivalent code in Matlab 7.5 (for the creation of the whole SU matrix, not only the matrix exponentials!) takes 11.76 sec for 10,000 (random) 8x8 SU matrices. 10,000 16x16 matrices already take 92.09 sec. 1,000 (not 10,000) 32x32 random matrices take 133.60 sec. All timings were done under WinXP (32bit) on a CoreDuo laptop (2x2.33GHz). In Matlab, multithreading was enabled (it doesn't make a difference, though). It seems that 64x64 will stay unrealistic, as it takes 44.68 sec for only 10 64x64 matrices... Of course, I'm curious what you suggest to improve the Maple code, as the above Maple implementation is orders of magnitude slower: 13.047 sec for only 100(!) 8x8 SU matrices and 58.922 sec for 100 16x16. PS: Generally, the lambda[i] matrices do not commute.
Thank you for your interest in my problem. Let me answer your questions:

- The typical sizes of the matrices will be in the range 8x8 to 64x64.

- When using some higher-level commands, this results in quite many calls (possibly many thousands, e.g. in optimisation problems) of the lower-level procedures (see the example profiling below).

- The Feynman_permutation_matrix() command (which is the same as Permutation_matrix(); I just forgot to rename it when pasting here) seems to be less critical. It is often called with the same arguments and therefore I use the 'remember' option. As for the Partial_trace() command: below you find the profiling from a more or less real-life application. One can see it has been called rather often...

Partial_trace := proc(rho::('Matrix')(square), d::list(posint), trace_list::list(posint))
local rho_dim, i, j, k, d_in, d_out, in_list, out_list, U, reordered_rho, new_rho;
     |Calls  Seconds     Words|
PROC |12000   12.460 107208277|
   1 |12000    0.016     54220| rho_dim := mul(d[x],x = 1 .. nops(d));
   2 |12000    0.092    145846| if not LinearAlgebra[RowDimension](rho) = rho_dim then
   3 |    0    0.000         0|   error `: The given matrix does not match the specified subspace dimensions.` end if;
   4 |12000    0.016     28394| if nops(d) < max(op(trace_list)) then
   5 |    0    0.000         0|   error `: One (or more) of the specified target subspaces is invalid.` end if;
   6 |12000    0.124    366797| out_list, in_list := selectremove(has,[seq(1 .. nops(d))],trace_list);
   7 |12000    0.000    217701| d_in := mul(d[i],i = in_list);
   8 |12000    0.000    169443| d_out := mul(d[i],i = out_list);
   9 |12000    0.031    225276| if not [op(in_list), op(out_list)] = [seq(i,i = 1 .. nops(d))] then
  10 | 8000    0.575   2826135|   if rho[1,1]::complex[8] then
  11 | 8000    0.811  12085671|     U := Matrix(rho_dim,rho_dim,Feynman_permutation_matrix([op(in_list), op(out_list)],d),datatype = complex[8]);
  12 | 8000    0.420   4633864|     new_rho := Matrix(d_in,d_in,datatype = complex[8]) else
  13 |    0    0.000         0|     U := Feynman_permutation_matrix([op(in_list), op(out_list)],d);
  14 |    0    0.000         0|     new_rho := Matrix(d_in,d_in) end if;
  15 | 8000    4.582  55225765|   reordered_rho := LinearAlgebra:-LA_Main:-MatrixMatrixMultiply(LinearAlgebra:-LA_Main:-MatrixMatrixMultiply(U,rho,inplace = false,outputoptions = []),LinearAlgebra:-LA_Main:-HermitianTranspose(U,inplace = false,outputoptions = []),inplace = false,outputoptions = []) else
  16 | 4000    0.000         0|   reordered_rho := rho;
  17 | 4000    1.411   3122064|   if rho::('Matrix')(complex[8]) then
  18 | 4000    0.171   2840927|     new_rho := Matrix(d_in,d_in,datatype = complex[8]) else
  19 |    0    0.000         0|     new_rho := Matrix(d_in,d_in) end if end if;
  20 |12000    0.031         0| for i to d_in do
  21 |36000    0.142         0|   for j to d_in do
  22 |120000   4.038  25266174|     new_rho[i,j] := add(reordered_rho[(i-1)*d_out+k,(j-1)*d_out+k],k = 1 .. d_out) end do end do;
  23 |12000    0.000         0| return new_rho end proc

- Specifically in Parametrize_SU_Euler() (but also in other procedures) I need to calculate matrix functions. This is rather expensive but not so easy to do via simple external calls, right!? Here is a profiling from a very similar application as the one above. Here, only 200 calls occurred, but when carrying out some optimisation problem over the set of unitary matrices (which are parametrized by Parametrize_SU_Euler), easily many thousand calls occur (which is of course much too expensive...). ...sorry for the ugly line breaks...
Parametrize_SU_Euler := proc(N::posint, params)
local lambda, used_lambdas, temp, i, j, m, l, k, A, alpha, X, param_list, param_ranges, U;
     |Calls Seconds     Words|
PROC |  200  40.805 361206053|
   1 |  200   0.000       255| if params::list then
   2 |  200   0.015    266569|   param_ranges := evalf(Feynman_parameters("SU","Euler angles",N));
   3 |  200   0.000      1106|   if not nops(param_ranges) = nops(params) then
   4 |    0   0.000         0|     error `: incorrect number of parameters! Expected a list of`, nops(param_ranges), ` parameters.` end if;
   5 |  200   0.000         0|   alpha := params elif params = "random" then
   6 |    0   0.000         0|   X := Statistics[RandomVariable](Uniform(0,1000));
   7 |    0   0.000         0|   alpha := convert(Statistics[Sample](X,N^2-1),list) else
   8 |    0   0.000         0|   error `: Either a list of numerical parameters is expected as second argument or the keyword "random".` end if;
   9 |  200   0.109    597316| lambda := Feynman_hermitian_basis(N,"float");
  10 |  200   0.000      2602| j := m -> (N+m-1)*(N-m);
  11 |  200   0.016    138841| used_lambdas := Vector(N^2-1,datatype = integer);
  12 |  200   0.000         0| i := 1;
  13 |  200   0.000         0| for m from N by -1 to 2 do
  14 | 1800   0.016         0|   for k from 2 to m do
  15 | 9000   0.064     62752|     used_lambdas[i] := 3;
  16 | 9000   0.000     64130|     used_lambdas[i+1] := (k-1)^2+1;
  17 | 9000   0.000         0|     i := i+2 end do end do;
  18 |  200   0.000         0| for m from 2 to N do
  19 | 1800   0.000     20392|   used_lambdas[i] := m^2-1;
  20 | 1800   0.000         0|   i := i+1 end do;
  21 |  200   0.000    585435| temp := LinearAlgebra:-LA_Main:-MatrixScalarMultiply(lambda[used_lambdas[1]],I*alpha[1],inplace = false,outputoptions = [datatype = complex[8]]);
  22 |  200   0.407   2661130| U := LinearAlgebra:-LA_Main:-MatrixFunction(temp,exp(dummy),dummy,outputoptions = [datatype = complex[8]]);
  23 |  200   0.031       105| for k from 2 to op(1,used_lambdas) do
  24 |19600   5.127  51781164|   temp := LinearAlgebra:-LA_Main:-MatrixScalarMultiply(lambda[used_lambdas[k]],I*alpha[k],inplace = false,outputoptions = []);
  25 |19600  35.020 305024256|   U := LinearAlgebra:-LA_Main:-MatrixMatrixMultiply(U,LinearAlgebra:-LA_Main:-MatrixFunction(temp,exp(dummy),dummy,outputoptions = []),inplace = false,outputoptions = []) end do end proc

- Some rough empirical tests seemed to show that garbage collection works best at gcfreq=200000000.

- If you think it pays off to call BLAS, LAPACK, or whatever routines, then I'm willing to learn how to do it. But currently I mainly use Windows, where I first would have to create the proper DLLs. How complicated would it be to offer such an external-calling solution for Windows and Linux (if possible, I would like to support both platforms)? A rough sketch of what I have in mind follows after this post. But, as you pointed out already, LinearAlgebra already uses external routines, so I wonder if there is really so much room for improvement!?

EDIT: I should mention that the parametrize_SU command above is actually already the result of a forum topic here (see http://www.mapleprimes.com/forum/efficient-parametrization-of-unitary-matrices ). Once again: thank you for your support!
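Regarding the DLL/SO question, my (untested) understanding is that a sketch like the following could work on both platforms, since ExternalCalling:-ExternalLibraryName adds the platform-specific extension. The library name "myblas" is a placeholder, and the exact option spelling should be checked against ?define_external:

# Untested sketch: cross-platform external call to a Fortran BLAS routine.
# "myblas" is a made-up name; ExternalLibraryName turns it into myblas.dll on
# Windows and libmyblas.so on Linux, so the same Maple code loads either one.
libfile := ExternalCalling:-ExternalLibraryName("myblas"):
zscal := define_external('zscal', 'FORTRAN',    # BLAS zscal: x := alpha*x, in place
    'n'::REF(integer[4]),                       # Fortran passes scalars by reference
    'alpha'::REF(complex[8]),
    'x'::ARRAY(datatype = complex[8]),
    'incx'::REF(integer[4]),
    'LIB' = libfile):
# Depending on the compiler, the symbol may need a trailing underscore ('zscal_').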
As you had suspected, I profiled over a few calls with larger N instead of many calls with N=5 (which is more realistic). Your last two versions are very insightful for me (as is the whole thread, indeed). I think for my purposes I will stick with the second version (where you eliminated the second Sample call...) but with LA_Main commands. This seems to be a very acceptable compromise between simplicity/readability and performance! I really learned several lessons from your posts, and I hope I will be able to make use of them in other places as well. So thank you very much!
I see the logic behind your modification, but this change yields only a marginal performance improvement. Profiling shows that by far most of the time is spent in HermitianTranspose and then in Sample, as seen from the profiling summary below:

Feynman_random_rho := proc(N::posint, d::posint := 2)
local rho, temp, X, a, rtemp, temp2;
     |Calls Seconds    Words|
PROC |    6   2.845 31926325|
   1 |    6   0.000     4194| X := Statistics:-RandomVariable(Normal(0,1));
   2 |    6   0.047  6295108| temp := Matrix(d^N,d^N,datatype = complex[8]);
   3 |    6   0.469  6311128| a := Statistics:-Sample(X,2*d^N*d^N);
   4 |    6   0.000      297| rtemp := ArrayTools:-ComplexAsFloat(temp);
   5 |    6   0.015        0| ArrayTools:-Copy(2*d^N*d^N,a,0,1,rtemp,0,1);
   6 |    6   2.267 18912860| rho := LinearAlgebra[HermitianTranspose](temp).temp;
   7 |    6   0.047   402738| LinearAlgebra:-MatrixScalarMultiply(rho,1/LinearAlgebra[Trace](rho),inplace = true) end proc

So I guess this is close to the optimum one can get within Maple...!?
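If it's useful, a small sketch like this (the dimension n and the matrix entries are my own arbitrary choices) could split line 6 into its two parts, to confirm whether the transpose or the product is the real cost:

# Sketch: time the Hermitian transpose and the product of line 6 separately.
n := 256:
T := Matrix(n, n, (i, j) -> evalf(i + I*j), datatype = complex[8]):
st := time(): H := LinearAlgebra:-HermitianTranspose(T): t_ht := time() - st:
st := time(): R := LinearAlgebra:-MatrixMatrixMultiply(H, T): t_mm := time() - st:
t_ht, t_mm;   # if t_mm dominates, the O(n^3) multiply is the floor, not the copying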
Thank you, acer! I still have to learn all the lessons included in your post, but the speed-up is really impressive. I will go through the details and the corresponding help pages. I'm really learning a lot from this forum. Thanks again to all the gurus here who spend their time explaining these things over and over again to us newbies :-) It is appreciated!