acer

32348 Reputation

29 Badges

19 years, 330 days
Ontario, Canada

Social Networks and Content at Maplesoft.com

MaplePrimes Activity


These are replies submitted by acer

It worked fine for me in 12.02, Windows 32bit  (XP).

acer

I had written myself a similar procedure this evening. But then I noticed that it had an unfortunate behaviour, shared with your submission. Namely, what happens to `z` below:

> restart;
> RemoveAllAssumptions:=proc() local A,Ls,L,Lc,indx;
>    Ls:=map(convert,[anames('user')],string);
>    L:=ListTools:-Enumerate([anames('user')]);
>    Lc:=select(hastype,eval(L),`local`);
>    if Lc=[] then NULL
>    else
>       indx:=map2(op,1,Lc);
>       A:=[seq(Ls[i],i=indx)];
>       unassign(op(map(parse,A)));
>    end if
> end proc:

> z:=y;
                               y

> assume(x>0);

> y:=x;
                               x

> z;
                               x

> RemoveAllAssumptions();

> z;
                               z

> y;
                               y

I could see that one might reasonably argue that, after RemoveAllAssumptions is called above, y is unassigned. (How would the procedure know, otherwise, whether to make y take as some new value the global x, or some procedure's escaped local x, etc?) But it doesn't look right to me that z gets unassigned in the above example.

I'm sure that a robust version, which somehow preserves the desired previous assignments, can be written. But offhand I could only see kludgy ways to handle some stranger cases.

Of course, your routine may suffice for the Original Poster. (Although maybe his assumed names are all simply indexed, and some loop would also do.)

acer

One useful technique is to re-use float[8] (or other hardware datatype) Matrix/Vector/Array workspaces.

This can save on the total memory allocation needed.

There is a useful command, ArrayTools:-Fill, which can be utilized to zero-out an existing Matrix/Vector/Array.

Things can get more complicated when the individual tasks require varying sizes of workspace. In such cases it can still be useful and efficient to re-use a single instance of the workspace. Of course, in such a scenario the code may have to handle the fact that the workspace is larger than necessary (see Historical Note below).

When using separate workspace objects, the memory performance and usage can depend on whether or not one creates & unassigns one's workspaces in order of descending size. The following comparison illustrates that the worst scenario is to create them in order of ascending size, because doing so can involve much more total memory allocation (due to the problem of "memory fragmentation").

Case of generating separate workspaces, in order of increasing size,

> restart:
> for i from 1 to 10 do
>   M:=Matrix(2000+i*100,datatype=float[8]):
>   # some task involving a clean M
>   M:='M': dummy1: dummy2: dummy3:
>   gc():
> end do:
> kernelopts(bytesalloc);
                           527992392

Case of generating separate workspaces, in order of decreasing size,

> restart:
> for i from 1 to 10 do
>   M:=Matrix(3000-i*100,datatype=float[8]):
>   # some task involving a clean M
>   M:='M': dummy1: dummy2: dummy3:
>   gc():
> end do:
> kernelopts(bytesalloc);
                            68144960

Case of re-using single (largest required) workspace,

> restart:
> M:=Matrix(2900,datatype=float[8]): # largest needed
> for i from 1 to 10 do
>   # some task involving a clean M
>   ArrayTools:-Fill(2900*2900,0.0,M);
>   # no `gc` needed to clear out M!
> end do:
> kernelopts(bytesalloc);
                            68144960
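The same re-use idea can be sketched outside Maple. Here is a minimal Python/NumPy analogue (sizes are hypothetical, chosen to mirror the Maple loops above): allocate one largest-needed float64 workspace up front, then zero it in place between tasks, rather than allocating a fresh block per iteration:

```python
import numpy as np

SIZES = [2000 + i * 100 for i in range(1, 10)]   # task sizes, up to 2900
LARGEST = max(SIZES)

# Allocate one float64 workspace of the largest size required, up front.
work = np.zeros((LARGEST, LARGEST), dtype=np.float64)

for n in SIZES:
    view = work[:n, :n]   # each task uses only the leading n-by-n block
    view[0, 0] = 1.0      # stand-in for "some task involving a clean view"
    view.fill(0.0)        # zero it out in place, analogous to ArrayTools:-Fill

# No per-iteration allocation: `work` is the only large block ever created.
```

Since `work[:n, :n]` is a view (not a copy), the `fill` writes into the single shared buffer, which is the whole point of the technique.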

Historical Note: Old-timer Fortran programmers would often write code in which a single larger-than-always-necessary block of memory was allocated up front. The size would depend on the largest size that would ever be required. That large full block would be used and re-used within the individual tasks. Indeed, a lot of classic code like LAPACK is implemented in this style. Assuming Fortran-order storage, many LAPACK functions have associated with each array argument a triple of scalars to denote the size. For example, a 2D array A would have `lda`, `m`, and `n`. The `lda` meant the leading dimension of A, or in other words the height of columns, since "Fortran order" means "stored by column". The `n` meant the number of columns and could be used naturally. But the `m` meant the height of the data for the given task, and its value could be much less than `lda`. In this scheme the data for each individual task could be stored and accessed in the upper portion of the (much larger than necessary) full array block. The functions were written to know how to use `lda` and `m` as appropriate, according to the stride needed to jump from one column to another of the data subblock actually in use. One would use `m` as the number of entries of any particular column, but the code would use `lda` as the number of places to jump in order to walk along a particular row.

  | A[1,1]  ..  A[1,j]  ..   A[1,n] |
  |   .           .            .    |
  |   .           .            .    |
  | A[i,1]  ..  A[i,j]  ..   A[i,n] |
  |   .           .            .    |
  |   .           .            .    |
  | A[m,1]  ..  A[m,j]  ..   A[m,n] |
  |   .                             |
  |   .                             |
  |   .                             |
  | A[lda,1]  ..        .. A[lda,n] |

In the above example, A would be declared as size (lda,n).
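The `lda` bookkeeping above amounts to simple index arithmetic. A small Python sketch (hypothetical sizes) of addressing an m-by-n subblock stored inside an lda-by-n column-major buffer:

```python
# Column-major ("Fortran order") storage: A[i,j] lives at flat index i + j*lda.
lda, m, n = 8, 3, 4      # full column height, active rows, columns (hypothetical)
buf = [0.0] * (lda * n)  # the big pre-allocated block

def idx(i, j):
    """Flat offset of A[i,j] (0-based) in the column-major buffer."""
    return i + j * lda

# Fill only the active m-by-n subblock, leaving the tail of each column unused.
for j in range(n):
    for i in range(m):
        buf[idx(i, j)] = 10 * i + j

# Walking along row i means jumping lda places per step, not m.
row0 = [buf[idx(0, j)] for j in range(n)]
```

Walking down a column steps by 1; walking along a row steps by `lda`, exactly the distinction between `m` and `lda` described above.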

acer

Now I'm all keen to know whether the answer is not just something obvious like -315.19 kJ/mol.

Please let us know, when you find out.

acer

@PatrickT Your large test is no longer appropriate, in the sense that the inputs are floats. LinearAlgebra converts that to float[8] and, bang, it's done. Anyone can take the pure loop version, Compile it after adding type declarations, and get similar performance. Or they could write a precompiled external routine for this (as LinearAlgebra has). There's nothing wrong with floats, but it's an apples-and-oranges situation.

It shouldn't be surprising that the result object's size may be very different according to whether it is datatype=anything or not. It makes a difference!

The versions users posted are all (obviously, I say) designed for the exact case (like your symbolic example).

Also, did you try the new submission KPKP above on your example, the one for which you wrote that nothing else succeeded? On my i7 it took 7 seconds, and the memory fit in about 250MB, I think. Of course, it too should be altered for float/float[8].

An appropriate large test/comparison of the originally posted codes would be exact or symbolic. Also, you could reasonably use routine `KP` from a response above, since that is like LinearAlgebra's exact code in KroneckerProduct.

Alternatively, large float tests could be done on new submissions tailored for that type. Apples to apples. Oranges to oranges.

You could also use kernelopts(bytesalloc) to compute a memory difference, i.e. the allocation increase incurred by each operation.

@hirnyk Yes, it is important for several reasons.

  • Just because you cannot imagine a situation where highly repeated use of a low-level function is necessary does not in any way mean that no situation would ever require it. There are many more programming situations than you or I or anyone else here can imagine. That is why programming is so important to Maple (and, incidentally, why it should be treated as much, much more important than program-free "Clickable Math").
  • The procedure in question is simple and basic. So the issue of how it may be optimized is very general, and the answers to that in turn are very likely broadly applicable. Observe how the very fastest code solutions to this question rely on kernel built-ins and inplace semantics. Observe how slower code solutions mostly involve a lot of temporary garbage production. There are important, re-usable lessons to be learnt from all this.
  • As Patrick expressed, such comparisons are also useful for learning Maple.

@Alejandro Jakubi Not if it helps provide an immediate workaround to resolve the issue at hand for the given individual.

@Alejandro Jakubi Not if it helps provide an immediate workaround to resolve the issue at hand for the given individual.

@Alec Mihailovs This is very interesting. I did a quick check on 32bit Linux, with comparable results. I did all tests on local hardisk, under /tmp.

I also tested fread'ing into a 2^26 length Array in the case that the file only had 5 bytes previously written to it in total. The `fread1` command did the right thing, and did not introduce garbage into the entries past the 5th position.

At length 2^26 the timing difference between fread1 and readbytes was about a factor of one hundred. Very impressive. The true ratio may be even better than that, since the fread1 timing varies a bit and is so small that it may be mostly noise.

It was mostly in the range of sizes 2^22 through 2^24 that readbytes started to slow down. It performed fine at, say, 2^18.

One thing that I noticed (Linux 32bit) was that at size 2^26 the readbytes command took this amount of time in various Maple releases:

Maple 14:  5.240 sec

Maple 13.01: 5.644 sec

Maple 12.02: 5.112 sec

Maple 11.02: 1.832 sec

Maple 10.06: 1.552 sec

But 0.05 sec is still much better than 1.552 sec.

I used this code below, in a file. The file "aaa" must first be written out to currentdir(), and contain the data, naturally. Eg, writebytes("aaa",[3,4,5,6,7]) for the "short test".

### cut here ### snip 8< ### check.mpl ###

restart:
kernelopts(printbytes=false):

str,st,ba,bu:=
time[real](),time(),kernelopts(bytesalloc),kernelopts(bytesused):
B:=Array(1..2^26,98,datatype=integer[1]):
time[real]()-str,time()-st,
kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;

str,st,ba,bu:=
time[real](),time(),kernelopts(bytesalloc),kernelopts(bytesused):
readbytes("aaa",B):
time[real]()-str,time()-st,
kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;

Vector[row](B[1..10]),Vector[row](B[-10..-1]);

restart:
kernelopts(printbytes=false):

fopen1:=define_external('fopen',
'filename'::string,
'mode'::string,
'RETURN'::integer[4],
'LIB'="libc.so.6"):
fread1:=define_external('fread',
'ptr'::REF(ARRAY(datatype=integer[1])),
'size'::integer[4],
'count'::integer[4],
'stream'::integer[4],
'RETURN'::integer[4],
'LIB'="libc.so.6"):
fclose1:=define_external('fclose',
'stream'::integer[4],
'RETURN'::integer[4],
'LIB'="libc.so.6"):

B:=Array(1..2^26,98,datatype=integer[1]):

str,st,ba,bu:=
time[real](),time(),kernelopts(bytesalloc),kernelopts(bytesused):
p:=fopen1("aaa","rb"):
fread1(B,1,2^26,p):
fclose1(p):
time[real]()-str,time()-st,
kernelopts(bytesalloc)-ba,kernelopts(bytesused)-bu;

Vector[row](B[1..10]),Vector[row](B[-10..-1]);

### cut here ### snip 8< ### end of file ###

P.S. It can be handy to use integer[kernelopts('wordsize')/8], although 64-bit Windows sometimes makes that problematic.

It's probably worth mentioning that calling readbytes("aaa",B,N) for N the size didn't seem to speed it up.

So, now I wonder how fast endianness can be accommodated (in a C wrapper, obviously) for wider formats. And I wonder about ImageTools:-Read and ImportMatrix:-ReadBinaryFile. It's hard to tell, since as I am sure you know iolib(4,..) is a kernel built-in.
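For what the endianness handling involves, here is a small Python sketch (hypothetical 32-bit integer data, mirroring the [3,4,5,...] bytes written above): reading wider-format binary data either by declaring the byte order up front, or by swapping each format-width group of bytes after a native read:

```python
import struct

# Suppose a file holds 32-bit integers written big-endian, but the host is
# little-endian (or vice versa): the fix is a format-width byte swap.
raw = struct.pack(">3i", 3, 4, 5)          # stand-in for bytes read from disk

# Interpret with the correct (big-endian) format directly ...
be = struct.unpack(">3i", raw)

# ... or swap each 4-byte group by hand and read as little-endian.
swapped = b"".join(raw[k:k+4][::-1] for k in range(0, len(raw), 4))
le = struct.unpack("<3i", swapped)
```

Both routes recover (3, 4, 5); a C wrapper would do the same group-reversal with a tight loop (or `bswap`-style intrinsics) over the buffer.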

acer

reality := realities[reality];

@herclau You don't have to explain to me the use of SVD for Least Squares solving.

Look:

> y.U.Ss.V;

[0.00520590831446885, 1.00159598094724, 0.999421478032667]

> V^%T.Ss.U^%T.y;

[0.00520590831446883]
[ ]
[ 1.00159598094724]
[ ]
[ 0.999421478032667]

Those two above are just transpositions of each other. Earlier, I suggested you use y.U.Ss.V. But of course you can use V^%T.Ss.U^%T.y as well, because they are just transpositions of each other! I simply chose the version which required no additional cost of actually doing a transposition.

You constructed Ss as the inverse of DiagonalMatrix(S[1 .. 3]) so of course Ss is symmetric. And so when multiplying y by V^%T.Ss.U^%T you are (least squares) solving M.x=y.

In your P2 you actually used V.U^%T.y.Ss which is no good. I don't know where you get that formula from.

Remember, SingularValues returns Vt which is the transpose of V. Forgetting that is not enough to generate your strange formula used in your P2, I think.

For fun (but perhaps not optimally using the 'thin' option, I didn't check), since Maple uses SVD to compute the non-square float pseudo-inverse,

> MatrixInverse(M).y;

[0.00520590831446895]
[ ]
[ 1.00159598094724]
[ ]
[ 0.999421478032667]
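The algebra being argued over can be checked in a few lines. A NumPy sketch (with hypothetical 4x3 random data standing in for M and y) of least-squares solving via the SVD pseudo-inverse, i.e. x = V . Ss . U^T . y with Ss the inverted diagonal of singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))   # hypothetical overdetermined system
y = rng.standard_normal(4)

# Thin SVD: M = U @ diag(s) @ Vt  (note: svd returns Vt, the transpose of V)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Ss = np.diag(1.0 / s)             # inverse of the diagonal singular-value matrix

# Least-squares solution of M.x = y via the pseudo-inverse:
x = Vt.T @ Ss @ U.T @ y

# Same answer as the library's dedicated least-squares routine.
x_ref = np.linalg.lstsq(M, y, rcond=None)[0]
```

Since `svd` hands back Vt rather than V, the factor written `V^%T` above corresponds to `Vt` here, which is exactly the slip-up being pointed out in the P2 formula.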
