Carl Love

My name was formerly Carl Devore.

MaplePrimes Activity


These are replies submitted by Carl Love

@roman_pearce 

Intel Core i7-3620QM @ 2.4 GHz running Windows 8.1 64-bit. Intel's website claims that it has four "cores" and eight "threads".

Please let me know if you can achieve substantially different results on any processor.

@Christopher2222 

Thank you for your comment and your interest in the post.

You wrote:

based on performance gains it's hardly worth devoting more than 4 or 5 CPUs to a task, unless for some reason a deadline looms and calculations are required to be completed before a specific time.

Yes, that was my conclusion exactly. I wonder to what extent this is a hardware issue. Perhaps hyperthreading (running two threads per core) is not effective, which is why my "sweet spot" is at four threads---my number of cores.

I wonder if two dual-core processors (two physical processors with two independent cores each) would outperform a quad-core processor (a single processor with four independent cores).

Yes, I'd like to know that. Hopefully my worksheet can be run on as many different processor arrangements as possible. And let's not ignore the possible role that the operating system may have. Please, would some Linux or Mac user run my worksheet?

@sand15 This discussion prompted me to make precise timing measurements using different numbers of cores and analyze the results. It took me several days because there was a big hurdle trying to figure out how to adjust the number of cores when using the Threads package. This was covered in my Question "How to limit the number of processors used by Threads?" The result was that the test needed to be run in a different session for each number of cores, which complicated the programming. But, it's done, and the results are now posted as "Diminishing Returns from Parallel Processing: Is it worth using more than four processors with Threads?" My conclusion was no, but perhaps this is a hardware issue. I need to see results from a machine capable of 16 or more threads.

@Milos Bilik You may be thinking of a continuous Markov chain as being "really hard to compute." This is a discrete Markov chain---I was taught how to do the computations by hand in high school.

I'm not totally convinced that this model is the best for your situation. But, it's trivial to compute, so at the very least it's a good start.

By reading the data with Maple, it would be trivial to compute, empirically, the following conditional probabilities: Let P[i,j] be the probability that j consultations are performed the next business day given that i consultations are performed today. Then P is the transition matrix of a Markov chain.
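As a sketch of that empirical computation (assuming the data have been read into a Vector DailyCounts of per-business-day consultation counts with values in 0..N; both names are hypothetical):

```maple
EstimateP:= proc(V::Vector, N::nonnegint)
local C:= Matrix(N+1, N+1), R, i;
     # Count the observed transitions from each day to the next.
     for i to numelems(V)-1 do
          C[V[i]+1, V[i+1]+1]:= C[V[i]+1, V[i+1]+1] + 1
     end do;
     # Row sums; dividing each row by its sum gives the empirical
     # conditional probabilities P[i,j]. Rows with no observations
     # are left as zero rows.
     R:= C . Vector(N+1, fill= 1);
     Matrix(N+1, N+1, (i,j)-> `if`(R[i]=0, 0, C[i,j]/R[i]))
end proc:

P:= EstimateP(DailyCounts, max(DailyCounts)):
```

Each row of the returned Matrix sums to 1 (or is all zero), as required of a transition matrix.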

@acer Here is a simpler test:

nops({Threads:-Seq(Threads:-Self(), k= 1..2^19)});

with the expected result being a positive integer less than or equal to kernelopts(numcpus).

I can confirm what you said: that if the kernelopts(numcpus= ...) call is issued immediately after restart, then it takes effect.
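A sketch of that confirmation (the setting only takes effect when made before any use of Threads):

```maple
restart:
kernelopts(numcpus= 2):   # must come immediately after restart to take effect
# With the setting in effect, the number of distinct thread IDs
# should now be at most 2:
nops({Threads:-Seq(Threads:-Self(), k= 1..2^19)});
```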

So, no reader has an idea how to limit the number of processors or what kernelopts(numcpus= ...) is supposed to do? I guess that I'll call it a bug and submit an SCR. There's an unusual thing about this bug though. Usually a bug occurs for some inputs to a command or under some circumstances. In this case, it seems like kernelopts(numcpus= ...) doesn't work for any input under any circumstances (on my computer). Could someone running Linux or Macintosh confirm the results of the worksheet in my Question above?

@sand15 In French, what do you call the relationship between cos(x) and sec(x)? In English, they're called reciprocal functions.

@acer Ah, if I'd known that RGB24ToName had an option palette, I wouldn't have written my RGBtoName. That's an example of what I meant by "The APIs are inconsistent."

@adel-00 Now that the implicitplot is done, we know ranges for the two positive roots. One is between 0 and 1, and the other is between 8 and 10. Once that is known, it's easier to use fsolve to get the roots.

r1:= (So::And(numeric, satisfies(x-> x >= 4 and x <= 5)))->
     fsolve(eval(Q, so= So), s= 0..1)
:
r2:= (So::And(numeric, satisfies(x-> x >= 4 and x <= 5)))->
     fsolve(eval(Q, so= So), s= 8..10)
:
xs:= (u,so)-> (so-u)*d*(a1+u)/(m1*u):
zs:= xs-> ((m2-d2)*xs-a2*d2)*(a3+ys)/(m3*(a2+xs)):
Xs1:= so-> xs(r1(so),so):
Xs2:= so-> xs(r2(so),so):
plot([Xs||(1..2)], 4..5);
Zs1:= so-> zs(Xs1(so)):
Zs2:= so-> zs(Xs2(so)):
plot([Zs||(1..2)], 4..5);

@Østerbro The doubling of the final numeric output will stop if you end your commands with a colon (:).

Your teacher's demands regarding units and regarding numbers are inconsistent: The teacher wants the units automatically simplified but doesn't want the numeric computation automatically simplified.
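For example:

```maple
evalf(Pi);   # result is echoed in the output
evalf(Pi):   # result is computed, but the echo is suppressed
```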

@tomleslie wrote:

[I]f I use ctrl-alt-delete->performance, then I get 8 "cpu" panes. I have always interpreted this as my processor is capable of running 8 threads.

Yes, that is correct.

Important to note that an application may not ask for multiple threads, but Intel/Windows will decide that multiple threads are possible, and run them anyway.

I find that hard to believe, because software that can decide when multiple threads are possible in other software is kind of a holy grail of multiprogramming. The example that you show below is not an example of this, for reasons that I will explain in a moment. So, do you have another example?

 

As an example, if I execute

 

restart:
L:= RandomTools:-Generate(list(integer, 2^18)):
CodeTools:-Usage(mul(x, x= L), iterations= 4):

 

...according to the ctrl-alt-delete/performance monitor, four of my "cpus" start working really hard....My assumption has always been that this is what happens when win7/intel attempts to multithread for efficiency purposes.

 

The effect that you are seeing is due to Maple's multithreaded garbage collection, which by default uses four threads (this is adjustable). See ?updates,Maple17,Performance -> Parallel Garbage Collector. Note that this multithreaded garbage collection is in effect (by default) all the time, regardless of whether you are using a multi-processing package.

On the other hand, if I execute your code group

...
L:= RandomTools:-Generate(list(integer, 2^18)):
CodeTools:-Usage(Threads:-Mul(x, x= L),iterations= 4):

.... This is way faster than the previous version - as in 36.59x in real time. I have always put this down to the fact that Maple is *way* better at determining an optimum mulithreading strategy than win7/intel is.

That effect can't possibly be due to multithreading. If you're using n threads, then the theoretical maximum real-time performance increase would be a factor of n (8 in this case). The effect that you're seeing is due to mul using a very poor algorithm for multiplying long lists of rational numbers. I assume that it's using a straightforward "linear" loop, something akin to

proc(L::list) local x, p:= 1;  for x in L do p:= p*x end do end proc:

but written in C. I can write a better divide-and-conquer multiplication algorithm using a paint brush clenched in my butt cheeks---and here it is:

Mul:= proc(L::list)
local n:= nops(L), m;
     if n < 4 then `*`(L[])
     else
          m:= iquo(n,2);
          thisproc(L[..m]) * thisproc(L[m+1..])
     end if
end proc:

Comparison:

L:= RandomTools:-Generate(list(integer, 2^16)):
p1:= CodeTools:-Usage(mul(L), iterations= 4):
memory used=9.36GiB, alloc change=0 bytes, cpu time=6.18s, real time=5.74s, gc time=2.29s

p2:= CodeTools:-Usage(Mul(L), iterations= 4):
memory used=32.32MiB, alloc change=0 bytes, cpu time=398.50ms, real time=406.00ms, gc time=0ns

p1-p2;

     0

That's a performance-improvement factor of 6.18/.3985 = 15.5. (Also, my code is 9.36/.03232 = 290 times more efficient at memory utilization.) The non-multi-threaded part of the algorithm used by Threads:-Mul is closer to mine than it is to the default mul. The rest of your factor-of-36 performance improvement is due to the multi-threading. To be fair to Maple's mul, it wasn't designed specifically for long lists of rationals, and it does just fine with lists of floats.

If I execute kernelopts(numcpus), it returns 8. Trust me, I only have 1 CPU on this machine, and the relevant help page states

 "this will be the actual number of CPUs that the machines has (treating hyperthreaded CPUs as 1 CPU). " 

so I would expect the answer to be 1, because I only have one CPU (which admittedly can support 8 threads).

That statement (from ?kernelopts -> numcpus) is either flat-out wrong or it's poorly worded. The term CPU is being used ambiguously. The default value of kernelopts(numcpus) is the number of threads your machine can support. The name of the option should be changed to numthreads.

I don't really understand the purpose of your final code group....

By changing numcpus, I hope to change the number of threads used, and thus change the times.

[N]othing in it changes the number of cpus/threads....

Do you see the code kernelopts(numcpus= n)? That's supposed to change the number of threads. The print statement confirms that numcpus is changing.

I don't understand why each iteration executes about twice as fast as the single calculation above. I can only assume that the loop construct changes the way that the calculation is "threaded" and produces something more efficient - no idea why.

No, it's because the first run is repeatedly allocating memory from the O/S; the subsequent runs are using the memory that has already been allocated for the first run. That's the purpose of my comment "First warm-up and stretch the memory. Otherwise the timings are invalid." See ?updates,Maple16,memorymanagement and ?updates,Maple17,Performance -> Multiple Memory Regions.
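A sketch of the warm-up procedure (reusing the list L from your code group):

```maple
restart:
L:= RandomTools:-Generate(list(integer, 2^18)):
# Warm-up run: forces Maple to allocate memory from the O/S.
CodeTools:-Usage(mul(x, x= L)):
# Subsequent runs reuse that memory, so these timings measure
# the computation rather than the allocation.
CodeTools:-Usage(mul(x, x= L), iterations= 4);
```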

@firmaulana A linear program (LP) (as opposed to an integer linear program (ILP)) with 1340 constraints and roughly the same number of variables isn't exceptionally large. I'd guess that a dedicated LP package such as LINDO could handle it. You should probably use Maple to put the problem into a matrix form that you could pass to LINDO. There's a LINDO plug-in for Excel. Certainly passing matrices between Maple and Excel is easy, although there's probably some limitation on the width of an Excel worksheet.
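As a sketch of the matrix form that Maple itself can also solve (a tiny hypothetical LP standing in for your 1340-constraint problem; the real c, A, b would be assembled from your model):

```maple
# Minimize c^T.x subject to A.x <= b, x >= 0.
c:= Vector([-1, -2]):          # objective coefficients (minimizing -x1 - 2*x2)
A:= Matrix([[1, 1], [2, 1]]):  # constraint matrix
b:= Vector([4, 5]):            # right-hand sides
Optimization:-LPSolve(c, [A, b], assume= nonnegative);
```

The same c, A, b objects could be exported to Excel for a plug-in such as LINDO.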

@nm You're making several mistakes:

1. Semicolons are not part of statements.

2. Nor do semicolons terminate, complete, or finish statements; rather, just like in English, semicolons separate statements. In this way Maple syntax differs from the corrupted, impure syntax of the C family of languages. Rather, Maple syntax is derived from the beautiful syntax of the Algol family of languages.

3. A procedure definition is not a statement; it's an expression, a data structure just like lists, sets, etc. Only if it's assigned to something does it become a statement. (An isolated expression can be considered a form of statement. This is only useful if a procedure has side effects.)

4. A pair of parentheses in isolation, (), is equivalent to the NULL expression sequence; it's a valid expression sequence like any other.

5. P:= proc() whatever end proc() is not equivalent to P(), nor did I say it was. The things that are equivalent are

P:= proc() whatever end proc;  E:= P();  P:= 'P';

and

E:= proc() whatever end proc();  or  E:= (()-> whatever)();

These latter forms avoid the wasted syntax and memory of giving a name to a procedure which will never be used again. All these forms produce the same result, E.

It follows from (2) that a semicolon is never required immediately before end, else, elif, fi, od, or catch. It follows from (4) that although such semicolons aren't required, they are allowed. It follows from (3) that an anonymous procedure definition can be passed as an argument; indeed, such passing is very commonly seen as the first argument of map, select, etc.
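For instance, the common idiom mentioned in the last sentence:

```maple
map(x-> x^2, [1, 2, 3]);       # anonymous procedure passed directly to map
select(x-> x > 1, [1, 2, 3]);  # likewise for select
```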

Everything that I've said above is about the allowances and flexibility of the syntax; there's nothing about its restrictions. This is the antithesis of "fussy". If you want more things to generate syntax errors, then it's you who wants a fussy language.

@taro In Maple, you never need to copy the output and modify it manually. The following procedure will take any function expression and distribute the functional operator over the first argument if that argument is an algebraic sum (an expression of type `+`, as opposed to a call to sum or Sum):

Distribute:= (f::function)-> maptype(:-`+`, op(0,f), op(f)):
L:= Limit(f(a+h)-f(a), h= 0);
Distribute(L);
