mmcdara

7384 Reputation

20 Badges

8 years, 294 days

MaplePrimes Activity


These are replies submitted by mmcdara

@maple2015 

I have not read your last post yet; I will do that tomorrow.
For the moment I have a suggestion that dramatically improves the quality of your model: fit log(Z), not Z.

You will find some preliminary results in the attached file:

  • Use of Z
    • The estimation of the representation error for all the models from [1, x] to [1, x, y, x^2, y^2, x*y]
    • The estimation of their generalization error GE with the Leave-One-Out method
  • Use of log(Z)
    • fit of the model [1, x, y, x^2, y^2, x*y]
    • plot of the exponential of this fitted model and comparison with the original data


LinearFit2.mw
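Just to illustrate the idea (a Python sketch with made-up data, not your Maple worksheet): when Z spans several orders of magnitude, fitting a linear model to log(Z) and exponentiating back is usually far more stable than fitting Z directly.

```python
import numpy as np

# Hypothetical data (not the poster's): Z grows exponentially with x,
# so its values span several orders of magnitude.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 30)
z = np.exp(1.5 + 0.8 * x) * rng.lognormal(0.0, 0.05, x.size)

# Fit a straight line to log(z), i.e. log(z) ~ b0 + b1*x ...
b1, b0 = np.polyfit(x, np.log(z), 1)

# ... and the model for Z itself is the exponential of the fitted line.
z_hat = np.exp(b0 + b1 * x)

max_rel_err = float(np.max(np.abs(z_hat - z) / z))
print(b1, max_rel_err)
```

The recovered slope is close to the true value 0.8, and the relative error of exp(fitted log-model) stays uniformly small, which is the point of fitting log(Z).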

Carl Love 
suggests looking at the p-values for the many terms of the model: this is obviously the first thing to do.
These p-values tell you which coefficients of the model are not significantly different from 0 ... but handle this information very carefully: some people use the p-values to drop "non-significant" terms from a "big" model, thereby obtaining a more parsimonious model. The problem is that this concise "small" model may be worse than some other model in between the original "big" one and this "small" one.




BTW: Student:-Statistics:-ChiSquareGoodnessOfFitTest is of no help in your case.
Goodness of Fit tests (GoF tests) are used to compare empirical statistical distributions, for instance the empirical distributions of two samples possibly drawn (null hypothesis) from the same population.
You are not in this situation.
If you want to assess the quality of your model, the best thing to do is to analyse the residuals (output=residuals in LinearFit): plot these values against the values of Z (or X, or Y) to detect some lack of representation in the model. If the cloud of points shows no particular structure, your model can be reasonably correct. If the cloud contains some pattern, then the model is probably biased, or too poor to represent the data correctly.
Here is a simple reference showing a few classical patterns: https://stattrek.com/regression/residual-analysis.aspx
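As a small illustration of this residual check (a Python sketch with an invented data set, not your data): fit a straight line to data that is actually quadratic, and the residual cloud shows an obvious structure.

```python
import numpy as np

# Invented example: the "truth" is quadratic, but we deliberately
# under-fit it with a straight line, as a model "too poor to
# represent the data" would.
x = np.linspace(-1.0, 1.0, 21)
z = 1.0 + 2.0 * x + 3.0 * x**2

b1, b0 = np.polyfit(x, z, 1)        # least-squares straight line
residuals = z - (b0 + b1 * x)       # what output=residuals would return

# A crude "structure detector": here the residuals correlate almost
# perfectly with x^2, i.e. the residual cloud has an obvious pattern.
structure = np.corrcoef(residuals, x**2)[0, 1]
print(structure)
```

If the model were adequate, this correlation would be near 0 and the residual cloud would look like unstructured noise.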

Another simple trick is to observe the Cook's distances or the leverages (also returned by LinearFit)
https://en.wikipedia.org/wiki/Cook%27s_distance



One last thing to keep in mind: Goodness of Fit tests (Chi-square, for example) require some conditions to give reliable answers.
In particular their answer is very sensitive to the sample size and, personally, I would never use a Chi-square GoF test the way you did.



 

@maple2015 

 

BTW: I had a look at your data (a few plots of Z vs X and Z vs Y) and it doesn't seem that the Z values are affected by some measurement error.
I don't know where your data come from, but if Z is, for instance, the result of some unknown function F of X and Y
(this is a classical situation which appears when you want to build a simplified model of a more complex and time-consuming one)
then a fruitful approach is to use a kriging model.
This feature is now offered by Maple 2018. Unfortunately I'm now at home with my personal Maple 2015 license and I can't tell you much. If your problem is amenable to kriging modeling and if you have Maple 2018 at your disposal, it could be interesting for you to test it.
If you can't, and are nevertheless interested by the kriging approach, please let me know and I will be able to do some work this Monday once at my office.

 

@maple2015 

One more time: what do you mean by "the best statistical model"?

Mac Dude here gave you some hints but, in essence, he didn't say anything else ...

Let's keep things simple: if you have N observations (17 in your case), then there exists an infinity of models with 17 parameters that fit the data perfectly. In some sense they are all "best statistical models".
One says that the "representation error" of all these models is 0.

Suppose you take one of them, M : (X, Y) --> Z.
For all n in {1..N} you have M(X[n], Y[n]) = Z[n] ... but what is the prediction error of M at some other point (X', Y') not in
{(X[n], Y[n]), n=1..N} ???
This last error is also termed the "generalization error" and describes the capability of the model M to predict correctly outside of the sampling points (at least in some neighbourhood of their convex hull).
The most common practice is to accept as "a best model" a model M that realizes some balance between the representation error (RE) and the generalization error (GE). Models with a high number of parameters generally give small values of RE and high values of GE: they are called over-fitted (or over-learned) models and are systematically discarded.

One generally prefers good prediction capabilities (low GE) over a good representation of the data (which means nothing when the data are uncertain values, for instance measurement values).

The simplest way (I do not say it is the best) to assess GE is to use the Leave-One-Out strategy.

  • Let S be your original sample and S(-n) this sample where the n-th observation (X[n], Y[n], Z[n]) has been dropped.
  • Build the model M(-n) over S(-n) (just use Statistics:-Fit as you did)
  • Compute M(-n)(X[n], Y[n]): a quick point-wise estimation of GE is GE(-n) = M(-n)(X[n], Y[n]) - Z[n]

Repeating this sequence N times will give the point-wise estimators GE(-1), ..., GE(-N).
The variance of these N estimators is an estimator of the generalization (prediction) error of the model M fitted over the whole sample S.
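The three steps above can be sketched as follows (Python, with an invented linear model and data; all names are mine, not the poster's):

```python
import numpy as np

# Invented sample of N = 17 points from a noisy straight line.
rng = np.random.default_rng(1)
N = 17
x = np.linspace(0.0, 1.0, N)
z = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, N)

ge = np.empty(N)
for n in range(N):
    keep = np.arange(N) != n                  # S(-n): drop the n-th observation
    b1, b0 = np.polyfit(x[keep], z[keep], 1)  # M(-n) fitted over S(-n)
    ge[n] = (b0 + b1 * x[n]) - z[n]           # GE(-n) at the held-out point

# The variance of the N point-wise errors estimates the
# generalization (prediction) error of the model fitted on all of S.
print(float(np.var(ge)))
```

For this well-specified model the LOO variance is close to the noise variance (0.01 here); an over-fitted model would show a much larger value.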

More sophisticated approaches are Leave-K-Out, Bootstrap (look at the Statistics package for more details: let me know if you are interested in bootstrapping a statistical model), Cross-Validation ... (the meaning of all these strategies can easily be found on Wikipedia).

 

Last but not least ...

Let's take a simpler model with only one regressor X and a single dependent variable Z.
When the experimental observation of Z is subject to some measurement/observation error U, the statistical approach begins by writing some a priori model Z = M[P](X) + U, where P is the set of parameters of M.

(It's a little bit more subtle than that, because you should write that the conditional expectation of Z given X = x is equal to M[P](x), and the conditional variance of Z given X = x is equal to the variance of U at X = x.)


Fitting the parameters P by means of the least squares method, for instance, means that you are looking for some specific value p* of P that has some particular property (here minimizing the sum of the squared residuals).
Then M[p*] is the model that minimizes the representation error RE over the class M[P] of models.

But this doesn't make M[p*] the best model in many other senses.
Consider this example 

  • The sample S is made of 2 points (X[1], Z[1]) and (X[2], Z[2])
  • M[P] : X --> a*X+b
  • Obviously the best "least squares" model verifies a* = (Z[2]-Z[1])/(X[2]-X[1]) and b* = (X[2]*Z[1]-X[1]*Z[2])/(X[2]-X[1])


For many people this model is the best because this couple (a*, b*) maximizes some likelihood function over S.
But this is true only when the likelihood function is a function of (M[a, b](X[1])-Z[1])^2 + (M[a, b](X[2])-Z[2])^2; that is, when U has a Gaussian stationary distribution.

Consider the counter-example where U is Uniform on the interval [-h, +h].

Then all the couples (a, b) such that the graph (straight line) of M[a,b](x) goes through the two "boxes" 
(x=X[n], z in [Z[n]-h, Z[n]+h]) have exactly the same value of the likelihood function.
For instance, from the "maximization of the likelihood" point of view, the models
a = ((Z[2]+h)-(Z[1]-h))/(X[2]-X[1]) and b = (X[2]*(Z[1]-h)-X[1]*(Z[2]+h))/(X[2]-X[1])
a = ((Z[2]-h)-(Z[1]-h))/(X[2]-X[1]) and b = (X[2]*(Z[1]-h)-X[1]*(Z[2]-h))/(X[2]-X[1])
a = ((Z[2]+h)-(Z[1]+h))/(X[2]-X[1]) and b = (X[2]*(Z[1]+h)-X[1]*(Z[2]+h))/(X[2]-X[1])
....

are just as "best" as the model (a*, b*) is.
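A quick numerical check of that two-point least-squares line, a* = (Z[2]-Z[1])/(X[2]-X[1]) and b* = (X[2]*Z[1]-X[1]*Z[2])/(X[2]-X[1]) (Python, with arbitrary illustrative values):

```python
# Arbitrary two-point sample (illustrative values only).
X1, Z1 = 1.0, 3.0
X2, Z2 = 4.0, 9.0

a_star = (Z2 - Z1) / (X2 - X1)
b_star = (X2 * Z1 - X1 * Z2) / (X2 - X1)

# With N = 2 points the fitted line interpolates them exactly:
# the representation error is 0.
assert abs(a_star * X1 + b_star - Z1) < 1e-12
assert abs(a_star * X2 + b_star - Z2) < 1e-12
print(a_star, b_star)   # -> 2.0 1.0
```

The line z = 2x + 1 passes exactly through both points, so RE = 0 as claimed, while nothing is said about its behaviour elsewhere.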

 

What do you mean by saying "the best statistical model" ?

 

@stefanv 

Wrong.
If you work in worksheet mode, not in document mode, the writeto function from Maple 2015 & 2016 redirects only the output, not the input.
And this without invoking interface(echo=0).

With Maple 2018 (run on a Windows 7 PC ... I'll give you the exact version of Maple 2018 I use at the office tomorrow, for I'm now at home and I'm going to sleep), even if I put interface(echo=0) in a separate block just after the restart, I still recover the input in the file.

 

@acer 

 

I forgot to say that I'm working with the preferences described in the attached file:

Capture_d’écran_2018-06-27_à_22.16.27.pdf

(sorry for giving you this environment in French, but I'm at home and French is the default language for my personal license).
Furthermore, I always work in worksheet mode, never in document mode.

 

@Preben Alsholm 

Thanks for all this advice.
It all seems very interesting on first reading, but I need to test it myself to fully absorb the material.

You said "I succeeded in some attempts (after restarting) and other times it took too long for my patience" ... please don't worry about that, you have already done more than I ever expected!

Looking forward to reading you soon on other topics.
Have a nice weekend.

@Preben Alsholm 

Before sending my answer to mehdibaghaee I had tried to obtain an explicit solution of this ODE system:
     dsolve(...);
     allvalues(%);


The result allvalues returned was extremely lengthy, but a quick look revealed some facts:

  1. From the result of dsolve(...) we can observe that the term RootOf(_Z^4+9*_Z^3+4*_Z^2-19*_Z-5, index = SomeInteger) is recurrent.
    Unfortunately allvalues(RootOf(_Z^4+9*_Z^3+4*_Z^2-19*_Z-5)) is dramatically intractable because, as a rule, Maple always represents (except in obvious cases) the formal solutions of equations of the 3rd or 4th degree in a complex form, even if the roots are real.
    Is there some "already implemented feature" to force Maple to return roots in a real form if they are indeed real?
    One might think of some analysis of the discriminant.
    Or, for 3rd degree equations, one might use the "trigonometric resolution", which gives expressions in terms of hyperbolic trig functions when only one root is real, or in terms of cosines if all 3 roots are real.
     
  2. The second question has probably already been asked here.
    Because the explicit solution for, let's say,  phi__1(t) contains many times the same term, for instance 
    (4*(3268+(3*I)*sqrt(10275765))^(2/3)+211*(3268+(3*I)*sqrt(10275765))^(1/3)+1876)/(3268+(3*I)*sqrt(10275765))^(1/3)
    or 
    (3268+(3*I)*sqrt(10275765))
    I had the idea, in order to obtain a more readable solution, of substituting some repetitive patterns with, let's say, a few auxiliary variables U, V, W, ...
    What I tried was
           dsolve(...):
           sol := allvalues(%):
           subs(SomePattern=U, rhs(sol[1]))

    Unfortunately this substitution did not always apply, leaving unchanged some terms which nevertheless contained the pattern "SomePattern".
    For instance, if SomePattern := (3268+(3*I)*sqrt(10275765))^(1/3), changes occur in expressions involving SomePattern but not in 1/SomePattern (which comes from the fact that 1/SomePattern is in fact (3268+(3*I)*sqrt(10275765))^(-1/3)), nor even in exp(SomePattern) ...

    Do you think that functions such as select/remove, patmatch ... could help to perform the substitution everywhere SomePattern appears?


I would be pleased to have your opinion about these two "by-problems".
Thanks in advance


 

@tomleslie 

Sh... It's not the first time that I get caught this way!
Sorry for asking this question, for I knew perfectly well that the restart had to be put in a separate block.

Thanks for reminding me how volatile my memory is :-)

If you have some hints about x you can do this 

@panke 

in dsolve[numeric], the bouncing ball example uses 2 events
[[y(t),diff(y(t),t)>0],halt] and [[y(t),diff(y(t),t)<0],halt]

Since events expects a list of events, the syntax for the bouncing ball problem is
events = [ [ [y(t),diff(y(t),t)>0], halt],  [ [y(t),diff(y(t),t)<0], halt] ]

Each event here is a list of two objects: the trigger and the action.
A trigger MUST be a list only if it is made of more than one condition.
If the trigger is "simple", you MUST NOT put it between square brackets.

In the help page (topic "event specification") look at the difference between

y(t): simple root finding trigger       and       [f(t,y(t)),c(t,y(t))<0]: root finding with a conditional trigger



Your trigger s(tau)-const is "simple", so it must not be enclosed in square brackets.
The event is then [s(tau)-const, halt], and you must write events = [ [s(tau)-const, halt] ]

 

@panke 

First of all: I am used to dsolving a set of equations, not a list of them (I don't know if Maple handles a list correctly, but it seems weird to me to speak of a "list of equations"). So I have taken the liberty of replacing [ic, odes, odex] by {ic, odes, odex}.

Second point: ta..te is an invalid range, for neither ta nor te is a number.
There are 2 ways to fix that: 1/ give them numerical values (0..1 below) or 2/ use the option "parameters" (have a look at dsolve[interactive]).

Last point: the syntax for the event is not correct (ok, there are a lot of recurrent complaints about the corresponding help page).

Finally, try this with your own range
dsolve({ic, odes, odex}, type = numeric, range = 0 .. 1, events = [[s(tau)-1, halt]])

@acer @Carl Love 

Great thanks to both of you for this detailed discussion.
To summarize and clarify a few questions :

  • acer/  "but I interpreted the original question as being more about how to write the procedure so that the calls to H could be written with just the short form of the name":
    Absolutely: I wrote "f" first, observed it didn't work as expected, then tried "g", was happy it worked correctly, but remained bothered because the syntax orthopoly:-H(...) is obviously heavier than just "uses orthopoly; ..... H(...)"
     
  • acer/  thanks for the solutions you provided in your first answer
     
  • Carl Love/  about the "semantics of :-" ... I didn't know about this way to access an entry of a table. If I understand correctly what you said, it is some kind of undocumented feature?
    It has one funny consequence: for instance (attached file), the syntax T:-N (where T is a table and N one of its indices) avoids using eval to visualize T[N] if N is itself a table.

    HowToAvoidUsingEval.mw
     
  • Carl Love/  Thanks for having explained "why procedure g works" 
     
  • both of you/  I understand I should use the "correct coding" acer provided while avoiding the "T:-N" syntax pointed out by Carl

 

Thanks for your help and your

@acer 

Thanks for the advice.
I'll do the test as soon as I get back from my holidays.
I'll see you again around 9-10 June.

 

@Carl Love 

I copy-and-pasted the last(?) code.
I get an error.
I think it's because I'm at home with Maple 2015.2 (no Iterator package; I have Maple 2016 and Maple 2018 at the office).

I'm on vacation and I won't be able to do any performance tests for a week.
I'll see you around 10 June.

IsThisTheGoodCode.mw
