sand15

1556 Reputation

15 Badges

11 years, 100 days

MaplePrimes Activity


These are Posts that have been published by sand15


Although the worksheets provided here have been developed under Maple 2015, they should work correctly with newer versions, except perhaps for commands that use the 'op' function ('piecewise' mainly).

In the sequel the acronym 'pdf' stands for 'probability density function'.



CONTEXT

This post originates from a recent question by @JoyDivisionMan and the ensuing discussion. 

In a few words, the OP noticed Maple 2025 failed to return a result and asked why. In his reply, @acer identified a code regression somewhere in between Maple 2023 and Maple 2025.
Like Maple 2023 "my"  Maple 2015 does not fail but provide... a wrong answer: Do versions in between 2015 and 2024 return wrong results too?

 

In this post I explain how we can calculate the result by hand (only elementary maths required), why Maple 2015 (and likely newer versions) returns an incorrect result, why Maple generally fails in returning a result, and finally provide several examples to illustrate that even mathematically simple.
The various test cases I present are all equally simple for a skilled human agent, but conversely all beyond the reach of Statistics:-PDF.
This raises the question: Can a robust algorithm for calculating this PDF be developed without resorting to an expert system (I don't like the term AI) that mimics human reasoning?



THE MAIN OBSERVATION

Let X some continuous univariate random variable (CURV) and 𝟇 a real valued function from defined over the support of  . Let Y the random variable defined by Y =  𝟇(X).

The main observation about Statistics:-PDF is that unless very specific situations, this procedure does not build the correct expression of pdf
(Y) as soon as 𝟇 is not a strictly monotone function

Here are a fex exceptions to this claim:

X any CURV𝟇 : xxn  , n positive integer (correct solution even if 𝟇 is not monotone)

X ~ Uniform(-1, 1)𝟇 : xarctanh(x)  (no result returned even if 𝟇 is strictly monotone).


It is worth saying that, only by chance, Statistics:-PDF may sometimes return the correct result even if 𝟇 is a non monotone function.
For instance, in
@JoyDivisionMan's question, the procedure was indeed capable to provide a correct result for the case 
X ~ Uniform(0, 2𝜋)𝟇 : x ⟼ arctanh(x) but this was only because two errors balanced each other out. Replace X ~ Uniform(0, 2𝜋)  by X ~ Uniform(a, a+2𝜋) and Maple is wrong (notice that the pdf of 𝟇(X) remains unchanged whatever the value of a).


A GOOD DRAWING WORTH A THOUSAND WORDS

Here is a picture to help understand how to get the pdf of Y =  𝟇(X) for a non monotone function 𝟇 (the case of a monotone function directly comes from this latter).

In this illustration 
X ~ Uniform(0, 2𝜋)𝟇 : xsine(x).
To ease the explanation, I write 
X as a mixture of three uniform random variables X1, X2, X3, whose supports are the intervals of the three branches of 𝟇. More formally, X  = (1/4)∙X1 + (1/2)∙X2 + (1/4)∙X3.
The restrictions  of
𝟇 to these three branches are denoted 𝟇1, 𝟇2, 𝟇3.



The large rectangles below the horizontal axis represent the pdf of X1, X2, X3  and the blue curve the 𝟇 function.
The image of the 
interval [y-dy, y+dy]  by the inverse functions 𝟇1(-1) and 𝟇2(-1) of 𝟇1 and 𝟇2  are represented by the vertical rectangles [x1-dy, x1+dy] and [x2-dy, x2+dy] .
These two intervals bring a contribution to the pdf of 
(light gray blue on the right) represented by the horizontal violet rectangle on the right side of the picture.

T
he probability Prob(Y  [y-dy, y+dy]) that Y belongs to the interval [y-dy, y+dy]) is simply the sum 
                 Prob(
Y  [y-dy, y+dy])  = Prob(X1  𝟇1(-1)([y-dy, y+dy]))  + Prob(X2  𝟇2(-1)([y-dy, y+dy]))

Let
𝟇'b(-1)(y) denote the derivative of 𝟇b(-1)(y).
Making 
dy tends to 0 gives
                pdf(
Y y)  =  pdf(X1 𝟇1(-1)(y)) ×  𝟇'1(-1)(y) |pdf(X2 𝟇2(-1)(y)) ×  𝟇'2(-1)(y) |

As I said before there is truly no big math behind this, Except maybe those absolute values?
To understand where they come from zoom in on the rectangle 𝜔 = [
x1-dx, x1+dx] ╳ [y-dy, y+dy] and denote X𝜔 and Y𝜔 the restrictions of X1 and Y to 𝜔.
Locally 
Y𝜔 is proportional to A+B∙X𝜔 where constant B = 𝟇'(x1) and the value of constant does not matter here.
So the pdf of 
Y𝜔 is (a classical result)  pdf(Y𝜔 y) = pdf(X𝜔 (y-A)/C) / |C|.
Few details can be found Here.


MAPLE FAILURES AND WEAKNESSES

So why did Maple, at last some versions, produce a wrong result and why some versions are not even capable to return one?
The reason is that there is no big math only at first sight...because determining the inverse function of 
𝟇 can be quite tricky as soon as 𝟇 is not one-to-one map, for instance when 𝟇 is not strictly monotone.
When it is so
 𝟇(-1) must be defined for all the branches whose definition intervals intersect the support of X.

I spent a lot of time debugging the procedure Statistics:-PDF to understand why it either fails or produces incorrec results.
The 
sine_debug_nodebugoutput.mw  worksheet presents the "X ~ Uniform(0, 2𝜋)𝟇 : xsine(x)" case (as Mapleprimes stubbornly refuses to upload the worksheet containing the debugger trace, I convert it to sine_debug.pdf to help you see this trace). 
To orient the core development team correcting this procedure (assumming they care), the critical procedures are

Statistics:-RandomVariables:-PDF:-Univariate:-GetValueTab[anything]
and  Statistics:-RandomVariables:-GetInverse
 

At the very end it is this second procedure which is truly responsible of Maple failing to provide a result or returning an incorrect on, because it does not correctly build the inverse functions 𝟇b(-1)  for all the branches b which matter.
 

I wrote above that "it is only by chance that Maple provided the correct result for the non monotone function 𝟇 = cosine". Indeed Statistics:-PDF returns a wrong result when X ~ Uniform(z, z+2𝜋) and z is not a multiple of 𝜋/2 (see cosine.mw).

Other important situations where Maple fails returning a result are those where
𝟇 is a polynomial function with different zeros located in the support of X.
I did not trace them but it seems that
Statistics:-PDF does not know how to build the 𝟇b(-1) in this case (even though it is quite simple, see "Polynomial" examples below).


A SELECTION OF EXAMPLES

Here is a selection of examples to demonstrate that even in rather complex cases the pdfs of 𝟇(X) can be constructed quite easily (note that Maple either fails to compute them or to provide a correct result):



OPEN QUESTION

Tracing Statistics:-PDF reveals an already complex algorithm designed to handle a broad variety of "canonical" situations. Even this the algorithm fails in almost all non-toy-problem such as this compilation proves Maple_failures.mw.
At first sight, there seems to be a contradiction between the (apparent?) simplicity with which one can obtain, by hand in sometimes in an ad hoc way, the expression of pdf(𝟇(
X)), whether it is exact or truncated, and the complexity of the Statistics:-PDF algorithm, which results in failure in all non trivial cases.

This observation leads to the important question "Is it possible to rewrite Statistics:-PDF in order to enlarge its domain of success?".
I have the feeling that this means designing an algorithm which focuses more on mimicing the human reasoning than identifying "canonical" situations (as it is done today). 
An AI-driven algorithm maybe?

I repeat here that the maths are very simple, and all the more simple if you represent the random variable 
X as a mixture of components X1, ... XBboth having the same (truncated) distribution than X, and whose supports identify to the intervals of definition of the B branches of 𝟇 over the whole support of X.
The only difficulty lies in the identification of these branches and in the construction of the functions 
𝟇b(-1) over the supports of each Xb.

I have no answer to this question.

This post stems from this Question to which the author has never taken the time to give any answer whatsoever.

To help the reader understand what this is all about, I reproduce an abriged version of this question

I have the following data ... [and I want to]  create a cumulative histogram with corresponding polygon employing this same information...

The data the author refers to is a collection of decimal numbers.

The term "histogram" has a very well meaning in Statistics, without entering into technical details, let us say an histogram is an estimator of a Probability Density Function (continuous random variable) or of a mass function (discrete random variable), see for instance Freedman & Diaconis.

The expression "cumulative histogram" is more recent, see for instance Wiki for a quick explanation. Shortly a cumulative histogram can be seen as an approximation of the Cumulative Density Function (CDF) of the random variable whose the sample at hand is drawn from.

In fact there exists an alternative concept named ECDF (Empirical Cumulative Distribution Function) which has been around for a long time and which is already an estimator of the CDF.
Personally I am always surprised, given the many parameters it depends upon (anchors, number of bins, binwidth selection method, ...), when someone wants to draw a cumulative histogram: Why not draw instead the ECDF, a more objective estimator, even simpler to build than the cumulative histogram, and which does not use any parameter (that people often tune to get a pretty image instead of having a reliable estimator)? 

Anyway, I have done a little bit of work arround the OP's question, and it ended in a procedure named Hodgepodge (surely not a very explicit name but I was lacking inspiration) which enables plotting (if asked) several informations in addition to the required cumulative histogram:

  • The histogram of the raw data for the same list of bin bounds.
  • The kernel density estimator of this raw-data-histogram.
  • The ECDF of the data.

Here is an example of data

and here is what procedure Hodgepodge.mw  can display when all the graphics are requested

This post presents a work I did to answer this recent question by @AHSAN.

I believe that the two procedures I developed could be of interest to Mapleprimes members. Here they are in their most recent forms. However, I think they can still be improved and completed, and I am open to your suggestions for developing them further in this direction.

BarGraph_2_Categories.mw
Illustration


BarGraph_3_Categories.mw
Illustration

 

I must thank @Scot Gould for having asked this question more than a year ago and thus, without meaning to, having been the driving force behind this post.

There is an enormous literature about Monte-Carlo integration (MCI for short) and you might legitimately ask "Why another one?".

A personal experience.
Maybe if I tell you about my experience you will better understand why I believe that something is missing in the traditional courses and textbooks, even the most renowned ones.

For several years, I led training seminars in statistics for engineers working in the field of numerical simulation.

At some point I always came to speak about MCI and (as anyone does today) I introduced the subject by presenting the estimation of the area of a disk by randomly picking points in its circumscribed square and assessing its area from the proportion of points it contained.



Once done I switched (still as anybody does) to the Monte-Carlo summation formula (see Wikipedia for instance).

One day an attendee asked me this question "Why do you say that this [1D] summation formula is the same thing that the [2D] counting of points in the [circle within a box] example you have just presented?"

I have to say I was surprised by this question for it seemed to me quite evident that these two ways of assessing the area were nothing but two different points of view of, roughly, the same thing.

So I gave a quick, mostly informal, explanation (that I am not proud of) and, because the clock was running, I kept teaching the class.

But this question really puzzled me and I thought for a simple but rigourous way to prove these two approaches were (were they?) equivalent, at least in some reasonable sense.

The thing is that trying to derive simple explanations based on couting is not enough, and that you have to resort to certain probabilistic arguments to get out of it. Indeed, sticking to the counting approach leads to the more reasonable position that these two approaches are not equivalent.

The end of the story is that I spent more time on these two approaches of MCI during the trainings that followed.

Saying that, yes, the summation formula seems to be the reference today, but that the old counting strategy still has some advantages and can even gives access to information that the summation formula cannot.

About this post.
This post focuses mainly on what I call the Historical viewpoint (counting points), and is aimed, in its first part, to answer the question "Is this point of view equivalent or not to the Modern (summation formula) one?" (And if it is, in what sense is it so?).

Let me illustrate this with the example @Scot Gould  presented in its question. The brown bold curve on the left figure is the graph of the function  func(x) (whose expression has no interest here) and the brown area represents the area we want to assess using MCI.

In the Historical approach I picked unifomly at random N=100 points within the gray box (of area 2.42), found 26 of them were in the brown region and said the area of this latter is 2.42 x 26/100 = 0.6292. The Modern approach consists in picking uniformly N random points in the range x= [0.8, 3],  and using the blue formula to get an estimation of this same area ((Lbox is the x-length of the gray box, here equal to 2.2).

The quesion is: Am I assessing the same thing when I apply either method? And, perhaps more importantly, do my estimators have the same properties?


And here apppears a first problem:

  • Whatever the number of times you repeat the Historical sampling method, even with different points, you will always get a number of points in the brown region between 0 and N included, meaning that if S is the area of the gray box, the estimation of the brown area is always one of these numbers {0, S/N, 2.S/N, ..., S}.
  • At the opposite repetitions of the Modern approach will lead to a continuum of values for this brown area.
  • So, saying the two approaches might be equivalent simply means that a discrete set is equivalent to a non countable one.

If we remain at the elementary counting level, Historical and Modern viewpoints then are not equivalent.

Towards a probabilistic model of the Historical Process:
This goes against everything you may have heard or read: so, are the authors of these statements all wrong?

Yes, from a strict Historical point of view, but happily not if we interpret the Historical approach in a more loose and probabilistic manner (although this still needs to be considered carefully as it is shown in the main worksheet).

This probabilistic manner relies upon a probabilistic model of the Historical process, where the event "K points out of N belong to the brown area" is to be interpreted as the realization of a very special random variable named Poisson-Binomial (do not worry if you never heard about it: a lot of statisticians did not neither).

In a few words, whereas a Binomial random variable is the sum of several independent and identically distributed Bernoulli random variables, a Poisson-Binomial random variable is the sum of several independent but not necessarily identically distributed Bernoulli random variables. Thus the Poisson-Binomial distribution generalizes the Binomial one.

Using the properties of Poisson-Binomial random variables we must prove in a rigorous way that the expectations of the area estimators for both the Historical and Modern approaches are identical.

So, given this "trick" the two methods are thus equivalent, are they not? And that settles it.

In fact, no, the matter of equivalence still remains.

When uncertainty enters the picture.
Generally one cannot satisfy ourselves with the sole estimation of the area and we would like to have information about the reliability of this estimation. For instance if I find this value is 0.6292, am I ready to bet my salary that I am right? Of course not, unless I am insane, but the things would change if I were capable of saying for instance that "I am 95% sure that the true value of the area is between 0.6 and 0.67".

For the Historical vewpoint the Poisson-Binomial model makes possible to assess an uncertainty (not the uncertainty!) of the area estimation. But things are subtle, because there are different ways to compute an uncertainty:

  • At the elementary level the height of the gray box is an essential parameter, but it does not necessarily gives a good estimation of this uncertainty (one can easily reduced this latter arbitrarily close to 0!).
  • To get reliable uncertainty estimation the call to a probability theory related to Extreme Value Theory (EVT for short) necessary (all of this is explained in the attached worksheet).


For the Modern point of view it is enough to observe that there is no concept of "box height" and that it is then impossible to assess any uncertainty. Question: "If it is so, how can (all the) MCI procedures return an uncertainty value?"
The answer is simple: they consider a virtual encapsulating box whose eight is the maximum of the 
func(xi). This trick enables providing an uncertainty, but this is a non-conservative estimation (an over-optimistic one if you prefer, in other terms an estimation we must regard very carefully).

So, at the end Historical and Modern approaches are equivalent only if we restrict to the estimation of the area, but no longer as soon as we are interested in the quality of this estimation.

What does the attached file contain?
The attached file speaks a lot to the estimation of the estimator uncertainty.
The core theory is named (Right) EndPoint Theory (I found nothing on Wikipedia nor any easy-to-read papers about this theory, so I more or less arbitrarilly decided to refer to this one). Basically it enables assessing the (usually right) end-point of a distribution known only through (right) censored data.
The simplest example is those of a New York pedestrian who looks to the taxi numbers and asks himself how to assess the highest number a taxi has. Here we know this number exists (meaning that some related distribution is bounded), but the situation can be more complex if one does not ever know if this distribution is bounded or not (in which cas one seeks for a right end-point whose probability to be overpassed is less than some small value).
A conservative, and thus reliable, uncertainty on the area estimator  can only be derived in the framework of the end-point theory.

Once the basis of this theory are understood it becomes relatively simple to enhance the Historical approach to get estimators with lessen uncertainties.
I present different ways to do this: one (even if derived otherwise) is named Importance Sampling, and the other leads in a straightforward way to algorithms which are quite close to some used in the CUBA library (partially accessible through evalf/Int).

The last important, if not fundamental, concept discussed in this article concerns the distinction between dispersion interval and confidence interval, concepts that are unfortunately not properly distinguished due to the imprecision of the English language (I apologize to native English speakers for these somewhat harsh words, but this is the reality here).

Some references are provided in attached (main) worksheet, but please, if you don't want to end up even more confused than you were before, avoid Wikipedia.

To sum up.
This note is a non-orthodox presentation of MCI centered arround the Historical viewpoint which, I am convinced of that, deserves a little more attention than the disk-in-the-square picture commonly displayed in MCI courses and textbooks.
An I am even more convinced of that then this old-fashion (antiquated?) approach is an open door to some high level probability theories such than the EndPoint and the EVT one.

Of course this post is not an advocacy agaist the Modern approach, and does not mean that you have to ignore classical texts or that the Law of Large Numbers (LLN) or the Central limit theorms are useless stuff in MCI.

Maple, but not just Maple.
A part of the attached worksheet is devoted base presents results I got with R  (a programming language for statistical computing and data visualization), simply because Maple 2015 (and it is still true for Maple 2025) did not contain the functions I needed.

For instance R implements the Cuba library in a far more complete way than Maple (I give a critical discussion about the way Maple does it), enabling for instance the change of the random seed.

Main worksheet (I apologize in advance for typos that could remain in the texts)
A_note_on_Monte-Carlo_Integration.mw

The main worksheet refers to this one
How_does_the_variance_of_f_impact_the_estimator_dispersion.mw

Extra worksheet: An introduction to Importance Sampling
Importance_Sampling.mw


Under the name of mmcdara (unfortunately inaccessible since the major July 2025 Mapleprimes outage, and probably lost forever, God rest his soul.) I published two years ago a post about Multivariate Normal Distribution.

The current post continues in the same vein and presents the construction of a few new Multivariate Random Variables (MRV for short) named Multinomial (see for instance this recent question), Dirichlet, Categorical and related compound distributions.
I advice the interested readers to give a quick look to these names on Wikipedia (more specific references are given at the top of the wotksheet).

As I explained (in fact as my alter ego did) in Multivariate Normal Distribution, the Statistics package is limited to univariate random variables  and thus implementing MRVs requires a little cunning.
Here is a list of a few problems you face:

  • Whereas the expectation (sometimes named "mean") of a univariate random variable is a number or an expression, the expectation of a MRV is a vector (or a list, a n-uple, ...) of numbers or expressions.

So far, so good, except that the Mean attribute of Distribution can only be a scalar quantity. So if you want to assign a vector to Mean you have to code it some way and do something like Decode(Mean(My_MRV)) to get the expectation in a vector form.
 

  • The Variance case is even more tricky because MRVs variance are matrices.
     
  • Beyond this some very useful attributes like ParentName and Parameters cannot be instanciated in the definition of user random variables (whether there are MRVs or not), implying here again some bit of gymnastics in order that, if not really instantiate these attributes, be able at least to retrieve them when needed.
     
  • Finally, last but not least, the RandomSample is not appropriated to sample MRVs for reasons which are explained in the attached worksheet.


The file below contains more than 20 procedures enabling the definition of the studied MRVs, the decoding of the coded attributes, the visualization (which is not that immediate because the supports of the MRVs I foccus on are simplexes), the parameter estimations against empirical observations (frequentist and bayesian points of view), and so on.

Multinomial_Dirichlet_and_so_on.mw

Nevertheless, there is still a lot missing, but at some point I believe we need to decide that the work is over.

 

1 2 Page 1 of 2