Why this post
This work was intended to be a simple reply to a question asked a few days ago.
At some point, I realised that the approach I was using could be of more general interest and was, in my opinion, worth a post.
In a few words, this post is about solving an algebra problem using a method originally designed to tackle statistical problems.
The Context
Recently @raj2018 submitted a question I'm going to summarize this way:
Let S(phi; beta, f) be a function of phi parameterized by beta and f.
Here is the graph of S(phi; 0.449, 0.19) that @raj2018 provided.
@raj2018 then asked how we can find other values (A, B) of (beta, f) such that the graph of S(phi; A, B) has the same aspect as the graph above.
More precisely, let phi_0 be the largest strictly negative value of phi such that S(phi_0; A, B) = 0.
Then S(phi; A, B) must be negative (strictly negative?) on the open interval (phi_0, 0), and must have exactly 3 extrema in this range.
I will say the point (A, B) is admissible if S(phi; A, B) satisfies these conditions.
The expression of S(phi; A, B) is so complex that it is likely impossible to find one (several? all?) admissible point(s) through analytic developments.
The approach
When I began thinking about this problem, I soon wondered whether the entire domain of admissible points could be found, at least with some reasonable accuracy.
Quite rapidly I drew an analogy with another type of problem whose solution is part of my job: the approximate construction of the probability density function (PDF) of a multivariate random variable (which obviously implies that no analytical expression of this PDF is available). This is a very classical problem in Bayesian statistics, for instance when we have to construct an approximation of a posterior PDF.
To stick with this example, and setting aside the rare situations where this PDF can be derived analytically, building a posterior PDF largely relies on specific numerical methods.
The iconic one is known under the generic name MCMC, which stands for Markov Chain Monte Carlo.
Why am I speaking about MCMC or PDF or even random variables?
Let us consider some multivariate random variable R whose PDF is constant on some bounded domain D and equal to 0 elsewhere. R is then a uniform random variable with support supp(R) = D.
Assuming the domain Adm of admissible (beta, f) is bounded, we may think of it as the support of some uniform random variable. Following this analogy, we may expect to use some MCMC method to "build the PDF of the bivariate random variable (beta, f)", otherwise stated "to capture the boundary of Adm".
The simplest MCMC method is the Metropolis-Hastings algorithm (MH).
In a few simple words, MH builds a Markov chain this way (a minimal Maple sketch follows the list):
- Let us assume that the chain already contains elements e1, ..., en.
  Let f be some suitable "fitness" function (whose nature is of no importance right now).
- A potential new element c is randomly picked in some neighborhood of en.
- If the ratio f(c)/f(en) is larger than 1, we decide to put c into the chain (thus en+1 = c); otherwise we leave it to chance (with probability equal to this ratio) to decide whether or not c is put into the chain.
  If chance decides the contrary, then en is duplicated (thus en+1 = en).
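To fix ideas, here is a minimal Maple sketch of a single MH step. The name MH_step, the square-shaped proposal neighborhood, and the radius r are assumptions of mine, not the content of the attached worksheet:

    # One Metropolis-Hastings step (sketch, not the attached code).
    # fit : the fitness function f
    # en  : the current element of the chain, a pair [beta, f]
    # r   : the proposal radius (an assumption of this sketch)
    MH_step := proc(fit, en, r)
      local U, c, ratio;
      U := () -> evalf(rand()/10^12);        # uniform draw in [0, 1)
      # pick a candidate uniformly in a square neighborhood of en
      c := [en[1] + r*(2*U()-1), en[2] + r*(2*U()-1)];
      ratio := fit(c)/fit(en);
      if ratio >= 1 or U() < ratio then
        return c;                            # accept: e(n+1) = c
      else
        return en;                           # reject: duplicate en
      end if;
    end proc: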
MH is not the most efficient MCMC algorithm but it is efficient enough for what we want to achieve.
The main difficulty here is that there is no natural way to build the fitness function f, mainly because the equivalent random variable I talked about is a purely abstract construction.
A preliminary observation is that if S(phi; beta, f) < 0 for every phi in (phi_0, 0), then S has an odd number of extrema in (phi_0, 0). The simplest way to find these extrema is to search for the zeros of the derivative S' of S with respect to phi, while discarding the "false" extrema where the second derivative S'' is also null (I emphasize this last point because I didn't account for it in the attached file).
The algorithm designed in this file probably misses a few points by not checking whether S'' = 0, but it is important to keep in mind that we don't want a complete identification of Adm, just the capture of its boundary.
Unless we are extremely unlucky, there is only a very small chance that omitting to check whether S'' = 0 will deeply modify this boundary.
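Here is a Maple sketch of how such a count can be done. The name CountExtrema, the scanning grid of N subintervals, and the numerical threshold on S'' are assumptions of mine; the attached procedure Extrema works differently in its details:

    # Count the extrema of S on (phi0, 0) (sketch).
    # S    : an expression of the variable phi
    # phi0 : the largest strictly negative zero of S
    # N    : number of scanning subintervals (an assumption)
    CountExtrema := proc(S, phi0, N)
      local dS, d2S, h, k, a, b, z, M;
      dS  := diff(S, phi);
      d2S := diff(S, phi, phi);
      h := -phi0/N;
      M := 0;
      for k from 0 to N-1 do
        a := phi0 + k*h;
        b := a + h;
        z := fsolve(dS = 0, phi = a .. b);   # at most one zero sought per subinterval
        # discard "false" extrema where S'' is (numerically) null
        if type(z, numeric) and abs(eval(d2S, phi = z)) > 1e-9 then
          M := M + 1;
        end if;
      end do;
      return M;
    end proc: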
How to define the function f?
What we want is that f(c)/f(en) represents the probability used to decide whether c is an admissible point or not. In a Markov chain this ratio represents how much better or worse c is relative to en, and this is essential for the chain to be a true Markov chain.
But as our aim is not to build a true Markov chain but simply a chain which looks like a Markov chain, we can take some liberties and replace f(c)/f(en) by some function g(c) which quantifies the probability that c is an admissible pair. So we want g(c) = 1 if S(phi; c) has exactly M = 3 negative extrema and g(c) < 1 if M <> 3.
The previous algorithm then transforms into (see the sketch after this list):
- Let us assume that the chain already contains elements e1, ..., en.
  Let g be a function giving the probability that an element is admissible.
- A potential new element c is randomly picked in some neighborhood of en.
- If g(c) is larger than or equal to 1, we decide to put c into the chain (thus en+1 = c); otherwise we leave it to chance (with probability g(c)) to decide whether or not c is put into the chain.
  If chance decides the contrary, then en is duplicated (thus en+1 = en).
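Under the same assumptions as the previous sketch, only the acceptance test changes:

    # Modified step (sketch): the ratio f(c)/f(en) is replaced by g(c).
    MH_step_g := proc(g, en, r)
      local U, c;
      U := () -> evalf(rand()/10^12);        # uniform draw in [0, 1)
      c := [en[1] + r*(2*U()-1), en[2] + r*(2*U()-1)];
      if g(c) >= 1 or U() < g(c) then
        return c;                            # accept: e(n+1) = c
      else
        return en;                           # reject: duplicate en
      end if;
    end proc: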
This algorithm can also be seen as a kind of genetic algorithm.
A possible choice is g(c)= exp(-|3-M|).
In the attached file I use instead the expression g(c) = (M + 1)/4, for several reasons (both choices are written out after this list):
- It is less sharp at M = 3 and thus lets c enter the chain more often, which increases its exploratory capabilities.
- The case M > 3, which no preliminary investigation was able to uncover, is by construction eliminated in the procedure Extrema, which uses an early-stopping strategy (as soon as more than M = 3 negative extrema are found, the procedure stops).
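For completeness, the two fitness choices can be written as follows, with M the number of negative extrema (as returned, for instance, by the CountExtrema sketch above):

    # The two fitness functions discussed above (M = number of negative extrema).
    g_sharp := M -> exp(-abs(3 - M)):     # equal to 1 only at M = 3
    g_soft  := M -> (M + 1)/4:            # gentler around M = 3, more exploratory

In the chain itself, g(c) is then the chosen function applied to the number M of negative extrema of S(phi; c).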
The algorithm I designed basically relies upon two stages:
- The first one aims to construct a "long" Markov-like chain ("long" in quotes because true Markov chains are usually much longer than those I use).
There are two goals here:
- Check whether or not Adm is simply connected (whether or not it has holes).
- Find a first set of admissible points that can be used as starting points for subsequent chains.
- The second stage runs several independent Markov-like chains from a reduced set of admissible points.
The way this reduced set is constructed depends on the goal to be achieved:
- One may think of adding points among those already known in order to assess the connectivity of Adm,
- or of refining the boundary of Adm.
These two concurrent objectives are mixed in an ad hoc way depending on the observation of the results already in hand (a sketch of the chain-driving loop common to both stages is given below).
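As an illustration, here is a minimal sketch of that driving loop; the name BuildChain and its arguments are mine, not those of the attached code:

    # Build a Markov-like chain of length n starting from the point e1 (sketch).
    BuildChain := proc(g, e1, n, r)
      local chain, k;
      chain := [e1];
      for k from 2 to n do
        chain := [op(chain), MH_step_g(g, chain[-1], r)];
      end do;
      return chain;
    end proc: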
We point out here an important feature of MCMC methods: behind their apparent algorithmic simplicity, it is common that high-quality results can only be obtained efficiently at the cost of problem-dependent tuning.
A last word to say that after several trials and failures I found it simpler to reparameterize the problem in terms of (phi_0, f) instead of (beta, f).
Codes and results
Choice g(c) = (M + 1) / 4
The code: Extrema_and_MCMC.mw
To access the full results I got, load this m file (don't mind its extension: Mapleprimes doesn't enable uploading m files): MCMC_20231209_160249.mw (save it and change its extension to m instead of mw).
EDITED: choice g(c)= exp(-|3-M|)
Here are the files containing the code and the results:
Extrema_and_MCMC_g2.mw
MCMC_20231211_084053.mw
To ease the comparison of the two sets of results I used the same random seeds in both codes.
Comparing the results obtained around the first admissible point is straightforward.
It's more complex for @raj2018's solution because the first step of the algorithm (drawing of a single chain of length 1000) finds six times more admissible points with g(c) = exp(-|3-M|) than with g(c) = (M + 1)/4.