@Scot Gould

**Starting point:**

Computing the integral of a function **f(x)** over an interval **x=a..b** can always be interpreted as computing the expectation of the random variable **Y = f(X)**, where **X** is a Uniform random variable with support **a..b**.

Indeed, denoting **phi(x)** the probability density function of **X** (**phi(x) = 1/(b-a)** if **a <= x <= b** and **0** elsewhere),

**J = int(f(x), x=a..b) = (b-a) * int(f(x)*phi(x), x=a..b)**

The second integral represents the **Expectation** (aka mean) **E(Y)** of the random variable **Y**.

Then **J = (b-a)*E(Y)**.
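This identity is easy to check numerically; here is a minimal sketch (the function, interval, and sample size are illustrative choices of mine, not taken from the worksheets):

```python
import random
import math

# Estimate J = int(sin(x), x = 0..Pi) = 2 by drawing X ~ Uniform(0, Pi)
# and using J ~ (b - a) * mean(f(X)), i.e. J = (b - a)*E(Y) with Y = f(X).
random.seed(1)

a, b = 0.0, math.pi
n = 100_000

sample = [random.uniform(a, b) for _ in range(n)]
estimate = (b - a) * sum(math.sin(x) for x in sample) / n

print(estimate)   # close to the exact value 2
```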

The Monte-Carlo* method, originally designed to assess the Expectation of a random variable, generalizes to estimate the integral of a function over a **bounded** domain (generally connected).

*This name has been adapted into Monegasque: Monte-Carlu [ˌmõteˈkaʀlu]. The typographical rules for toponyms used by the Imprimerie Nationale [a French institution whose name translates as "national printing house"] require **Monte-Carlo** to be written **with a hyphen**. It is generally pronounced /Monté-Carlo/, but some say /Monté-Carl'/.

**Here is "your" problem**:

How to assess the **mean** value of **func(X)**, where **X** is a Uniform random variable with support **0.8..3**?

One simple idea is to use a Monte-Carlo approach:

- draw a sample **S** from **X**, and let **n** be the size of this sample;
- then **add(func~(S))/n** is the Monte-Carlo estimator of **Mean(func(X))**.
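The two steps above can be sketched as follows (the choice of func is a placeholder of mine; the support 0.8..3 is the one from the problem):

```python
import random

# Crude Monte-Carlo estimator of Mean(func(X)) for X ~ Uniform(0.8, 3),
# the Python equivalent of Maple's add(func~(S))/n.
random.seed(0)

func = lambda x: x**2          # any integrable func will do
a, b = 0.8, 3.0
n = 50_000

S = [random.uniform(a, b) for _ in range(n)]
mc_mean = sum(func(x) for x in S) / n

# Exact value for comparison: (1/(b-a)) * int(x^2, x=a..b)
exact = (b**3 - a**3) / (3 * (b - a))
print(mc_mean, exact)
```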

**What may go wrong in doing this?**

Let us suppose that **func(x)** is a function whose value is close to zero over a large portion of the interval 0.8..3, and which takes very large values in a narrow subdomain of 0.8..3. Let us say the "effective" domain of variation of **func** has a size equal to **w** (for instance 0.001).

Drawing the sample S randomly will give you about **n*w/(3-0.8)** points in this "effective" domain of variation, meaning that we have little chance of capturing the true behaviour of **func** in the region where it strongly changes. As a consequence your Monte-Carlo estimator **add(func~(S))/n** can be of poor quality.

Being of poor quality means that repeating the same computation with another sample **S'** (independent from the first one) can lead to a result significantly different from the first one, unless the value of **n** is huge.

So the estimator, which is a random variable of which **add(func~(S))/n** is one particular realization and **add(func~(S'))/n** another, may present large deviations: one says its variance is high.

The goal being to get reliable values for the estimator of **Mean(func(X))**, the naive way is to increase the value of **n** (the variance of the Monte-Carlo estimator decreases as **n**^{-1}). The drawback is an increase of the computational time.
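A quick way to see this **n**^{-1} decrease of the variance is to repeat the estimation many times for two sample sizes and compare the spreads (the function and sizes below are just illustrative): multiplying n by 4 should roughly halve the standard deviation of the estimator.

```python
import random
import statistics
import math

# Spread of the crude Monte-Carlo estimator over repeated independent
# samples: it shrinks roughly like n^(-1/2), i.e. variance like n^(-1).
random.seed(2)

func = lambda x: math.exp(-x)
a, b = 0.8, 3.0

def mc_estimate(n):
    return sum(func(random.uniform(a, b)) for _ in range(n)) / n

def spread(n, reps=300):
    return statistics.stdev(mc_estimate(n) for _ in range(reps))

s1, s2 = spread(500), spread(2000)   # n multiplied by 4
print(s1 / s2)                       # close to sqrt(4) = 2
```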

All the other ways consist in applying a more astute strategy where the sampling is done so as to capture the behaviour of **func**.

The simplest strategy is named Importance Sampling.

Its goal is to reduce the variance of the Monte-Carlo estimator by introducing an auxiliary random variable **Z** whose PDF (Probability Density Function) is closer to **func** than the PDF of **X** is.

Then drawing a sample from **Z** will generate points whose spreading gives a closer picture of the regions where **func** really varies.

The basic idea and an illustrative example are provided in this file Importance_Sampling_Idea.mw.

Look how Importance Sampling dramatically decreases the variance of the Monte-Carlo estimator.
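For the record, here is a bare-bones sketch of the idea (all names and numbers are mine, not those of the worksheet): func has a narrow peak at 1, and the proposal **Z** is a normal variable truncated to [0.8, 3] and centred on that peak. The draws from Z are reweighted by phi_X(z)/phi_Z(z) so the estimator stays unbiased.

```python
import random
import math

random.seed(3)

a, b = 0.8, 3.0
mu, sig = 1.0, 0.05

# func is nearly zero except on a narrow peak around mu
func = lambda x: math.exp(-((x - mu) / sig)**2 / 2)

phi_X = 1.0 / (b - a)                       # PDF of X ~ Uniform(a, b)

def norm_cdf(x):
    return 0.5 * (1 + math.erf((x - mu) / (sig * math.sqrt(2))))

mass = norm_cdf(b) - norm_cdf(a)            # truncation constant

def phi_Z(z):                               # PDF of the truncated normal Z
    return math.exp(-((z - mu) / sig)**2 / 2) / (sig * math.sqrt(2 * math.pi) * mass)

def draw_Z():                               # rejection sampling into [a, b]
    while True:
        z = random.gauss(mu, sig)
        if a <= z <= b:
            return z

n = 20_000
Z = [draw_Z() for _ in range(n)]
# importance-sampling estimator of Mean(func(X)):
is_est = sum(func(z) * phi_X / phi_Z(z) for z in Z) / n
print(is_est)
```

In this extreme case phi_Z is exactly proportional to func, so every weighted draw equals the exact mean and the estimator has zero variance; with a real func the proposal is only approximately proportional and the variance is merely reduced, not nullified.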

**Go back to the MMA code.**

Here is a new file which ALMOST does in **Maple** what is done with **MMA**: TruncatedNormal_3.mw.

That is, assessing the mean value of **func** over the interval 0.8..3 by using both the crude Monte-Carlo method (*which implicitly considers that the random variable X is uniform with some support, which I arbitrarily took as 0.8..3 but could be something else, in which case the estimator is known up to an arbitrary multiplicative constant*) and assessing the same estimator by using an auxiliary random variable **Z** whose PDF is "close" to **func** (**Z** *is the truncated gaussian random variable*).

You will see that the formula used to assess the Importance Sampling Estimator (**ISE**) does not exactly correspond to the formula the **MMA** code uses:

- The term
func[RV]/Distrib[RV, 1, 0.399]
# which translates to Maple as
func~(RV) /~ Distrib~(RV, 1, 0.399)

should be
func[RV]*f[RV]/Distrib[RV, 1, 0.399]
# which translates to Maple as
func~(RV) *~ f~(RV) /~ Distrib~(RV, 1, 0.399)
# where f is the PDF of the "native" random variable
...
# unless f = 1 for any real (which seems to be nonsense as its integral
# from -oo to +oo is not equal to 1!!!) but is classically used under the
# name of "improper distribution" in Bayesian statistics

- The presence of the term
Integrate[Distrib[x, 1, 0.399], {x, 0.8, 3}]

Which seems to mean that **Distrib** is not a true distribution (this integral should naturally be equal to 1, which is the case with the construction of the **TruncatedGaussian** I made).
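Assuming **Distrib[x, 1, 0.399]** denotes a normal density with mean 1 and standard deviation 0.399 (my reading of the snippet, not a certainty), dividing by that integral is precisely what turns it into a true truncated density. A quick numerical check that a properly truncated PDF integrates to 1 over its support:

```python
import math

mu, sig = 1.0, 0.399
a, b = 0.8, 3.0

def norm_pdf(x):
    return math.exp(-((x - mu) / sig)**2 / 2) / (sig * math.sqrt(2 * math.pi))

def norm_cdf(x):
    return 0.5 * (1 + math.erf((x - mu) / (sig * math.sqrt(2))))

mass = norm_cdf(b) - norm_cdf(a)         # normalizing constant

def trunc_pdf(x):
    return norm_pdf(x) / mass if a <= x <= b else 0.0

# midpoint-rule check that trunc_pdf integrates to 1 on [a, b]
n = 200_000
h = (b - a) / n
total = h * sum(trunc_pdf(a + (i + 0.5) * h) for i in range(n))
print(total)   # ~ 1
```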

**Recently added**

The **MMA** code doesn't assess a mean value of **func(X)** but directly jumps to using the Monte-Carlo method to assess the integral of **func(x)** by using a majorizing function (the PDF of the truncated gaussian random variable).

This is why the red line above is replaced by the pink one (see also the double green equality at the top of this reply).

The code is by far unnecessarily complex. For instance there is no need to use a truncated gaussian distribution to assess this integral; moreover, the "1.1*x-0.1" stuff is useless and brings nothing but confusion.

Whatever... this new file explains what Monte-Carlo integration is, from its historical origins to its classical use today.

Monte_Carlo.mw

**About Monte-Carlo integration**

"[You]* wondered if the sampling technique provided was going to be more useful than the crude method as a demonstration of a single-variable example of Monte Carlo integration. Of course, Maple and Mathematica can quickly integrate the function. *"

The L2 norm of the integration error of the Monte-Carlo method is **n**^{-1/2} whatever the dimension of the integration domain. This is significantly worse than a simple rectangle-rule integration, whose error is **n**^{-1}... but only in dimension 1: the rectangle rule degrades to **n**^{-1/d} in dimension **d**.

A Simpson method has an integration error which evolves as **n**^{-2/d} in dimension **d** and is then outperformed by Monte-Carlo integration in domains of dimension higher than 4.

In fact the main interest of Monte-Carlo integration lies in the simplicity of the method and the fact that it is efficient in domains of any shape.
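To illustrate the rates above in dimension 1 (a toy example of my own): for the same number of evaluations, the rectangle rule's error shrinks like **n**^{-1} while Monte-Carlo's shrinks like **n**^{-1/2}, which is why grid methods win in low dimension and Monte-Carlo only pays off when **d** grows.

```python
import random
import math

random.seed(4)

f = math.exp
a, b = 0.0, 1.0
exact = math.e - 1                  # int(exp(x), x = 0..1)

def rectangle(n):                   # left rectangle rule, error O(n^-1)
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def monte_carlo(n):                 # crude Monte-Carlo, error O(n^-1/2)
    return (b - a) * sum(f(random.uniform(a, b)) for _ in range(n)) / n

for n in (100, 10_000):
    print(n, abs(rectangle(n) - exact), abs(monte_carlo(n) - exact))
```

Multiplying n by 100 divides the rectangle-rule error by about 100, but the Monte-Carlo error only by about 10.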

**Last point:**

"*Could you recommend a book, website, etc., that would help me get up to speed with probability distributions?*"

**I'll get back to you shortly**.

I have heard positive feedback about Khan Academy (which I have only browsed very quickly to form a first idea).

If by chance you speak French, WikiStat is really quite good (I am not sure there is an English version).

"Good" off-the-shelf books are not that easy to find (though they surely exist).