## The Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test is a widespread, simple, and effective test for checking two kinds of hypotheses:

- Hypotheses of the form H[0]: F[ksi](x) = F(x) (the Kolmogorov test). Here F[ksi](x) is the CDF of a population distribution and F(x) is a given continuous function.
- Hypotheses of the form H[0]: F[1](x) = F[2](x) (the Smirnov test). Here F[j](x), j = 1, 2, are the CDFs of two population distributions, both assumed to be continuous.

See the Wikipedia article ( http://en.wikipedia.org/wiki/Kolmogorov_test ) for more details.
By the way, Mathematica 8 includes this test.
We begin with the Kolmogorov test. We compute D[n] := sqrt(n)*sup{|F[n](x) - F(x)| : -infinity < x < infinity}. Here F[n](x) is the CDF of RandomVariable(EmpiricalDistribution(S)), that is, the empirical distribution function of a sample S of size n. Next we compare D[n] with the root t[0.95] ≈ 1.3581 of the equation K(t) = 0.95:
> fsolve(K(t) = .95, t = 0 .. 2);

1.358098639
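(as usual, the significance level is taken to be 0.05). Here K is the limiting Kolmogorov distribution function. It has to be defined before the fsolve call:
> K := t -> piecewise(0 < t, sum((-1)^j*exp(-2*j^2*t^2), j = -infinity .. infinity), 0):
Its graph is the output of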
>plot(K, -1 .. 2, thickness = 3)
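For readers without Maple, here is a minimal Python sketch of the same computation, using the standard library only. It is a cross-check, not the author's code, and it makes two substitutions:

- K(t) is evaluated by truncating the series after a fixed number of terms. The parameter `terms` is an assumption of this sketch; the +j and -j terms are paired.
- K(t) = 0.95 is solved by bisection instead of fsolve.

```python
import math


def kolmogorov_cdf(t: float, terms: int = 100) -> float:
    """K(t) = sum_{j=-inf}^{inf} (-1)^j exp(-2 j^2 t^2) for t > 0, and 0 otherwise."""
    if t <= 0:
        return 0.0
    # The j and -j terms coincide, so K(t) = 1 + 2 * sum_{j>=1} (-1)^j exp(-2 j^2 t^2).
    # A fixed truncation (an assumption here) is accurate for moderate t, e.g. t >= 0.5.
    total = 1.0
    for j in range(1, terms + 1):
        total += 2.0 * (-1) ** j * math.exp(-2.0 * j * j * t * t)
    return total


def kolmogorov_quantile(p: float, lo: float = 0.5, hi: float = 3.0) -> float:
    """Solve K(t) = p by bisection; K is increasing, so the root is unique."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t95 = kolmogorov_quantile(0.95)
print(f"t[0.95] = {t95:.9f}")
```

The printed value should match Maple's 1.358098639 up to rounding in the last digit.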

If D[n] <= t[0.95], the null hypothesis is accepted; if D[n] > t[0.95], it is rejected. This is quite simple to program in Maple.
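The decision rule can be sketched in Python as follows. This is an illustrative sketch, not a library API, and it rests on three assumptions:

- The names `kolmogorov_statistic`, `kolmogorov_test` and `uniform_cdf` are hypothetical.
- The critical value 1.358098639 is the asymptotic one from the fsolve call above, so it is only appropriate for reasonably large n.
- The supremum is evaluated on both sides of every jump of F[n], since F[n] jumps from (i-1)/n to i/n at the i-th order statistic.

```python
import math
from typing import Callable, Sequence

T_095 = 1.358098639  # root of K(t) = 0.95 (asymptotic critical value, alpha = 0.05)


def kolmogorov_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D[n] = sqrt(n) * sup_x |F[n](x) - F(x)| for a continuous hypothesized CDF F."""
    xs = sorted(sample)
    n = len(xs)
    sup = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Just before x_(i) the empirical CDF equals (i-1)/n; at x_(i) it equals i/n.
        sup = max(sup, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * sup


def kolmogorov_test(sample: Sequence[float], cdf: Callable[[float], float],
                    critical: float = T_095) -> tuple[float, bool]:
    """Return (D[n], accepted), where accepted means D[n] <= critical."""
    d_n = kolmogorov_statistic(sample, cdf)
    return d_n, d_n <= critical


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


d_n, accepted = kolmogorov_test([0.1, 0.2, 0.3, 0.4], uniform_cdf)
print(d_n, accepted)
```

For the toy sample [0.1, 0.2, 0.3, 0.4] under the uniform hypothesis, the largest gap is 1 - 0.4 = 0.6 at x = 0.4. So D[n] = 2*0.6 = 1.2 and H[0] is accepted.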
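For example, let us consider a sample of size 100 from the normal distribution (NormalDistribution) with mean a = 1 and standard deviation sigma = 4. Maple's Normal(mu, sigma) takes the standard deviation as its second argument: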
> with(Statistics):
> X := RandomVariable(Normal(1, 4));

_R
> S := Sample(X, 10^2);
[1 .. 100 Vector[row], Data Type: float[8], Storage: rectangular, Order: Fortran_order]
> CDF(X, t); # its CDF

1/2+ 1/2* erf( sqrt(2)*(t - 1) /8)
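The printed CDF shows how Maple parametrizes Normal: the second argument is the standard deviation, since 1/2 + 1/2*erf(sqrt(2)*(t - 1)/8) = Phi((t - 1)/4). Below is a short Python check against the standard-library `statistics.NormalDist`, an independent implementation used here only for comparison:

```python
import math
from statistics import NormalDist


def maple_printed_cdf(t: float) -> float:
    """The expression returned by CDF(X, t) for X = Normal(1, 4)."""
    return 0.5 + 0.5 * math.erf(math.sqrt(2) * (t - 1) / 8)


sd4 = NormalDist(mu=1, sigma=4)  # standard deviation 4
sd2 = NormalDist(mu=1, sigma=2)  # standard deviation 2, for contrast

for t in (-5.0, -1.0, 0.0, 1.0, 2.5, 7.0):
    # The printed expression matches sigma = 4 up to rounding error ...
    assert abs(maple_printed_cdf(t) - sd4.cdf(t)) < 1e-12

# ... but not sigma = 2 (at t = 3 the values are Phi(0.5) and Phi(1))
print(maple_printed_cdf(3.0), sd2.cdf(3.0))
```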
> S[20];

-2.79600701124852424

Then we sort S by rank:
> R := Rank(S):
> B := OrderByRank(S, R);
[1 .. 100 Vector[row], Data Type: float[8], Storage: rectangular, Order: Fortran_order]
> B[20];
-2.85106767863987986
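After sorting, B[i] is the i-th order statistic. Take the empirical CDF as the right-continuous function F[n](t) = #{observations <= t}/n. If there are no ties (ties occur with probability zero for a continuous distribution), F[n] equals i/n at B[i], so at B[20] it is 20/100 = 0.2.

The Python sketch below assumes this convention. It uses `bisect_right` on the sorted data, and the name `ecdf` and the seed are arbitrary illustrative choices:

```python
import random
from bisect import bisect_right
from typing import Sequence


def ecdf(sorted_sample: Sequence[float], t: float) -> float:
    """Right-continuous empirical CDF: fraction of observations <= t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


rng = random.Random(1)                       # arbitrary seed, illustration only
s = [rng.gauss(1, 4) for _ in range(100)]    # analogous to Sample(Normal(1, 4), 100)
b = sorted(s)                                # plays the role of OrderByRank(S, Rank(S))

# b[19] is the 20th order statistic (Python indexes from 0, Maple from 1)
print(b[19], ecdf(b, b[19]))
```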
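As the hypothesized CDF we take a slightly perturbed normal CDF F(t). It has mean 0.9 and standard deviation 8/sqrt(4.2) ≈ 3.90 instead of 1 and 4: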
>F := unapply(1/2+(1/2)*erf((1/8*(t-.9))*sqrt(2.1)), t);
t-> 1/2+(1/2)*erf(.1811422094*t-.1630279885)
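The hypothesized F is itself a normal CDF. Since erf((t - mu)/(s*sqrt(2))) coincides with erf(sqrt(2.1)*(t - 0.9)/8) for mu = 0.9 and s = 8/sqrt(4.2), F has mean 0.9 and standard deviation about 3.9036. A Python check of this identity and of the coefficients Maple printed:

```python
import math
from statistics import NormalDist


def F(t: float) -> float:
    """The hypothesized CDF entered in Maple."""
    return 0.5 + 0.5 * math.erf((t - 0.9) / 8 * math.sqrt(2.1))


s = 8 / math.sqrt(4.2)                  # implied standard deviation
hypothesis = NormalDist(mu=0.9, sigma=s)

print(round(s, 4))                      # 3.9036
for t in (-6.0, 0.0, 0.9, 3.0, 10.0):
    assert abs(F(t) - hypothesis.cdf(t)) < 1e-12

# Slope and intercept of the erf argument, cf. .1811422094*t - .1630279885
print(math.sqrt(2.1) / 8, 0.9 * math.sqrt(2.1) / 8)
```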

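Now we calculate D[n]. The code below evaluates |F[n](x) - F(x)| only at the sample points. Since F[n] jumps at each sample point, the exact supremum also involves the values of F[n] just to the left of the jumps, and these can raise the maximum by at most 1/n: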
>Y := RandomVariable(EmpiricalDistribution(B)):
>C := map(t-> abs(CDF(Y, t)-F(t)),B);
[1 .. 100 Vector[row], Data Type: anything, Storage: rectangular, Order: Fortran_order]
> max(C);

0.0899305682
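Below is a Python sketch comparing two maxima for the same hypothesized F:

- the maximum over right-hand values i/n at the sample points, as computed by the Maple code above;
- the exact supremum, which also uses the left limits (i-1)/n.

It redoes the experiment on a freshly simulated sample. The seed is arbitrary, so the numbers will differ from the Maple session, and `sup_distances` is a hypothetical helper.

```python
import math
import random


def F(t: float) -> float:
    """Hypothesized CDF from the Maple session."""
    return 0.5 + 0.5 * math.erf((t - 0.9) / 8 * math.sqrt(2.1))


def sup_distances(sample, cdf):
    """Return (max over right-hand values i/n, exact two-sided supremum) of |F[n] - F|."""
    xs = sorted(sample)
    n = len(xs)
    right = max(abs(i / n - cdf(x)) for i, x in enumerate(xs, start=1))
    left = max(abs((i - 1) / n - cdf(x)) for i, x in enumerate(xs, start=1))
    return right, max(right, left)


rng = random.Random(2011)                        # arbitrary seed
sample = [rng.gauss(1, 4) for _ in range(100)]   # normal with mean 1, sd 4
right, exact = sup_distances(sample, F)
n = len(sample)
print(math.sqrt(n) * right, math.sqrt(n) * exact)
```

The exact supremum is never smaller than the right-hand maximum, and it exceeds it by at most 1/n.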
The two CDFs are compared in the plot created by
>plot([t-> CDF(Y, t),t-> F(t)], color = [red, blue], thickness = 2);

> evalf(sqrt(10^2)*max(C));

0.8993056820
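Since 0.8993056820 < 1.358098639, we conclude that the null hypothesis should be accepted. The conclusion is safe: taking F[n] just to the left of its jumps into account could raise D[n] by at most sqrt(n)*(1/n) = 0.1, which still leaves D[n] below 1.0.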
Let us turn to the Smirnov test. In this case D[n1,n2] := sqrt(n1*n2/(n1+n2))*max{|G[1](x) - G[2](x)| : -infinity < x < infinity}. Here G[j](x), j = 1, 2, are the empirical distribution functions of a sample S[1] of size n1 and a sample S[2] of size n2, respectively. Under H[0], D[n1,n2] asymptotically has the same distribution K, so it is compared with the same critical value t[0.95].
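Below is a Python sketch of this statistic. It mirrors the Maple approach of evaluating both empirical CDFs at every point of both samples. G[1] - G[2] is a right-continuous step function that changes only at the pooled sample points, so the maximum over those points is the exact supremum. The function name `smirnov_statistic` is hypothetical, not a library call.

```python
import math
from bisect import bisect_right
from typing import Sequence


def smirnov_statistic(s1: Sequence[float], s2: Sequence[float]) -> float:
    """D[n1,n2] = sqrt(n1*n2/(n1+n2)) * max_x |G1(x) - G2(x)|."""
    a, b = sorted(s1), sorted(s2)
    n1, n2 = len(a), len(b)
    sup = 0.0
    for x in a + b:
        g1 = bisect_right(a, x) / n1   # G1(x): fraction of S1 that is <= x
        g2 = bisect_right(b, x) / n2   # G2(x): fraction of S2 that is <= x
        sup = max(sup, abs(g1 - g2))
    return math.sqrt(n1 * n2 / (n1 + n2)) * sup


# Two completely separated samples: at x = 3, G1 = 1 and G2 = 0
print(smirnov_statistic([1, 2, 3], [4, 5, 6]))   # equals sqrt(9/6) * 1
```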
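For example, let us take samples of sizes 1000 and 2000 from two slightly different uniform distributions, on [0, 0.95] and on [0, 1]: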
> Z1 := RandomVariable(Uniform(0, .95));

_R1

> n1 := 10^3: S1 := Sample(Z1, n1):
>R1 := Rank(S1):
>T1 := OrderByRank(S1, R1):
> Z2 := RandomVariable(Uniform(0, 1));

_R2

> n2 := 2*10^3: S2 := Sample(Z2, n2):
>R2 := Rank(S2):
>T2 := OrderByRank(S2, R2):
> W1 := RandomVariable(EmpiricalDistribution(T1)):
> W2 := RandomVariable(EmpiricalDistribution(T2)):
> C1 := map(t-> abs(CDF(W1, t)-CDF(W2, t)), T1):
> max(C1);
0.0570000000
> C2 := map(t-> abs(CDF(W1, t)-CDF(W2, t)), T2):
> max(C2);

0.0565000000
> max(max(C1), max(C2));

0.0570000000
> evalf(sqrt(n1*n2/(n1+n2))*%);

1.471733671
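The scaling and the comparison can be checked with a few lines of Python.

For these two uniform distributions, the population distance is max_x |x/0.95 - x| = 1 - 0.95 = 0.05, attained at x = 0.95. After scaling this gives about 1.291, slightly below t[0.95] ≈ 1.3581. So the rejection in this example relies on the observed distance 0.057 exceeding 0.05, and a repeated simulation need not reject every time.

```python
import math

n1, n2 = 10**3, 2 * 10**3
scale = math.sqrt(n1 * n2 / (n1 + n2))   # sqrt(2000/3), about 25.82

observed = scale * 0.0570000000          # Maple: max(max(C1), max(C2))
population = scale * (1 - 0.95)          # sup of |G1 - G2| for U(0, 0.95) vs U(0, 1)
critical = 1.358098639

print(observed, observed > critical)     # rejection, as in the Maple session
print(population, population > critical)
```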
Because 1.471733671 > 1.358098639, we draw the conclusion that the null hypothesis should be rejected.
