The procedure presented here does independence tests of a contingency table by four methods:

  1. Pearson's chi-squared (equivalent to Statistics:-ChiSquareIndependenceTest),
  2. Yates's continuity correction to Pearson's,
  3. G-chi-squared,
  4. Fisher's exact.

(All of these have Wikipedia pages. Links are given in the code below.) All computations are done in exact arithmetic. The coup de grace is Fisher's. The first three tests are relatively easy computations and give approximations to the p-value (the probability that the categories are independent), but Fisher's exact test, as its name says, computes it exactly. This requires the generation of all matrices of nonnegative integers that have the same row and column sums as the input matrix, and for each of these matrices computing the product of the factorials of its entries. So, there are relatively few implementations of it, and perhaps none that do it exactly. (Could some with access check Mathematica please?)

Our own Joe Riel's amazing and fast Iterator package makes this computation considerably easier and faster than it otherwise would've been, and I also found inspiration in his example of recursively counting contingency tables found at ?Iterator,BoundedComposition. 

ContingencyTableTests:= proc(
   O::Matrix(nonnegint), #contingency table of observed counts 
   {method::identical(Pearson, Yates, G, Fisher):= 'Pearson'}
)
description 
   "Returns p-value for Pearson's (w/ or w/o Yates's continuity correction)" 
   " or G chi-squared or Fisher's exact test."
   " All computations are done in exact arithmetic."
;
option
   author= "Carl Love <carl.j.love@gmail.com>, 27-Oct-2018",
   references= (                                                           #Ref #s:
      "https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test",         #*1
      "https://en.wikipedia.org/wiki/Yates%27s_correction_for_continuity",  #*2
      "https://en.wikipedia.org/wiki/G-test",                               #*3
      "https://en.wikipedia.org/wiki/Fisher%27s_exact_test",                #*4
      "Eric W Weisstein \"Fisher's Exact Test\" _MathWorld_--A Wolfram web resource:"
      " http://mathworld.wolfram.com/FishersExactTest.html"                 #*5
   )
;
uses AT= ArrayTools, St= Statistics, It= Iterator;
local
   #column & row sums: 
   C:= AT:-AddAlongDimension(O,1), R:= AT:-AddAlongDimension(O,2),
   r:= numelems(R), c:= numelems(C), #counts of rows & columns
   n:= add(R), #number of observations
   #matrix of expected values under null hypothesis (independence):
   #(A 0 entry would mean a 0 row or column in the original, which is not allowed.)
   E:= Matrix((r,c), (i,j)-> R[i]*C[j], datatype= 'positive') / n,
   #Pearson's, Yates's, and G all use a chi-sq statistic, each computed by 
   #slightly different formulae.
   Chi2:= add@~table([
       'Pearson'= (O-> (O-E)^~2 /~ E),                     #see *1
       'Yates'= (O-> (abs~(O - E) -~ 1/2)^~2 /~ E),        #see *2
       'G'= (O-> 2*O*~map(x-> `if`(x=0, 0, ln(x)), O/~E))  #see *3
   ]), 
   row, #alternative rows generated for Fisher's
   Cutoff:= mul(O!~), #cut-off value for less likely matrices
   #Generate recursively all contingency tables whose row and column sums match O.
   #Compute their probabilities under independence. Sum probabilities of all those
   #at most as probable as O. (see *5, *4)
   #Parameters: 
   #   C = column sums remaining to be filled; 
   #   F = product of factorials of entries of contingency table being built;
   #   i = row to be chosen this iteration
   AllCTs:= (C, F, i)->
      if i = r then #Recursion ends; last row is C, the unused portion of column sums. 
         (P-> `if`(P >= Cutoff, 1/P, 0))(F*mul(C!~))
      else
         add(
            thisproc(C - row[], F*mul(row[]!~), i+1), 
            row= It:-BoundedComposition(C, R[i])
         )
      fi      
;
   userinfo(1, ContingencyTableTests, "Table of expected values:", print(E));
   if method = 'Fisher' then AllCTs(C, 1, 1)*mul(R!~)*mul(C!~)/n!
   else 1 - St:-CDF(ChiSquare((r-1)*(c-1)), Chi2[method](O)) 
   fi   
end proc:

The worksheet below contains the code above and one problem solved by the 4 methods


 

 

DrugTrial:= <
   20, 11, 19;
   4,  4,  17
>:

infolevel[ContingencyTableTests]:= 1:

ContingencyTableTests(DrugTrial, method= Pearson):  % = evalf(%);

ContingencyTableTests: Table of expected values:

Matrix(2, 3, {(1, 1) = 16, (1, 2) = 10, (1, 3) = 24, (2, 1) = 8, (2, 2) = 5, (2, 3) = 12})

exp(-257/80) = 0.4025584775e-1

#Compare with:
Statistics:-ChiSquareIndependenceTest(DrugTrial);

hypothesis = false, criticalvalue = HFloat(5.991464547107979), distribution = ChiSquare(2), pvalue = HFloat(0.04025584774823787), statistic = 6.425000000

infolevel[ContingencyTableTests]:= 0:
ContingencyTableTests(DrugTrial, method= Yates):  % = evalf(%);

exp(-1569/640) = 0.8615885805e-1

ContingencyTableTests(DrugTrial, method= G):  % = evalf(%);

exp(-20*ln(5/4)+4*ln(2)-11*ln(11/10)-4*ln(4/5)-19*ln(19/24)-17*ln(17/12)) = 0.3584139703e-1

CodeTools:-Usage(ContingencyTableTests(DrugTrial, method= Fisher)):  % = evalf(%);

memory used=0.82MiB, alloc change=0 bytes, cpu time=0ns, real time=5.00ms, gc time=0ns

747139720973921/15707451356376611 = 0.4756594205e-1

 


 

Download FishersExact.mw


Please Wait...