Carl Love

Carl Love

28070 Reputation

25 Badges

13 years, 31 days
Himself
Wayland, Massachusetts, United States
My name was formerly Carl Devore.

MaplePrimes Activity


These are replies submitted by Carl Love

I've extensively updated this code with comments and to use syntax features new to Maple 2019. In particular, I like the new local syntax, which allows placement of the declaration close to where the variable is used, which I think is more robust and easier to read. I've also used the new increment syntax, ++var, and the new neutral-operator syntax, which allows the use of the operator in functional form (i.e., coming before it arguments) without the use of back quotes.

I recently encountered on MaplePrimes a problem for which Fisher's exact test is ideal, and I've included that problem.

Since MaplePrimes doesn't reproduce Code Edit Regions, here's my code, and the worksheet follows.

#This code uses features new to Maple 2019.

ContingencyTableTests:= proc(
	O::Matrix(nonnegint), #contingency table of observed counts 
	{method::identical(Pearson, Yates, G, Fisher, Fisher[count]):= 'Pearson'}
	#Use 'method' = 'Fisher[count]' to count how many table variations are 
	#needed for Fisher's.
)
description 
	"Returns p-value for Pearson's (w/ or w/o Yates's continuity correction)" 
	" or G chi-squared or Fisher's exact test."
	" All computations are done in exact arithmetic."
;
option
	author= "Carl Love <carl.j.love@gmail.com>, 29-May-2019",
	references= (                                                           #Ref #s:
		"https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test",         #*1
		"https://en.wikipedia.org/wiki/Yates%27s_correction_for_continuity",  #*2
		"https://en.wikipedia.org/wiki/G-test",                               #*3
		"https://en.wikipedia.org/wiki/Fisher%27s_exact_test",                #*4
		"Eric W Weisstein \"Fisher's Exact Test\" _MathWorld_"
		"--A Wolfram web resource:"
		"   http://mathworld.wolfram.com/FishersExactTest.html"               #*5
	)
;
uses AT= ArrayTools, St= Statistics, It= Iterator;
local
	C:= AT:-AddAlongDimension(O,1), R:= AT:-AddAlongDimension(O,2), #column & row sums
	c:= numelems(C), r:= numelems(R), #counts of columns & rows
	n:= add(R), #number of observations

	#All remaining code in `local` applies strictly to Fisher's.
	`&*!`:= mul@factorial~, 
	Cutoff:= &*!(O), #cut-off value for less-likely matrices
	CTs:= 0, #count of contingency tables with same row and column sums as O.
	Devs:= 0, #count of those tables with probabilities <= that of O.
	
	#Generate recursively all contingency tables whose row and column sums match O.
	#A very similar recursion is documented on the Maple help page
	# ?Iterator,BoundedComposition as example "Contingency Table".	
        CountCTs:= proc(C, i)
	local X; 
		`if`(i=r, 1, add(thisproc(C-X,i+1), X= It:-BoundedComposition(C,R[i])))
	end proc,
		
        #Using the same recursion as above, generate the tables and compute their  
	#probabilities under the null hypothesis. The formula for the probability is
	#from reference *5. Sum probabilities of all those at least as improbable as O. 
	#Parameters: 
	#   C = column sums remaining to be filled; 
	#   F = product of factorials of entries of contingency table being built;
	#   i = row to be chosen this iteration
	AllCTs:= (C, F, i)->
		if i = r then #Recursion ends.
			if irem(++CTs, 10^5) = 0 then Report() fi;
			#Last row is C, the unused portion of column sums:
			if (local P:= F*&*!(C)) >= Cutoff then ++Devs; 1/P else 0 fi
		else
			local X;
			add(thisproc(C-X, F*&*!(X), i+1), X= It:-BoundedComposition(C,R[i]))
		fi,
   
	Report:= ()-> userinfo(
		1, ContingencyTableTests, NoName, CTs, "tables evaluated, of which", Devs, 
		"are at least as improbable as the observed input."
	)     
;
	if method in {'Fisher', 'Fisher'['count']} then
		if method = 'Fisher'['count'] then return CountCTs(C,1) fi; 
		local p:= AllCTs(C, 1, 1)*&*!(R)*&*!(C)/n!;  #p-value (see *5)
		Report();
		p
	#Readers only interested in Fisher's exact test can ignore everything below
	#this point.
        else #Use one of the chi-squared methods: Pearson, G, Yates.
		local E::rtable; #matrix of expected values under null hypothesis 		
		try
			E:= rtable(
				(1..r, 1..c), 
				(i,j)-> R[i]*C[j]/n,
				#The chi-squared tests aren't sufficiently
				#accurate if any cell has expected value < 5:
				datatype= satisfies(x-> x >= 5) 
			)
		catch "unable to store":
			error
				"cell expected value too small for chosen method;"
				" use 'method'= 'Fisher'"
		end try;
		userinfo(
			1, ContingencyTableTests, 
			"Table of expected values under null hypothesis:", 
			print(
				`if`(
					E::'rtable'(posint), 
					E, 
					E=subsindets(E, Not(integer), evalf[1+ilog10(n)])
				)
      		)
		);
		#Pearson's, Yates's, and G all use a chi-square statistic,
		#each computed by a slightly different formula:
		local Chi2:= add@~table([
			'Pearson'= ((O,E)-> (O-E)^~2 /~ E),                     #see *1
			'Yates'= ((O,E)-> (abs~(O-E) -~ 1/2)^~2 /~ E),          #see *2
			'G'= ((O,E)-> 2*O*~map(x-> `if`(x=0, 0, ln(x)), O/~E))  #see *3
		]);
		1 - St:-CDF(ChiSquare((r-1)*(c-1)), Chi2[method](O,E)) #p-value 
	fi   
end proc:


 

Contingency table tests, including Fisher's exact test

Author: Carl Love <carl.j.love@gmail.com>, 29-May-2019

restart:

This code uses features new to Maple 2019.

Problem: On an episode of the television show Survivor, there were 15 contestants divided into 2 teams of sizes 9 and 6. The producers decided to divide them (using an unknown selection process) into 3 teams of size 5. The resulting selection was that one of the new teams was drawn entirely from the size-9 team, a second was drawn entirely from the size-6 team, and, of course, the third was comprised of the remaining 5 contestants. Assuming that the selection process was random (this being the null hypothesis), what is the probability of achieving a result at least this extreme?

 

Solution: Although this probability can be easily computed by hand using basic combinatorics (see the end of this worksheet), it's also a good problem for Fisher's exact test. In the matrix below, the rows show the division into the original teams, and the columns show the division into the new teams.

SurvivorTeams:= <
   5, 0, 1;
   0, 5, 4
>:

infolevel[ContingencyTableTests]:= 1:

ContingencyTableTests(SurvivorTeams, method= Fisher): % = evalf(%);

25 tables evaluated, of which 6 are at least as improbable as the observed input.

6/1001 = 0.5994005994e-2

Since the number of tables which need to be evaluated can reach into the millions even for modestly sized problems, I've included an option to count them prior to doing the full computation:

ContingencyTableTests(SurvivorTeams, method= Fisher[count]);

25

This problem is too small to use any of the more-standard contingency table tests based on a chi-squared statistic, and my program can detect that:

ContingencyTableTests(SurvivorTeams);

Error, (in ContingencyTableTests) cell expected value too small for chosen method; use 'method'= 'Fisher'

There are numerous intuitive ways to do the computation by hand. Here's one:

binomial(9,5)*binomial(6,5)/(binomial(15,5)*binomial(10,5)/3!);

6/1001

 


 

Download FishersExact.mw

@q41b3 Well, you can still choose this Answer as best.

@q41b3 The difference between else if and elif is purely syntactic. Each if must have a corresponding terminator (fiend if, or end---those three being equivalent in this context). Using else if starts a new if statement, so it requires two terminators. 

@melika mokhtari Your z contains the derivative to the 6th power.  Then c contains a term with z in the denominator. Then ode contains separate terms with c and the derivative squared.

I don't know how to fix it.

@tomleslie Yes, I agree that if the OP is working with mathematical structures that are represented symbolically in Maple (rather than working with strings), then it is far better to use the built-in type system than regular expressions. 

If the OP gives more details of their project, I'd be happy to show some examples of using types.

@torabi You need to change F[n] to F[.., n]. The former is a row vector, and the latter is a column vector.

@tomleslie As I said last night, our relative memory and cputime numbers were not surprising.

I just wrote a version that uses a full-size Array, as yours does. My memory usage is still much lower than yours, which I find surprising, and also my cputime is much lower. 

CodeTools:-Usage(sieve(2^24)): #Tom's procedure, not copied here.
memory used=1.37GiB, alloc change=-329.75MiB, cpu time=9.72s, real time=6.10s, gc time=4.28s

restart:

EratosthenesSieve:= proc(N::posint)
description "Returns list of primes <= N";
   if N < 5 then return remove(`>`, [2, 3], N) fi;
   local p, P:= rtable(2..N, k-> k);
   P[[seq(4..N, 2), seq(seq(p^2..N, 2*p), p= thisproc(isqrt(N))[2..])]]:= ();
   [seq(P)]
end proc 
:
CodeTools:-Usage(EratosthenesSieve(2^24)):
memory used=0.90GiB, alloc change=102.38MiB, cpu time=4.94s, real time=3.16s, gc time=2.05s

I post this not out of any desire to criticize, but simply in the spirit of education and good-natured sportsmanship.

@torabi So, do you not understand the meaning of the error message? Are you incapable of tracking it down yourself? You've been posting on MaplePrimes for nearly 4 years. I can understand your need for some help with the details of a Matlab-to-Maple translation, but I can't hold your hand for every step.

Second degree? Ha! Just eyeballing it, I see that it's at least degree 8 in the derivative.

@torabi You need to replace exp with LinearAlgebra:-MatrixExponential.

@brian bovril Reading through that thread to find my code that you refer to is brutal, so I put it here instead:

EratosthenesSieve:= proc(N::posint)
# Returns list of primes <= N 
   local
      # Only record data for the odds >= 3
      M:= iquo(N-1,2)
     ,prime:= rtable(1..M, fill= true)
     ,p, P, k
   ;
   for p to trunc(evalf(sqrt(N)/2)) do
      if prime[p] then
         P:= 2*p+1;
         for k from (P+1)*p by P to M do  prime[k]:= false  end do
      end if
   end do;
   [`if`(N=1,[][],2), seq(`if`(prime[p], 2*p+1, [][]), p= 1..M)]
end proc:
     
CodeTools:-Usage(EratosthenesSieve(2^14)):
memory used=0.55MiB, alloc change=0 bytes, cpu time=16.00ms, real time=8.00ms, gc time=0ns

 

You say, essentially, that you want to plot P(t) vs. E. That's 3 dimensions: E = 0...4 (on horizontal axis), t = 0..T (on which axis?), and P = P(0)..P(T) (on vertical axis, I guess). Do you understand why this can't work?

If you want to plot P(T) vs. E, that can be done, because is not a dimension; it's just the number 20.

If you want a 3D plot of P(t) vs. (E, t), that can be done.

If you want a sequence of 2D plots of P(t) vs. t for various specific values of E, that can be done, but E won't be an axis.

If you want a sequence of 2D plots of P(t) vs. t with E varying continuously through 0..4 with its value shown by color gradation, that can be done. In this case, color is the 3rd dimension.

If you want the same thing as an animation with E varying with time (which is thus the third dimension), that can be done.

Your example of what you expect shows multiple distinct plots. Why is that?

@torabi That's just a small excerpt from a math paper with the formulas that you want to implement. I can't debug code in a language that I don't know well (Matlab) and then translate it to Maple. Please use explicit, nontrivial, matrices instead of scalars for everything that should be a matrix. When you've done that, I'll take another look at it.

@torabi Sorry then, I don't know Matlab well enough to figure it out. From my reading of Matlab's online help, size(M,1) refers to the extent of the 1st dimension of matrix M. I didn't see any meaning for when is a scalar. Hopefully someone else can answer. I do have a vague memory that in Matlab scalars are actually 1x1 matrices. Could that be the key to the solution?

In my translation table above, I added a translation for T= S:h:F because it looks like you need that for this code.

@Rouben Rostamian  wrote:

  • [T]he system A.U=B is solved m times within a for-loop with the same A, while B varies. In my mind I justified doing the inversion once and then applying the inverse m times as being not worse than solving the system m times, but I may be wrong on that.

Using theorethical considerations alone, i.e., order asymptotics, I'll explain why the O(n) solver is better even in this case where you repeatedly use the same coefficient matrix A for different side vectors B. Let's say that A1:= A^(-1). Then simply doing the multiplication U:= A1.B even once requires O(n^2) steps. This is not even counting the time to compute the inverse.

First 272 273 274 275 276 277 278 Last Page 274 of 709