[SciPy-Dev] Contingency Table Model

Anthony Scopatz scopatz@gmail....
Wed Aug 11 14:10:07 CDT 2010

On Mon, Aug 9, 2010 at 3:35 PM, Bruce Southey <bsouthey@gmail.com> wrote:

> On 08/09/2010 02:31 PM, Anthony Scopatz wrote:
> Hello All,
>  I have just opened a ticket (http://projects.scipy.org/scipy/ticket/1258) that
> adds a general contingency table class to the stats package.  This class
> includes methods to slice and collapse the table as well as calculate metrics
> such as chi-squared and entropy.
>  This implementation came out of Warren Weckesser and me working on this
> over the SciPy 2010 statistics sprint.
>  Please take a look!  Comments and suggestions are always welcome.
> Be Well,
> Anthony
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev

Hello All,

I have updated the ticket with new versions of the contingency_table.py and
test_contingency_table.py.  I also have a github clone of scipy now, if you
just want to grab the changes, http://github.com/scopatz/scipy

Issues addressed in the new version:

   1. Expected tables may now be user-specified.
   2. Added from_flat() and to_flat() methods.
   3. Retooled the chi_square() method and removed the chisquare_nway()
   function.
   4. All table metric methods (e.g. entropy) now store the calculated value
   in the contingency table's attributes as well as returning it.
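
To illustrate items 1 and 3, here is a rough NumPy sketch of a chi-square
statistic that accepts a user-specified expected table and otherwise falls
back to the expected counts under independence of all axes.  This is not the
code attached to the ticket: the names expected_independent and the
standalone chi_square function below are made up for illustration, and
casting to float is a simplification of the sketch.

```python
import numpy as np


def expected_independent(observed):
    """Expected counts if all axes of the table were mutually independent."""
    observed = np.asarray(observed, dtype=float)  # float cast: sketch only
    total = observed.sum()
    ndim = observed.ndim
    expected = np.ones_like(observed)
    for axis in range(ndim):
        # Sum over every other axis to get this axis' marginal totals,
        # keeping dimensions so the marginals broadcast against each other.
        other = tuple(a for a in range(ndim) if a != axis)
        expected = expected * observed.sum(axis=other, keepdims=True)
    return expected / total ** (ndim - 1)


def chi_square(observed, expected=None):
    """Pearson chi-square statistic; `expected` may be supplied by the caller."""
    observed = np.asarray(observed, dtype=float)
    if expected is None:
        expected = expected_independent(observed)
    else:
        expected = np.asarray(expected, dtype=float)
        if expected.shape != observed.shape:
            raise ValueError("expected must have the same shape as observed")
    return ((observed - expected) ** 2 / expected).sum()


observed = [[10, 20], [30, 40]]
print(expected_independent(observed))
print(chi_square(observed))
```

Passing expected=... skips the independence calculation entirely, which is
the point of item 1.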

Bruce, thank you for your concerns.  I'd like to address your points below.

> 1) You can not use numpy's asarray function without checking the input
> type. You must be aware of at least masked arrays and Matrix inputs as well
> as new data types.
> 2) You can not force a dtype on the user -  on line 54 when you can provide
> optional precision.

These are handled by now allowing the user to specify their own expected
table.  The expected_nway() function that these two points relate to can now
be avoided completely, if desired.
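
For code that still converts its input with asarray, points 1) and 2) could
also be met more directly, roughly as below.  This is only a sketch under my
own assumptions, not what is in the ticket: as_table and its dtype argument
are hypothetical names, and rejecting masked cells outright is just one
possible policy.

```python
import numpy as np


def as_table(observed, dtype=None):
    """Convert input to a plain ndarray without silently dropping a mask."""
    if isinstance(observed, np.ma.MaskedArray):
        # np.asarray would discard the mask, so refuse masked cells instead.
        if np.ma.is_masked(observed):
            raise ValueError("masked cells are not supported in this sketch")
        observed = observed.filled()
    # dtype=None keeps whatever precision the caller passed in.
    table = np.asarray(observed, dtype=dtype)
    if table.ndim == 0:
        raise ValueError("a contingency table needs at least one axis")
    if np.any(table < 0):
        raise ValueError("counts must be non-negative")
    return table


print(as_table([[1, 2], [3, 4]]).dtype)
print(as_table([[1, 2], [3, 4]], dtype=np.float32).dtype)
```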

> 3) Can you please clarify lines 112-113?
> "  scipy.stats.chisquare -- one-way chi-square test (which is not the same
> as the n-way test with n=1)."
> This needs to be a little more clear because the exact same test statistic
> is being used. In fact the function must give the correct answer with 1d
> array.
> 4) Related to point 3, lines 72-74 are not correct, see
> http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test

The chisquared_nway() function has been removed, so 3) and 4) no longer apply.
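
For the record, one possible reading of the quoted docstring note is that the
test statistic is the same but the default expected frequencies differ.  The
one-way test in scipy.stats.chisquare assumes equally likely categories by
default, whereas expected counts built from the marginals of a table with a
single axis are just the observed counts themselves.  A small NumPy-only
illustration under that assumption (the data values are arbitrary):

```python
import numpy as np

observed = np.array([16.0, 18.0, 16.0, 14.0, 12.0, 12.0])

# One-way test: by default every category gets the same expected count,
# which is what scipy.stats.chisquare does when f_exp is not given.
uniform_expected = np.full_like(observed, observed.mean())
one_way = ((observed - uniform_expected) ** 2 / uniform_expected).sum()

# Marginal-based expected counts with a single axis: the only marginal is
# the table itself, so the statistic is trivially zero.
marginal_expected = observed.copy()
n_way = ((observed - marginal_expected) ** 2 / marginal_expected).sum()

print(one_way, n_way)
```

Same formula in both cases; only the expected table changes.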

> 5) You must allow the user to provide their own expected values


>  6) Users need to be able to control the output - really I don't want to
> see the table of expected values unless requested. Also a user might just
> want the table of expected values and nothing else.

The expected table, much like the probability table or the number of degrees
of freedom or the number of dimensions, is not really an output.  Rather it
is more of an attribute that helps calculate outputs, like the entropy,
mutual information, etc.  Therefore it should always be included in an
instance of ContingencyTable.  A user could simply have an array of values
that they call a contingency table, but this class provides a tool for
easily calculating related metrics (outputs).
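
The distinction between attributes and outputs can be sketched as follows.
This is a deliberately small stand-in, not the ContingencyTable class from
the ticket: the class name TableSketch, its attribute names, and the
restriction to two-way tables are all assumptions made for the example.

```python
import numpy as np


class TableSketch:
    """Hypothetical two-way table: attributes set up front, metrics on demand."""

    def __init__(self, observed, expected=None):
        self.table = np.asarray(observed, dtype=float)
        if self.table.ndim != 2:
            raise ValueError("this sketch handles two-way tables only")
        self.total = self.table.sum()
        self.probability = self.table / self.total
        n_rows, n_cols = self.table.shape
        self.dof = (n_rows - 1) * (n_cols - 1)
        if expected is None:
            # Expected counts under independence of rows and columns.
            rows = self.table.sum(axis=1)
            cols = self.table.sum(axis=0)
            expected = np.outer(rows, cols) / self.total
        self.expected = np.asarray(expected, dtype=float)

    def entropy(self):
        """Shannon entropy (natural log); stored on the instance and returned."""
        p = self.probability[self.probability > 0]  # skip empty cells
        self.entropy_value = float(-(p * np.log(p)).sum())
        return self.entropy_value


t = TableSketch([[10, 20], [30, 40]])
print(t.expected)   # always available as an attribute
print(t.entropy())  # an output, computed when asked for
```

A user who only wants the expected values reads t.expected and calls nothing
else.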

>  7) You should not need the chi2 function.

The chi2 function is now required, since chisquared_nway() was removed.

>  8) More generally, what is the need for having a ContingencyTable object?

Basically, my argument for the need is that contingency tables (or cross
tabulations) are expected as standard in any statistics package.  R has
them, Matlab has them, SPSS has them, Stata has them, and so on.  I know
that when I came to scipy.stats and found that they weren't here already, I
was disappointed.

I hope this helps!

Be Well

> Bruce