[SciPy-Dev] Contingency Table Model

josef.pktd@gmai... josef.pktd@gmai...
Mon Aug 9 15:47:12 CDT 2010


On Mon, Aug 9, 2010 at 4:35 PM, Bruce Southey <bsouthey@gmail.com> wrote:
>
> On 08/09/2010 02:31 PM, Anthony Scopatz wrote:
>
> Hello All,
> I have just opened a ticket
> (http://projects.scipy.org/scipy/ticket/1258) that adds a general
> contingency table class to the stats package.  This class includes
> methods to slice and collapse the table as well as calculate metrics such as
> chi-squared and entropy.
> This implementation came out of Warren Weckesser and me working on this over
> the SciPy 2010 statistics sprint.
> Please take a look!  Comments and suggestions are always welcome.
> Be Well,
> Anthony
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
> Some points:
>
> 1) You cannot use numpy's asarray function without checking the input type.
> You must be aware of at least masked arrays and Matrix inputs, as well as new
> data types.
>
> 2) You cannot force a dtype on the user, as on line 54, when you could
> provide an optional precision argument instead.
>
> 3) Can you please clarify lines 112-113?
> "  scipy.stats.chisquare -- one-way chi-square test (which is not the same
> as the n-way test with n=1)."
> This needs to be a little clearer because the exact same test statistic
> is being used. In fact, the function must give the correct answer with a 1d
> array.
>
> 4) Related to point 3, lines 72-74 are not correct, see
> http://en.wikipedia.org/wiki/Pearson's_chi-square_test
>
> 5) You must allow the user to provide their own expected values.
>
> 6) Users need to be able to control the output - really I don't want to see
> the table of expected values unless requested. Also a user might just want
> the table of expected values and nothing else.
>
> 7) You should not need the chi2 function.
>
> 8) More generally, what is the need for having a ContingencyTable object?
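
On points 3 and 5, a minimal sketch of the statistic for a 1d array with
optional user-supplied expected values. This is plain Python for
illustration only, not the code in the ticket; the function name
pearson_chi2 and the counts are made up, and a uniform expectation is
assumed when none is given.

```python
def pearson_chi2(observed, expected=None):
    """Pearson chi-square statistic for a 1d sequence of counts.

    `expected` is optional (hypothetical signature); if omitted, every
    category is assumed to have the same expected count.
    """
    observed = [float(x) for x in observed]
    if expected is None:
        mean = sum(observed) / len(observed)
        expected = [mean] * len(observed)
    else:
        expected = [float(x) for x in expected]
        if len(expected) != len(observed):
            raise ValueError("observed and expected must have the same length")
    # Sum of (O - E)^2 / E over all categories.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [16, 18, 16, 14, 12, 12]           # made-up counts
stat_uniform = pearson_chi2(observed)          # uniform expectation
stat_custom = pearson_chi2(observed, expected=[16, 16, 16, 16, 16, 8])
print(stat_uniform, stat_custom)
```

The same formula covers the n-way case once the expected counts are
computed from the margins, which is why the 1d result should agree.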

Maybe some usage examples would be nice.

I like the collapse methods, since, I think, it makes it easy to test
(for marginal ?) independence along different variables. Similar for
slicing to test conditional independence, but I haven't read through
the slicing method yet.
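
A rough sketch of the collapse-then-test and slice-then-test idea,
written with plain NumPy rather than the ContingencyTable class from the
ticket. The helper names collapse and pearson_chi2_2d and the counts are
made up for illustration, and plain ndarrays are assumed (masked arrays
are not handled).

```python
import numpy as np


def collapse(table, keep_axes):
    """Sum a contingency table over every axis not listed in keep_axes."""
    table = np.asarray(table)  # assumes a plain ndarray of counts
    drop = tuple(ax for ax in range(table.ndim) if ax not in keep_axes)
    return table.sum(axis=drop)


def pearson_chi2_2d(table):
    """Pearson chi-square statistic and degrees of freedom for a 2d table."""
    table = np.asarray(table, dtype=float)
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row * col / table.sum()  # expected counts under independence
    stat = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return stat, dof


# Made-up 2x2x2 table of counts for variables A, B, C (axes 0, 1, 2).
counts = np.array([[[10, 20], [30, 40]],
                   [[20, 10], [40, 30]]])

# Marginal independence of A and B: collapse over C, then test.
marginal = collapse(counts, keep_axes=(0, 1))
stat_marginal, dof = pearson_chi2_2d(marginal)

# Conditional independence of A and B given C = 0: slice, then test.
stat_cond, _ = pearson_chi2_2d(counts[:, :, 0])
print(marginal, stat_marginal, stat_cond)
```

In this toy table A and B look independent after collapsing over C, while
the slice at C = 0 gives a nonzero statistic, so the two tests answer
different questions.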

In the long term it might also be useful to attach other tests for
contingency tables for convenience: Fisher's exact test, Kendall's tau and
other tests that apply.
And when numpy gets the labeled array, we can attach labels for the categories.

Josef



>
>
> Bruce

