[SciPy-Dev] Contingency Table Model

Anthony Scopatz scopatz@gmail....
Mon Aug 9 16:46:50 CDT 2010

On Mon, Aug 9, 2010 at 3:11 PM, <josef.pktd@gmail.com> wrote:

> On Mon, Aug 9, 2010 at 3:31 PM, Anthony Scopatz <scopatz@gmail.com> wrote:
> > Hello All,
> > I have just opened a ticket
> > (http://projects.scipy.org/scipy/ticket/1258) that adds a general
> > contingency table class to the stats package.  This class includes
> > methods to slice and collapse the table as well as calculate metrics
> > such as chi-squared and entropy.
> > This implementation came out of Warren Weckesser and me working on
> > this over the SciPy 2010 statistics sprint.
> > Please take a look!  Comments and suggestions are always welcome.
> Just a quick question about something I don't understand from a brief
> look at the source:
> Isn't the core of "from_columns" doing the same quantization as
> np.histogramdd? (I haven't looked closely enough yet.)
> If x in from_columns is a tuple, then an array_like could also contain
> strings, e.g. names/levels of a categorical variable. I'm not sure how
> far this should go.
To answer both questions at once: for continuous variables, from_columns()
and np.histogramdd() do effectively the same thing, except that
from_columns() takes bounds and distributions rather than bins.  However,
from_columns() also allows for discrete variables, which, as you pointed
out, can handle categorical, string-based data.  See the attached file for
an example.  (Maybe this method of making histograms should be in numpy?)
The reason I went with the bounds/dist rather than the bin implementation
is that bounds/dists are more often what you play around with when
exploring the data.
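
A minimal sketch of the continuous versus discrete distinction, using only
NumPy.  This is not the from_columns() code from the ticket; the helper
crosstab_categorical() and the sample data are hypothetical and are there
only to illustrate the idea.

```python
import numpy as np

# Continuous variables: np.histogramdd counts samples per cell, given
# explicit bin edges for each dimension.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = rng.uniform(0.0, 1.0, size=200)
edges = [np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.25, 0.5, 1.0])]
counts, _ = np.histogramdd(np.column_stack([x, y]), bins=edges)


def crosstab_categorical(a, b):
    """Cross-tabulate two 1-D arrays of discrete levels (hypothetical helper).

    Levels may be strings; they are sorted by np.unique.
    """
    a_levels, a_idx = np.unique(a, return_inverse=True)
    b_levels, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_levels.size, b_levels.size), dtype=int)
    # np.add.at accumulates correctly when index pairs repeat.
    np.add.at(table, (a_idx, b_idx), 1)
    return a_levels, b_levels, table


# Discrete, string-based variables: there are no bins, only observed levels.
color = np.array(["red", "blue", "red", "green", "blue", "red"])
size = np.array(["S", "L", "S", "L", "L", "L"])
color_levels, size_levels, table = crosstab_categorical(color, size)
```

Here the rows of table follow color_levels and the columns follow
size_levels, both in the sorted order that np.unique returns.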

> other ideas:
> methods or functions "from_flat" and "to_flat" would be useful.
> chi2 could be renamed to chi2_indep, or take an optional expected
> keyword, where the user could specify other distribution hypotheses.
An expected keyword would work well here.  It might be a better idea to
include such a keyword in __init__() and from_columns().  I'd just need to
make sure that the collapse and slice methods propagate this properly.
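
As a rough sketch of what an optional expected keyword could look like for
a 2-D table: the function name chi2_statistic() and its signature are
assumptions made for illustration, not the API in the ticket.  When expected
is omitted it falls back to the independence hypothesis.

```python
import numpy as np


def chi2_statistic(observed, expected=None):
    """Pearson chi-squared statistic for a 2-D table (hypothetical sketch).

    If expected is None, the expected counts are those implied by
    independence of rows and columns: row_total * col_total / grand_total.
    """
    observed = np.asarray(observed, dtype=float)
    if expected is None:
        row = observed.sum(axis=1, keepdims=True)
        col = observed.sum(axis=0, keepdims=True)
        expected = row * col / observed.sum()
    else:
        expected = np.asarray(expected, dtype=float)
    return ((observed - expected) ** 2 / expected).sum()


table = np.array([[10, 20], [30, 40]])

# Default: test against independence.
stat_indep = chi2_statistic(table)

# User-specified hypothesis: all four cells equally likely.
stat_uniform = chi2_statistic(table, expected=np.full((2, 2), 25.0))
```

Slicing or collapsing the table would need to apply the same operation to
the expected counts, which is the propagation issue mentioned above.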

I can also see how "from_flat" and "to_flat" methods would be nice.
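
One possible reading of these, sketched with NumPy only: the flat form is
one row per cell, holding the cell's indices followed by its count.  Both
function bodies and the choice of flat layout are assumptions, since the
thread does not define them.

```python
import numpy as np


def to_flat(table):
    """Return one row per cell: (index_0, ..., index_k, count)."""
    table = np.asarray(table)
    # np.indices gives one index grid per axis; flatten and pair them up.
    idx = np.indices(table.shape).reshape(table.ndim, -1).T
    return np.column_stack([idx, table.ravel()])


def from_flat(flat, shape):
    """Rebuild an N-d count table from rows of (indices..., count)."""
    flat = np.asarray(flat)
    table = np.zeros(shape, dtype=flat.dtype)
    # Accumulate, so repeated index rows add up rather than overwrite.
    np.add.at(table, tuple(flat[:, :-1].T), flat[:, -1])
    return table


table = np.array([[2, 0], [1, 3]])
flat = to_flat(table)
roundtrip = from_flat(flat, table.shape)
```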

Be Well

> Josef
> > Be Well,
> > Anthony
> > _______________________________________________
> > SciPy-Dev mailing list
> > SciPy-Dev@scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ct_cat.py
Type: text/x-python
Size: 450 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-dev/attachments/20100809/558a35bb/attachment.py 