[SciPy-Dev] Contingency Table Model
Anthony Scopatz
scopatz@gmail....
Mon Aug 9 16:46:50 CDT 2010
On Mon, Aug 9, 2010 at 3:11 PM, <josef.pktd@gmail.com> wrote:
> On Mon, Aug 9, 2010 at 3:31 PM, Anthony Scopatz <scopatz@gmail.com> wrote:
> > Hello All,
> > I have just opened a ticket
> > (http://projects.scipy.org/scipy/ticket/1258) that adds a general
> > contingency table class to the stats package. This class includes
> > methods to slice and collapse the table as well as calculate metrics such
> > as chi-squared and entropy.
> > This implementation came out of Warren Weckesser and me working on this
> > over the SciPy 2010 statistics sprint.
> > Please take a look! Comments and suggestions are always welcome.
>
> just a quick question that I don't understand from a brief look at the
> source
>
> Isn't the core of "from_columns" doing the same quantization as
> np.histogramdd? ( I haven't looked closely enough yet)
>
> If x in from_columns is a tuple, then an array_like could also contain
> strings, e.g. names/levels of a categorical variable. I'm not sure how
> far this should go.
>
>
To answer both points at once: from_columns() and np.histogramdd() do
effectively the same thing for continuous variables, except that
from_columns() takes bounds and distributions rather than bins. However,
from_columns() also allows for discrete variables, which, as you pointed
out, lets it handle categorical, string-based data. See the attached file
for an example. (Maybe this method of making histograms should be in
numpy?) The reason I went with bounds/dists rather than bins is that
bounds/dists are more often what you play around with when exploring the
data.
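
To make the comparison concrete, here is a minimal sketch using only
numpy, not the from_columns() code from the ticket. It bins two continuous
columns with np.histogramdd() and then counts two string-valued columns by
mapping labels to integer codes with np.unique(). The sample data and the
variable names are made up for illustration.

```python
import numpy as np

# Continuous columns: np.histogramdd bins each column by explicit edges.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = rng.uniform(0.0, 10.0, size=200)
edges = [np.linspace(0.0, 1.0, 3), np.linspace(0.0, 10.0, 5)]  # 2 x 4 cells
cont_table, _ = np.histogramdd(np.column_stack([x, y]), bins=edges)

# Categorical (string) columns: np.unique maps each label to an integer
# code, and np.add.at counts every (row level, column level) pair.
color = np.array(["red", "blue", "red", "green", "blue", "red"])
size = np.array(["S", "L", "S", "S", "L", "L"])
color_levels, color_codes = np.unique(color, return_inverse=True)
size_levels, size_codes = np.unique(size, return_inverse=True)
cat_table = np.zeros((len(color_levels), len(size_levels)), dtype=int)
np.add.at(cat_table, (color_codes, size_codes), 1)

print(color_levels, size_levels)
print(cat_table)
```

The levels come back sorted, so the rows of cat_table are blue, green, red
and the columns are L, S.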

> other ideas
> methods or functions "from_flat" and "to_flat" would be useful.
> chi2 could be renamed to chi2_indep, or take an optional expected
> keyword, where the user could specify other distribution hypotheses.
>
>
An expected keyword would work well here. It might be a better idea to
include such a keyword in __init__() and from_columns(). I'd just need to
make sure that the collapse and slice methods propagate this properly.
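
As a rough sketch of what an expected keyword could look like (a
standalone function written for this message, not the method in the
ticket): when expected is omitted, it falls back to the independence
hypothesis built from the table's marginals. The function name and
signature are assumptions.

```python
import numpy as np

def chi2(table, expected=None):
    """Pearson chi-squared statistic for a 2-D table of counts.

    Hypothetical signature: if `expected` is None, the expected counts
    are those of the independence hypothesis (outer product of the
    marginals divided by the grand total).
    """
    observed = np.asarray(table, dtype=float)
    if expected is None:
        row = observed.sum(axis=1, keepdims=True)
        col = observed.sum(axis=0, keepdims=True)
        expected = row * col / observed.sum()
    else:
        expected = np.asarray(expected, dtype=float)
        if expected.shape != observed.shape:
            raise ValueError("expected must have the same shape as table")
    return float(((observed - expected) ** 2 / expected).sum())

obs = [[10, 20], [20, 10]]
print(chi2(obs))                                  # independence hypothesis
print(chi2(obs, expected=[[12, 18], [18, 12]]))   # user-supplied hypothesis
```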
I can also see how "from_flat" and "to_flat" methods would be nice.
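
One possible reading of "flat", sketched below, is a long-format listing
with one row of level indices per cell plus its count. That reading, and
the two function signatures, are assumptions on my part rather than
anything settled in this thread.

```python
import numpy as np

def to_flat(table):
    """Return (index, counts): one row of cell indices per table cell."""
    table = np.asarray(table)
    # np.indices gives one index grid per axis; flatten them in C order
    # so that row i of `index` matches element i of table.ravel().
    index = np.indices(table.shape).reshape(table.ndim, -1).T
    return index, table.ravel()

def from_flat(index, counts, shape):
    """Rebuild a table from cell indices and counts (duplicates are summed)."""
    counts = np.asarray(counts)
    table = np.zeros(shape, dtype=counts.dtype)
    np.add.at(table, tuple(np.asarray(index).T), counts)
    return table

table = np.array([[2, 0], [0, 1], [1, 2]])
index, counts = to_flat(table)
print(np.column_stack([index, counts]))
print(from_flat(index, counts, table.shape))
```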
Be Well
Anthony
> Josef
>
>
> > Be Well,
> > Anthony
> > _______________________________________________
> > SciPy-Dev mailing list
> > SciPy-Dev@scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> >
> >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ct_cat.py
Type: text/x-python
Size: 450 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-dev/attachments/20100809/558a35bb/attachment.py