[Scipy-tickets] [SciPy] #1258: Contingency Table Class

SciPy Trac scipy-tickets@scipy....
Mon Aug 9 14:04:05 CDT 2010


#1258: Contingency Table Class
-------------------------------+--------------------------------------------
 Reporter:  scopatz            |       Owner:  somebody
     Type:  enhancement        |      Status:  new     
 Priority:  normal             |   Milestone:  0.9.0   
Component:  scipy.stats        |     Version:  0.7.0   
 Keywords:  contingency table  |  
-------------------------------+--------------------------------------------
 The attached files hold a new N-dimensional
 [http://en.wikipedia.org/wiki/Contingency_table contingency table] class
 and tests for it.  The standard two-way table appears in almost every
 elementary statistics book, but SciPy does not yet include an adequate
 model of one.

 The class lets the user load a table from a NumPy array.  A contingency
 table can also be generated by sorting raw data from columns (1D arrays)
 into bins.

 {{{
 from scipy import stats
 import numpy as np

 # Initialize from array
 obs = np.array([[10,0], [0, 10]])
 ct = stats.ContingencyTable(observed=obs)

 # Initialize from columns of data
 a = np.array([1, 3,  10])
 b = np.array([3, 60, 45])
 c = np.array([0.9, 0.99, 0.999])
 x = (a,b,c)

 def nines(l, u, b):
     # Define our own distribution to apply to the data
     return 1.0 - np.logspace(np.log10(1.0 - l), np.log10(1.0 - u), b)

 bounds = ([1, 10], [0, 2], [0.9, 0.999])
 dists = (np.linspace, np.logspace, nines)

 ct = stats.ContingencyTable.from_columns(x, shape=(2, 4, 3),
     bounds=bounds, distribution=dists, discrete=(False, False, True))
 }}}
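Building a table "from columns" amounts to binning each column and counting joint occurrences. The sketch below does this with plain NumPy's `np.histogramdd`. It is not the attached `ContingencyTable` code. It assumes that `shape[i]` is the number of bins on axis `i` (so `shape[i] + 1` edges). The bin edges for `c` are hand-picked for illustration rather than produced by `nines`.

```python
import numpy as np

# Raw observations: one 1-D array per variable (values from the example above).
a = np.array([1, 3, 10])
b = np.array([3, 60, 45])
c = np.array([0.9, 0.99, 0.999])

# Illustrative bin edges (an assumption, not the ticket's code):
# n bins on an axis need n + 1 edges.
edges = [
    np.linspace(1, 10, 3),               # 2 linear bins for a
    np.logspace(0, 2, 5),                # 4 logarithmic bins for b (1 to 100)
    np.array([0.9, 0.95, 0.995, 1.0]),   # 3 hand-picked bins for c
]

# Stack the columns into an (n_samples, n_vars) array and count joint occurrences.
sample = np.column_stack((a, b, c))
observed, _ = np.histogramdd(sample, bins=edges)

print(observed.shape)  # (2, 4, 3)
print(observed.sum())  # 3.0
```

`np.histogramdd` treats every bin as half-open except the last, which also includes its right edge. That is why `a = 10` lands in the second bin of `a` instead of being dropped.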

 Slices and summations of the contingency table are handled by methods of
 the class.  Important metrics, such as the chi-squared statistic, entropy,
 and mutual information, are also defined as methods.
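The following minimal NumPy sketch illustrates what such metrics compute, applied to the 2x2 `obs` table from the first example. The function names are hypothetical and are not the methods of the attached class. The chi-squared statistic here is the uncorrected Pearson statistic, computed against expected counts that assume every axis is independent. Entropies are in bits.

```python
import numpy as np

def expected_counts(observed):
    """Expected counts if every axis is independent of every other."""
    observed = np.asarray(observed, dtype=float)
    total = observed.sum()
    expected = np.full(observed.shape, total)
    for axis in range(observed.ndim):
        # Marginal totals along this axis, kept broadcastable to the full table.
        others = tuple(i for i in range(observed.ndim) if i != axis)
        expected = expected * observed.sum(axis=others, keepdims=True) / total
    return expected

def chi_squared(observed):
    """Pearson's chi-squared statistic (no continuity correction)."""
    observed = np.asarray(observed, dtype=float)
    expected = expected_counts(observed)
    return ((observed - expected) ** 2 / expected).sum()

def entropy(counts, base=2):
    """Shannon entropy of the normalized counts; empty cells contribute 0."""
    p = np.asarray(counts, dtype=float).ravel()
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum() / np.log(base)

def mutual_information(observed, base=2):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for a two-way table."""
    observed = np.asarray(observed, dtype=float)
    return (entropy(observed.sum(axis=1), base)
            + entropy(observed.sum(axis=0), base)
            - entropy(observed, base))

obs = np.array([[10, 0], [0, 10]])
print(chi_squared(obs))         # 20.0
print(entropy(obs))             # 1.0
print(mutual_information(obs))  # 1.0
```

Because `expected_counts` loops over every axis, the same chi-squared function also works for N-way tables, which matches the N-dimensional scope of the proposed class.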

 One strength of this implementation is that lower-dimensional metrics can
 always be computed by combining the methods above.  For example, using the
 table built with from_columns in the first example:

 {{{
 # Entropy of original table, H(a,b,c)
 h = ct.entropy()

 # Entropy of just the a and c axes, H(a, c)
 ct1 = ct.collapse_shape((False, True, False))
 h1 = ct1.entropy()
 }}}
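The same idea can be sketched with plain NumPy. Collapsing an axis is a sum over that axis, so lower-dimensional entropies follow directly from the full table. The mutual information I(a; c) = H(a) + H(c) - H(a, c) then follows from those entropies. The `collapse` helper is hypothetical. It assumes that a `True` mask entry means "sum this axis out", which is how the H(a, c) comment above reads. The counts are made up for illustration.

```python
import numpy as np

def entropy(counts, base=2):
    """Shannon entropy of the normalized counts; empty cells contribute 0."""
    p = np.asarray(counts, dtype=float).ravel()
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum() / np.log(base)

def collapse(observed, mask):
    """Hypothetical helper: sum out every axis whose mask entry is True."""
    axes = tuple(i for i, drop in enumerate(mask) if drop)
    return np.asarray(observed).sum(axis=axes)

# A small 3-way table of counts over (a, b, c); the values are made up.
observed = np.arange(1, 13).reshape(2, 3, 2)

h_abc = entropy(observed)                                 # H(a, b, c)
h_ac = entropy(collapse(observed, (False, True, False)))  # H(a, c)
h_a = entropy(collapse(observed, (False, True, True)))    # H(a)
h_c = entropy(collapse(observed, (True, True, False)))    # H(c)

# Mutual information between a and c, built only from lower-dimensional entropies.
i_ac = h_a + h_c - h_ac
```

Summing out axes never changes the grand total. The resulting entropies obey the usual ordering H(a) <= H(a, c) <= H(a, b, c), which makes a quick sanity check on any implementation.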

 Some related tickets are [http://projects.scipy.org/scipy/ticket/893 893],
 [http://projects.scipy.org/scipy/ticket/956 956], and
 [http://projects.scipy.org/scipy/ticket/1203 1203].

 Lastly, Warren Weckesser deserves a big thanks for working on this with me
 during the SciPy 2010 statistics sprint.

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1258>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

