[Scipy-tickets] [SciPy] #1258: Contingency Table Class
SciPy Trac
scipy-tickets@scipy....
Mon Aug 9 14:04:05 CDT 2010
#1258: Contingency Table Class
-------------------------------+--------------------------------------------
Reporter: scopatz | Owner: somebody
Type: enhancement | Status: new
Priority: normal | Milestone: 0.9.0
Component: scipy.stats | Version: 0.7.0
Keywords: contingency table |
-------------------------------+--------------------------------------------
The attached files hold a new N-dimensional
[http://en.wikipedia.org/wiki/Contingency_table contingency table] class
and tests for this class. The standard two-way table is present in almost
every elementary statistics book. However, an adequate model has yet to
be included in scipy.
The class allows the user to load a table from a numpy array.
Alternatively, a contingency table can be generated by sorting raw
data from columns (1D arrays) into bins.
{{{
from scipy import stats
import numpy as np
# Initialize from array
obs = np.array([[10,0], [0, 10]])
ct = stats.ContingencyTable(observed=obs)
# Initialize from columns of data
a = np.array([1, 3, 10])
b = np.array([3, 60, 45])
c = np.array([0.9, 0.99, 0.999])
x = (a,b,c)
def nines(l, u, b):
    # Define our own distribution to apply to the data
    return 1.0 - np.logspace(np.log10(1.0 - l), np.log10(1.0 - u), b)
bounds = ([1, 10], [0, 2], [0.9, 0.999])
dists = (np.linspace, np.logspace, nines)
ct = stats.ContingencyTable.from_columns(x, shape=(2, 4, 3),
    bounds=bounds, distribution=dists, discrete=(False, False, True))
}}}
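For comparison, the binning step behind a "from columns" constructor can be approximated with plain NumPy. The sketch below is not the attached class. It assumes that each column is counted against explicit bin edges, and it uses `np.histogramdd` to tally co-occurrences. The extra data points and all edge values are made up for illustration.

```python
import numpy as np

# Three columns of raw data; position i in each column is one observation.
# (Values are illustrative, not taken from the attached tests.)
a = np.array([1, 3, 10, 7, 2])
b = np.array([3, 60, 45, 5, 80])
c = np.array([0.9, 0.99, 0.999, 0.95, 0.9])

# Hypothetical bin edges per axis: n edges define n - 1 bins.
edges = [
    np.linspace(1, 10, 3),               # 2 linear bins for a
    np.logspace(0, 2, 5),                # 4 logarithmic bins for b
    np.array([0.9, 0.95, 0.995, 1.0]),   # 3 hand-picked bins for c
]

# Stack the columns into an (n_observations, n_axes) sample and count.
sample = np.column_stack((a, b, c))
observed, _ = np.histogramdd(sample, bins=edges)
observed = observed.astype(int)

print(observed.shape)  # (2, 4, 3)
print(observed.sum())  # 5
```

Note that `np.histogramdd` silently drops observations that fall outside the outermost edges, so the table total can be smaller than the number of rows.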
Slices and summations of the contingency table are handled by methods of
the class. Important metrics, such as the chi-squared value, entropy,
and mutual information, are also defined as methods.
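To make these metrics concrete outside the class, here is a minimal NumPy sketch for a two-way table. The function names are hypothetical and are not the methods of the attached class. It assumes natural logarithms (entropy in nats) and nonzero row and column totals.

```python
import numpy as np

def entropy(table):
    """Shannon entropy (in nats) of the distribution given by the counts."""
    p = table / table.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p))

def chi_squared(table):
    """Pearson chi-squared statistic for independence in a two-way table."""
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row * col / table.sum()  # counts expected under independence
    return np.sum((table - expected) ** 2 / expected)

def mutual_information(table):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for a two-way table."""
    h_rows = entropy(table.sum(axis=1))
    h_cols = entropy(table.sum(axis=0))
    return h_rows + h_cols - entropy(table)

obs = np.array([[10, 0], [0, 10]])
print(chi_squared(obs))         # 20.0
print(entropy(obs))             # log(2), about 0.693
print(mutual_information(obs))  # log(2), about 0.693
```

For this perfectly dependent table, every expected count is 5, so the chi-squared value is 4 * 25 / 5 = 20, and the mutual information equals the full joint entropy.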
One strength of this implementation is that metrics of lower
dimensionality can always be calculated by combining the methods
described above. For example, using the `from_columns` table above:
{{{
# Entropy of original table, H(a,b,c)
h = ct.entropy()
# Entropy of just the a and c axes, H(a, c)
ct1 = ct.collapse_shape((False, True, False))
h1 = ct1.entropy()
}}}
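The same reduction can be sketched with plain NumPy, assuming that collapsing an axis means summing the counts over it. The exact semantics of `collapse_shape` are defined in the attached files, so this is only an illustration of the idea, using a made-up table of counts.

```python
import numpy as np

def entropy(table):
    """Shannon entropy (in nats) of the distribution given by the counts."""
    p = table / table.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p))

# Illustrative three-way table with axes (a, b, c) and shape (2, 4, 3).
observed = np.arange(1, 25).reshape(2, 4, 3)

# H(a, b, c): entropy of the full joint table.
h_abc = entropy(observed)

# H(a, c): sum out the b axis (axis 1), then take the entropy.
observed_ac = observed.sum(axis=1)
h_ac = entropy(observed_ac)

print(observed_ac.shape)  # (2, 3)
print(h_ac <= h_abc)      # True
```

Summing out an axis can never increase the entropy, which gives a quick sanity check on any marginal computed this way.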
Some related tickets are [http://projects.scipy.org/scipy/ticket/893 893],
[http://projects.scipy.org/scipy/ticket/956 956], and
[http://projects.scipy.org/scipy/ticket/1203 1203].
Lastly, Warren Weckesser deserves a big thanks for working on this with me
during the !SciPy 2010 statistics sprint.
--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1258>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.