[SciPy-Dev] Deprecate stats.glm?
Thu Jun 3 09:49:21 CDT 2010
On Thu, Jun 3, 2010 at 10:18 AM, Warren Weckesser
<firstname.lastname@example.org> wrote:
>> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser
>> <email@example.com> wrote:
>>> stats.glm looks like it was started and then abandoned without being
>>> finished. It was last touched in November 2007. Should this function
>>> be deprecated so it can eventually be removed?
>> My thoughts when I looked at it was roughly:
>> leave it alone since it's working, but don't "advertise" it because we
>> should get a better replacement.
> How does one not advertise it?
> The docstring is wrong, incomplete, and not useful.
That's how it's not advertised.
> It has no tests.
It has no tests (except for examples on my computer), but the results
(for the basic case that I looked at) are correct.
If we want to increase test coverage, or start removing functions that
don't have tests yet, I would work on box-cox and several other functions
in morestats.py. It's mainly a question of priorities.
> Currently, it appears that it just duplicates ttest_ind. As far as I
> know, no one is working on it.
> Leaving it in wastes users' time reading about it. It erodes confidence
> in other functions in scipy: "Is foo() a good function, or has it been
> abandoned, like glm()?"
> To me, it is an ideal candidate for removal.
If we apply strict criteria along those lines, we could reduce the size
of scipy.stats.stats and scipy.stats.morestats by at least a third, I
guess. (Which is what I would do if I could start from scratch.)
A big fraction of functions in scipy.stats are in the category "no one
is working on it".
For glm specifically, I don't see any big cost either in leaving it in or
in deprecating it, and in that case I usually stick to the status quo. But
you can just as well deprecate it and point to ttest_ind.
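A minimal sketch of why glm and ttest_ind agree in the two-group case, assuming glm amounts to regressing the data on a 0/1 group indicator (an assumption, not checked against glm's source). The helpers pooled_ttest and dummy_regression_t are hypothetical and use only NumPy: the first is the equal-variance (pooled) two-sample statistic, which is what scipy.stats.ttest_ind computes with its default equal_var=True; the second is the t statistic of the dummy's coefficient in an OLS fit with an intercept.

```python
import numpy as np


def pooled_ttest(a, b):
    """Equal-variance two-sample t statistic: (mean(a) - mean(b)) / pooled SE."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pooled variance with na + nb - 2 degrees of freedom
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))


def dummy_regression_t(y, x):
    """t statistic of the 0/1 dummy's coefficient in the OLS fit y = b0 + b1 * x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept column + group dummy
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof  # residual variance = pooled within-group variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


# Same setup as the session quoted in this thread, with a seeded generator
rng = np.random.default_rng(0)
x = (np.arange(20) > 9).astype(int)
y = x + rng.standard_normal(20)
print(np.isclose(pooled_ttest(y[x == 0], y[x == 1]), -dummy_regression_t(y, x)))  # True
```

The two statistics differ only in sign: the dummy's coefficient is mean(group 1) - mean(group 0), while the t-test subtracts in the order the samples are passed. The standard error is identical because the regression's residual variance is exactly the pooled within-group variance.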
And for "bigger fish" like pdfmoments and pdf_approx, I never received
a reply or opinion on the mailing list.
statsmodels will have (or rather, already has in the sandbox) a generalization
of glm that works for any number of groups and includes both t_test
>> Similar to linregress, the more general version will be available when
>> scipy.stats gets the full OLS model.
>> >>> x = (np.arange(20)>9).astype(int)
>> >>> y = x + np.random.randn(20)
>> (-1.7684287512254859, 0.093933208147769023)
>> >>> stats.ttest_ind(y[:10], y[10:])
>> (-1.7684287512254859, 0.093933208147768926)
>> In its current form it doesn't do anything different from ttest_ind
>> except for the argument structure.
>> I think it could be made to work on string labels if _support.unique
>> is replaced by np.unique (which we are doing in statsmodels).
>> >>> x = (np.arange(20)>9).astype(str)
>> array(['F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'T', 'T',
>> 'T', 'T', 'T', 'T', 'T', 'T', 'T'],
>> Traceback (most recent call last):
>> File "<pyshell#24>", line 1, in <module>
>> File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\stats.py",
>> line 3315, in glm
>> p = _support.unique(para)
>> File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\_support.py",
>> line 45, in unique
>> if np.add.reduce(np.equal(uniques,item).flat) == 0:
>> AttributeError: 'NotImplementedType' object has no attribute 'flat'
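The generalization to any number of groups mentioned above can be sketched with NumPy alone, assuming it means a one-way layout fitted by OLS on group indicators (this is not the statsmodels API). The hypothetical helper oneway_f_by_labels derives group codes with np.unique(..., return_inverse=True), so string labels like the 'F'/'T' array in the traceback work. Its F statistic compares the residual sum of squares of the group-means model with that of the intercept-only model, and should match scipy.stats.f_oneway applied to the same groups.

```python
import numpy as np


def oneway_f_by_labels(y, labels):
    """One-way ANOVA F statistic for y grouped by arbitrary (e.g. string) labels."""
    y = np.asarray(y, dtype=float)
    # np.unique handles string labels; codes maps each observation to its group index
    groups, codes = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    k, n = len(groups), len(y)
    # Cell-means design: one indicator column per group, no separate intercept
    X = np.zeros((n, k))
    X[np.arange(n), codes] = 1.0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta) ** 2)  # within-group sum of squares
    rss_null = np.sum((y - y.mean()) ** 2)  # intercept-only (single mean) model
    df_between, df_within = k - 1, n - k
    return ((rss_null - rss_full) / df_between) / (rss_full / df_within)


# String labels like the 'F'/'T' array in the traceback
x = np.where(np.arange(20) > 9, "T", "F")
rng = np.random.default_rng(1)
y = (x == "T") + rng.standard_normal(20)

# For two groups, F should equal the square of the pooled two-sample t statistic
a, b = y[x == "F"], y[x == "T"]
sp2 = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(y) - 2)
t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
print(np.isclose(oneway_f_by_labels(y, x), t ** 2))  # True
```

The F = t**2 identity for two groups is why a k-group version can replace the two-group glm without losing anything.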
>>> SciPy-Dev mailing list