[SciPy-Dev] Deprecate stats.glm?

Warren Weckesser warren.weckesser@enthought....
Thu Jun 3 10:51:42 CDT 2010


josef.pktd@gmail.com wrote:
> On Thu, Jun 3, 2010 at 10:49 AM,  <josef.pktd@gmail.com> wrote:
>   
>> On Thu, Jun 3, 2010 at 10:18 AM, Warren Weckesser
>> <warren.weckesser@enthought.com> wrote:
>>     
>>> josef.pktd@gmail.com wrote:
>>>       
>>>> On Thu, Jun 3, 2010 at 8:50 AM, Warren Weckesser
>>>> <warren.weckesser@enthought.com> wrote:
>>>>
>>>>         
>>>>> stats.glm looks like it was started and then abandoned without being
>>>>> finished.  It was last touched in November 2007.  Should this function
>>>>> be deprecated so it can eventually be removed?
>>>>>
>>>>>           
>>>> My thoughts when I looked at it were roughly:
>>>> leave it alone since it's working, but don't "advertise" it because we
>>>> should get a better replacement.
>>>>
>>>>         
>>> How does one not advertise it?
>>>
>>> The docstring is wrong, incomplete, and not useful.
>>>       
>> That's how it's not advertised.
>>
>>     
>>> It has no tests.
>>>       
>> It has no tests (except for examples on my computer), but the results
>> (for the basic case that I looked at) are correct.
>> If we increase test coverage or start removing functions that don't
>> have tests yet, I would work on box-cox and several other functions
>> in morestats.py first. It's mainly a question of priorities.
>>
>>     
>>> Currently, it appears that it just duplicates ttest_ind.  As far as I
>>> know, no one is working on it.
>>>
>>> Leaving it in wastes users' time reading about it.  It erodes confidence
>>> in other functions in scipy:  "Is foo() a good function, or has it been
>>> abandoned, like glm()?"
>>>
>>> To me, it is an ideal candidate for removal.
>>>       
>> If we apply strict criteria along those lines, we can reduce the size
>> of scipy.stats.stats and scipy.stats.morestats, I guess, by at least a
>> third. (Which I would do if I could start from scratch).
>> A big fraction of functions in scipy.stats are in the category "no one
>> is working on it".
>>
>> For glm specifically, I don't see any big cost to leaving it in, nor
>> to deprecating it, and in that case I usually stick with the status
>> quo. But you can just as well deprecate it and point to ttest_ind.
>>
>> And for "bigger fish" like pdfmoments and pdf_approx, I never received
>> a reply or opinion on the mailing list.
>>
>> statsmodels will have (or better, has in the sandbox) a generalization
>> for glm, that works for any number of groups and includes both t_test
>> and f_test.
>>     
>
> Actually, now that I have to think about glm again, I'm also in favor
> of deprecating it, since I can always point to the general version in
> statsmodels.
>
> Josef
>
>   

Heh... meanwhile I'm starting to think that my call for deprecation was 
premature, and maybe all it really needs is an updated, accurate 
docstring that explains what the current implementation does.  :)

Warren

>
>
>   
>> Josef
>>
>>     
>>> Warren
>>>
>>>       
>>>> Similar to linregress, the more general version will be available when
>>>> scipy.stats gets the full OLS model.
>>>>
>>>>
>>>>         
>>>>>>> x = (np.arange(20)>9).astype(int)
>>>>>>> y = x + np.random.randn(20)
>>>>>>> stats.glm(y,x)
>>>>>>>
>>>>>>>               
>>>> (-1.7684287512254859, 0.093933208147769023)
>>>>
>>>>         
>>>>>>> stats.ttest_ind(y[:10], y[10:])
>>>>>>>
>>>>>>>               
>>>> (-1.7684287512254859, 0.093933208147768926)
>>>>
>>>> In its current form it doesn't do much that differs from ttest_ind,
>>>> apart from a different argument structure.
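The equivalence shown in the session above can be reproduced without stats.glm. What follows is only a sketch: it assumes the two-level label array is split by value and each half is handed to stats.ttest_ind. The helper name two_group_ttest is made up for illustration and is not part of scipy.

```python
import numpy as np
from scipy import stats


def two_group_ttest(y, labels):
    """Hypothetical helper: two-sample t-test of y, grouped by a two-level label array."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)  # sorted distinct labels
    if groups.size != 2:
        raise ValueError("expected exactly two distinct labels")
    first = y[labels == groups[0]]
    second = y[labels == groups[1]]
    # Pooled-variance t-test, the default behaviour of ttest_ind
    return stats.ttest_ind(first, second)


# Same construction as in the session above, with a fixed seed
rng = np.random.RandomState(0)
x = (np.arange(20) > 9).astype(int)
y = x + rng.randn(20)

t, p = two_group_ttest(y, x)
t_ref, p_ref = stats.ttest_ind(y[:10], y[10:])
```

With these inputs, (t, p) and (t_ref, p_ref) should agree up to floating-point rounding, which is the same kind of agreement the two results above show.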
>>>>
>>>> I think it could be made to work on string labels if _support.unique
>>>> is replaced by np.unique (which we are doing in statsmodels)
>>>>
>>>>
>>>>         
>>>>>>> x = (np.arange(20)>9).astype(str)
>>>>>>> x
>>>>>>>
>>>>>>>               
>>>> array(['F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'T', 'T',
>>>>        'T', 'T', 'T', 'T', 'T', 'T', 'T'],
>>>>       dtype='|S1')
>>>>
>>>>         
>>>>>>> stats.glm(y,x)
>>>>>>>
>>>>>>>               
>>>> Traceback (most recent call last):
>>>>   File "<pyshell#24>", line 1, in <module>
>>>>     stats.glm(y,x)
>>>>   File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\stats.py",
>>>> line 3315, in glm
>>>>     p = _support.unique(para)
>>>>   File "C:\Josef\_progs\Subversion\scipy-trunk_after\trunk\dist\scipy-0.8.0.dev6416.win32\Programs\Python25\Lib\site-packages\scipy\stats\_support.py",
>>>> line 45, in unique
>>>>     if np.add.reduce(np.equal(uniques,item).flat) == 0:
>>>> AttributeError: 'NotImplementedType' object has no attribute 'flat'
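For comparison, here is a small sketch of how np.unique handles the same kind of string labels. The label array is written out by hand to match the output shown above, and the variable names are illustrative only.

```python
import numpy as np

# Two-level string labels, written out to match the array shown above
labels = np.array(['F'] * 10 + ['T'] * 10)

# np.unique returns the sorted distinct labels; return_inverse additionally
# gives, for each element, the index of its label in that sorted array
groups, codes = np.unique(labels, return_inverse=True)

# Boolean masks built from the integer codes select each group
first_mask = codes == 0
second_mask = codes == 1
```

Here groups holds 'F' and 'T', and codes is 0 for the first ten entries and 1 for the last ten, so the masks can be used to index y in the same way as integer labels.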
>>>>
>>>> Josef
>>>>
>>>>
>>>>         
>>>>> Warren
>>>>>
>>>>> _______________________________________________
>>>>> SciPy-Dev mailing list
>>>>> SciPy-Dev@scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>>>
>>>>>
>>>>>           


