[Scipy-tickets] [SciPy] #1530: Cauchy fit returns nothing in scipy 0.9.0

SciPy Trac scipy-tickets@scipy....
Sat Oct 8 08:02:32 CDT 2011


#1530: Cauchy fit returns nothing in scipy 0.9.0
------------------------------+---------------------------------------------
 Reporter:  gtg944q           |       Owner:  somebody   
     Type:  defect            |      Status:  new        
 Priority:  normal            |   Milestone:  Unscheduled
Component:  scipy.stats       |     Version:  0.9.0      
 Keywords:  cauchy fit stats  |  
------------------------------+---------------------------------------------

Comment(by josefpktd):

 Previously the default starting values were ones (or 0, 1 for loc and
 scale); now the starting values are computed by

 {{{
     # return starting point for fit (shape arguments + loc + scale)
     def _fitstart(self, data, args=None):
         if args is None:
             args = (1.0,)*self.numargs
         return args + self.fit_loc_scale(data, *args)
 }}}

 Neither choice works for all distributions; the change just shifts the
 problem. Some distributions now have better starting values, while others
 have worse ones or no longer fit at all.

 As with any non-linear fitting problem, I think complete backwards
 compatibility for successful fits will be difficult to guarantee.

 The important change that helps here is that Travis copied the idea from
 my statsmodels version of making _fitstart a method that individual
 distributions can override (an undocumented and, I guess, unused
 feature):

 {{{
 >>> stats.cauchy._fitstart = lambda x:[0,1]
 >>> stats.cauchy.fit(x)
 Traceback (most recent call last):
   File "<pyshell#11>", line 1, in <module>
     stats.cauchy.fit(x)
   File "C:\Python26\lib\site-packages\scipy\stats\distributions.py", line 1706, in fit
     args += start[Narg:-2]
 TypeError: can only concatenate tuple (not "list") to tuple

 #oops
 >>> stats.cauchy._fitstart = lambda x:(0,1)
 >>> stats.cauchy.fit(x)
 (-0.037067496522282466, 1.0507200452076515)
 }}}
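 The override hook, and the list-versus-tuple pitfall in the session above,
 can be illustrated with a plain-Python sketch. BaseDist, FixedStartDist and
 start_values are hypothetical, simplified stand-ins, not the scipy.stats
 API: a subclass replaces the generic _fitstart, and start_values (loosely
 mirroring the `args += start[Narg:-2]` step from the traceback) coerces the
 result with tuple() so that an override returning a list no longer raises
 TypeError.

```python
# Hypothetical, simplified stand-ins for a distribution class hierarchy;
# this is NOT the scipy.stats API, only an illustration of the pattern.


class BaseDist:
    numargs = 1  # number of shape parameters (assumed for illustration)

    def _fitstart(self, data, args=None):
        # Generic default: each shape parameter starts at 1.0,
        # followed by loc=0.0 and scale=1.0.
        if args is None:
            args = (1.0,) * self.numargs
        return args + (0.0, 1.0)

    def start_values(self, data, *args):
        # Loosely mirrors `args += start[Narg:-2]`: keep user-supplied shape
        # args, append the remaining shape starting values, then loc/scale.
        # tuple() makes a list-returning override safe to concatenate.
        start = tuple(self._fitstart(data))
        narg = len(args)
        args += start[narg:-2]
        return args + start[-2:]


class FixedStartDist(BaseDist):
    numargs = 0  # only loc and scale, like cauchy

    def _fitstart(self, data, args=None):
        # Distribution-specific override; returns a list on purpose.
        return [0, 1]


data = [0.3, -1.2, 0.8]
print(BaseDist().start_values(data))        # (1.0, 0.0, 1.0)
print(BaseDist().start_values(data, 2.5))   # (2.5, 0.0, 1.0)
print(FixedStartDist().start_values(data))  # (0, 1)
```

 Putting the coercion in the shared driver, rather than in each override,
 means individual distributions can return any sequence.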

 In my version of fit in statsmodels, I was working on _fitstart methods
 specific to individual distributions or distribution categories, but they
 didn't make it into scipy.stats.

 As an aside: I don't share Travis's preference for a flexible number of
 arguments, using *args and **kwds for what should just be an array_like
 of parameters.

 So the backwards-compatible fix for cauchy would be to define

 {{{
 def _fitstart(self, data, args=None):
     # fixed starting values: loc=0, scale=1
     return (0, 1)
 }}}

 or some improved version based on the median and interquartile range?
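 A sketch of the median/interquartile-range idea, assuming it is applied to
 cauchy: the quartiles of a Cauchy distribution are loc - scale and
 loc + scale, so the median estimates loc and half the interquartile range
 estimates scale. cauchy_fitstart is a hypothetical name and uses only the
 standard library; as a method on the distribution it would also take self.

```python
import statistics


def cauchy_fitstart(data, args=None):
    """Hypothetical robust (loc, scale) starting values for a Cauchy fit.

    The Cauchy quartiles are loc - scale and loc + scale, so the median
    estimates loc and half the interquartile range estimates scale.
    Unlike the sample mean and variance, these are not thrown off by
    heavy tails.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    scale = (q3 - q1) / 2.0
    if scale <= 0:
        scale = 1.0  # fall back for degenerate data (e.g. all values equal)
    return (median, scale)


print(cauchy_fitstart([-3.0, -1.0, 0.0, 1.0, 3.0]))     # (0.0, 1.0)
print(cauchy_fitstart([-3.0, -1.0, 0.0, 1.0, 1000.0]))  # (0.0, 1.0)
```

 The second call shows the point of using quantiles: a single extreme value
 leaves the starting values unchanged.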

 (Note: this is getting longer than planned.)

 The problem with using fit_loc_scale for cauchy is that fit_loc_scale
 estimates loc and scale from the first two moments, but the Cauchy
 distribution (a t-distribution with df=1) does not have finite moments:

 {{{
 >>> stats.cauchy.stats()
 (array(inf), array(inf))
 }}}
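 To see why infinite moments break the default starting values, here is a
 simplified moment-matching estimate in the spirit of fit_loc_scale: match
 the sample mean and variance to the distribution's mean and variance. The
 function name moment_loc_scale and its exact form are assumptions for
 illustration, not scipy's code. With both moments infinite, the scale
 estimate collapses to 0.0 and the loc estimate becomes nan, because
 0.0 * inf is nan.

```python
import math
import statistics


def moment_loc_scale(data, dist_mean, dist_var):
    """Hypothetical moment-matching loc/scale estimate.

    Solves  sample_mean = loc + scale * dist_mean
            sample_var  = scale**2 * dist_var
    where dist_mean and dist_var are the mean and variance of the
    standardized distribution (loc=0, scale=1).
    """
    sample_mean = statistics.fmean(data)
    sample_var = statistics.pvariance(data)
    scale = math.sqrt(sample_var / dist_var)
    loc = sample_mean - scale * dist_mean
    return loc, scale


data = [-0.5, 0.2, 1.4, -2.3, 0.9]

# Standard normal: finite moments (0, 1), so the estimate is usable.
print(moment_loc_scale(data, 0.0, 1.0))

# Cauchy: stats() reports (inf, inf); sqrt(var / inf) is 0.0 and 0.0 * inf is nan.
print(moment_loc_scale(data, math.inf, math.inf))  # (nan, 0.0)
```

 With the (0.0, inf) that stats.t.stats(1) reports, the same formula keeps a
 finite loc (the sample mean) and only the scale collapses to 0.0, which may
 be part of why the t fit still succeeds.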

 The same might be true for some other distributions. The t-distribution
 works, however, and its estimated parameters are very close to the
 Cauchy estimate:

 {{{
 >>> stats.t.fit(x)
 (0.99060288069029445, -0.03685556144036313, 1.0463505820303065)
 >>> stats.t.stats(1)
 (array(0.0), array(inf))
 }}}


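 The F distribution doesn't: with the default starting values the fit
 returns nan for loc and 0.0 for scale, while passing loc=0, scale=1 as
 starting values produces a finite fit: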

 {{{
 >>> stats.f.stats(5,1)
 (array(inf), array(inf))

 >>> stats.f.fit(x**2)
 (1.0, 1.0, nan, 0.0)
 >>> stats.f.fit(x**2, loc=0, scale=1)
 (1.1150015822059429, 0.95420973368226625, 0.00024824944525701488,
 0.97673790880522293)
 }}}
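 One possible guard, sketched here as an assumption rather than anything in
 scipy 0.9.0: when the moment-based starting values come out unusable (a
 non-finite loc, or a scale that is not a positive finite number), fall back
 to loc=0, scale=1, which is what made the F fit above succeed when passed
 explicitly. guarded_loc_scale is a hypothetical helper name.

```python
import math


def guarded_loc_scale(loc, scale, default_loc=0.0, default_scale=1.0):
    """Hypothetical helper: replace unusable loc/scale starting values.

    A non-finite loc falls back to default_loc; a scale that is not a
    finite positive number falls back to default_scale.
    """
    if not math.isfinite(loc):
        loc = default_loc
    if not (math.isfinite(scale) and scale > 0):
        scale = default_scale
    return loc, scale


# A starting point like the failed F fit (loc=nan, scale=0.0):
print(guarded_loc_scale(math.nan, 0.0))  # (0.0, 1.0)
# Usable estimates pass through unchanged:
print(guarded_loc_scale(-0.04, 1.05))    # (-0.04, 1.05)
```

 This only replaces the symptom with the old defaults; a distribution-specific
 _fitstart can still do better where one is available.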

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1530#comment:3>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

