[Scipy-tickets] [SciPy] #1675: fit function for generalized extreme value distribution goes wrong.

SciPy Trac scipy-tickets@scipy....
Mon Jun 18 03:00:13 CDT 2012


#1675: fit function for generalized extreme value distribution goes wrong.
-------------------------+--------------------------------------------------
 Reporter:  xydarcher    |       Owner:  somebody   
     Type:  defect       |      Status:  new        
 Priority:  normal       |   Milestone:  Unscheduled
Component:  scipy.stats  |     Version:  0.10.0     
 Keywords:               |  
-------------------------+--------------------------------------------------

Comment(by pbrod):

 Calling the fit function returns
 {{{
 In [71]: stats.genextreme.fit(rvs)
 (1.0, 10.704853052872256, 1.4204665992513283)
 }}}

 which is the same as the default starting values for the fit:
 {{{
 In [72]: stats.genextreme._fitstart(rvs)
 Out[72]: (1.0, 10.733011710390119, 1.5043682956094813)
 }}}

 Thus the reason for this error is that the starting values for the fit
 defines the range (stats.genextreme.a,stats.genextreme.b) which does not
 cover the range of the rvs data. When this is the case the
 stats.genextreme.nnlf function will always return inf and the optimization
 in the stats.genextreme.fit will end and return the default start values
 for the fit.

 One solution to this problem is to give a finite (instead of a infinite)
 penalty to all data-values outside the valid range (a, b) for the
 distribution in the stats.rv_continuous.nnlf function. Then the
 optimization routine will be able to find the better values for the fit.
 Replacing the current nnlf with:
 {{{
     def nnlf(self, theta, x):
         try:
             loc = theta[-2]
             scale = theta[-1]
             args = tuple(theta[:-2])
         except IndexError:
             raise ValueError("Not enough input arguments.")
         if not self._argcheck(*args) or scale <= 0:
             return inf
         x = arr((x-loc) / scale)
         cond0 = (x <= self.a) | (self.b <= x)
         if (any(cond0)):
             # old call: return inf
             goodargs = argsreduce(1 - cond0, *((x,)))
             goodargs = tuple(goodargs + list(args))
             N = len(x)
             Nbad = sum(cond0)
             xmax = np.finfo(float).machar.xmax
             return self._nnlf(*goodargs) + N * (log(scale) + Nbad *
 100.0*log(xmax))
         else:
             N = len(x)
             return self._nnlf(x, *args) + N*log(scale)
 }}}

 one will get the wanted bahaviour:
 {{{
 In [73]: stats.genextreme.fit(rvs)
 Out[73]: (-0.085280908129346508, 10.061299318003471, 1.0041718490648388)
 }}}

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1675#comment:4>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.


More information about the Scipy-tickets mailing list