[SciPy-Dev] distributions.py

Ralf Gommers ralf.gommers@gmail....
Sat Sep 15 04:59:58 CDT 2012


On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas <
vanderplas@astro.washington.edu> wrote:

>  On 09/14/2012 01:49 PM, Ralf Gommers wrote:
>
> On Fri, Sep 14, 2012 at 12:48 AM, <josef.pktd@gmail.com> wrote:
>
>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest <vanforeest@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > Now that I understand GitHub (thanks to Ralf for his explanations in
>> > Dutch) and have got some simple stuff out of the way in distributions.py, I
>> > would like to tackle a somewhat harder issue. The function argsreduce
>> > is, as far as I can see, too generic. I ran some tests to check whether
>> > its most generic output, as described by its docstring, is actually
>> > used by the callers of argsreduce, but this appears not to be the
>> > case.
>>
>> Being generic is not a disadvantage per se if it's fast:
>>
>> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665
>> (and being a one-liner is not a disadvantage either)
>>
>> Josef
>>
>> >
>> > My motivation for simplifying the code in distributions.py (and cleaning
>> > it up) is partly to make it easier to understand for myself, but also
>> > for others. Since GitHub makes browsing code a much nicer experience,
>> > perhaps more people will take a look at what's under the hood. But then
>> > the code should also be accessible and clean. Are there any reasons not
>> > to pursue this path, and to focus instead on more important problems of
>> > the stats library?
>>
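To make the "generic" point concrete, here is a minimal sketch of the behaviour callers of argsreduce rely on: broadcast each argument against a boolean condition and keep only the entries where it holds. `argsreduce_sketch` is a hypothetical name; this is not the scipy implementation from the commit Josef links, and it does not reproduce every output form described in the real docstring.

```python
import numpy as np


def argsreduce_sketch(cond, *args):
    """Hypothetical sketch of the idea behind argsreduce (not scipy's code).

    Broadcast every argument against the boolean mask `cond` and keep only
    the entries where `cond` is True, returned as flat 1-d arrays.
    """
    cond = np.asarray(cond, dtype=bool)
    return [np.broadcast_to(np.asarray(a), cond.shape)[cond] for a in args]


# Evaluate only where x lies above loc; the scalar loc is expanded to
# match the surviving x values.
x = np.array([-1.0, 0.5, 2.0])
loc = 0.0
x_in, loc_in = argsreduce_sketch(x > loc, x, loc)
```

A single list comprehension handles any number of arguments of any broadcast-compatible shape, which is the sense in which a short, generic function need not be a problem.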
>
> Not sure that argsreduce is the best place to start (see Josef's reply),
> but there should be things that can be done to make the code easier to
> read. For example, this code is used in ~10 methods of rv_continuous:
>
>         loc,scale=map(kwds.get,['loc','scale'])
>         args, loc, scale = self._fix_loc_scale(args, loc, scale)
>         x,loc,scale = map(asarray,(x,loc,scale))
>         args = tuple(map(asarray,args))
>
> Some refactoring may be in order. The same is true of the rest of the
> implementation of many of those methods. Some are exactly the same except
> for calls to the corresponding underscored method (example: logsf() and
> logcdf() are identical except for calls to _logsf() and _logcdf(), and one
> nonsensical multiplication).
>
> Ralf
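The refactoring Ralf suggests can be sketched on a toy class. Everything below is hypothetical: `ToyExpon`, `_parse_args` and `_log_method` are illustrative names, not scipy's, and the real rv_continuous methods also handle shape parameters, `_fix_loc_scale` and argument validation, all of which the sketch omits. The idea is that the repeated loc/scale boilerplate lives in one helper, and logcdf()/logsf() share one body that differs only in the private method called.

```python
import numpy as np


class ToyExpon:
    """Hypothetical toy distribution: exponential with loc/scale keywords.

    Illustrates factoring shared boilerplate out of the public methods;
    this is not scipy's rv_continuous and skips its argument validation.
    """

    # Private methods work on the standardized variable y = (x - loc) / scale.
    def _logcdf(self, y):
        return np.log1p(-np.exp(-y))

    def _logsf(self, y):
        return -y

    def _parse_args(self, x, kwds):
        # The loc/scale handling otherwise repeated in every public method.
        loc = kwds.get('loc', 0.0)
        scale = kwds.get('scale', 1.0)
        return tuple(map(np.asarray, (x, loc, scale)))

    def _log_method(self, private, below_support, x, **kwds):
        # Shared body of logcdf/logsf: only the private method and the value
        # returned for points below the support (x < loc) differ.
        x, loc, scale = self._parse_args(x, kwds)
        y = (x - loc) / scale
        inside = y >= 0
        # Substitute a harmless value outside the support so the private
        # method never sees negative y.
        y_safe = np.where(inside, y, 1.0)
        return np.where(inside, private(y_safe), below_support)

    def logcdf(self, x, **kwds):
        return self._log_method(self._logcdf, -np.inf, x, **kwds)

    def logsf(self, x, **kwds):
        return self._log_method(self._logsf, 0.0, x, **kwds)


d = ToyExpon()
values = d.logsf([0.5, 1.0, 2.0])
```

Here `values` holds -0.5, -1 and -2, and adding a further log-method such as a hypothetical logpdf() would only need one new private method plus a one-line public wrapper.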
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
> I would say that the most important improvement needed in distributions
> is in the documentation.
>
> A new user would look at the docstring of, say, scipy.stats.norm, and have
> no idea how to proceed.  Here's the current example from the docstring of
> scipy.stats.norm:
>
> Examples
> --------
> >>> from scipy.stats import norm
> >>> numargs = norm.numargs
> >>> [  ] = [0.9,] * numargs
> >>> rv = norm()
>
> >>> x = np.linspace(0, np.minimum(rv.dist.b, 3))
> >>> h = plt.plot(x, rv.pdf(x))
>
> I don't even know what that means... and it doesn't compile.  Also, what
> is b?  How would I enter mu and sigma to make a normal distribution?  It's
> all pretty opaque.
>

True, the examples are confusing. The reason is that they're generated from
a template, and it's pretty much impossible to get clear and concise
examples that way. It would be better to write custom examples for the
most-used distributions, and refer to those from the others.
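As an illustration of the kind of hand-written example suggested here, a sketch for scipy.stats.norm, assuming a recent SciPy release. It answers Jake's question directly: the mean and standard deviation are passed as the generic `loc` and `scale` parameters, and the template's `b` is simply the upper end of the distribution's support.

```python
import numpy as np
from scipy.stats import norm

# Normal distribution with mean mu and standard deviation sigma:
# scipy.stats expresses these through the generic `loc` and `scale`.
mu, sigma = 2.0, 0.5
rv = norm(loc=mu, scale=sigma)  # "frozen" distribution object

# The template's `b` is the upper end of the support; for the normal
# distribution the support is the whole real line.
lower, upper = rv.support()

# Evaluate the density on a grid covering mu +/- 3 sigma.
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 101)
pdf = rv.pdf(x)

# The same values without freezing, passing loc/scale on each call.
pdf_unfrozen = norm.pdf(x, loc=mu, scale=sigma)
```

`x` and `pdf` can then be passed to `plt.plot`, which is what the templated example was attempting.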

Ralf