# [SciPy-Dev] distributions.py

nicky van foreest vanforeest@gmail....
Sun Sep 16 14:28:42 CDT 2012

Hi Josef,

> I think splitting into continuous and discrete is helpful.
>
> But I don't like splitting off the distributions, 90 files for
> distributions with 10 to 20 lines of real code each sounds like a lot
> of files when we need to look for anything.
>
> Actually, I find the large file easy to use, using a search string,
> and it makes it easy to compare across distributions. Finding the
> generic parts can be difficult.

Ok. This makes sense to me too.

>
> Josef
>
>> Each docstring can/should contain some generic
>> part (like now) and a specific part, with working examples, and clear
>> explanations. The most important are normal, expon, binom, geom,
>> poisson, and perhaps some others. This would also enable others to
>> help extend the documentation, examples....
>> 4) I would like to move the math parts in continuous.rst to the doc
>> string in the related distribution file.  Since mathjax gives such
>> nice results on screen, there is also no reason not to include the
>> mathematical facts in the doc string of the distribution itself. In
>> fact, most (all?) distributions already have a short math description,
>> but this overlaps with continuous.rst.
>
> The main distinction for scipy usually is that docstrings should be
> readable in the interpreter as informative strings without being heavy
> on latex, while the tutorial and so on are mainly targeted at html.

I forgot about reading the docstrings in ipython for instance. You're right.

Nicky

>
> Josef
>
>>
>> I wouldn't mind chopping up distributions.py into the separate
>> distributions and merging them with the maths of continuous.rst. I can
>> tackle roughly one distribution per day, which reduces this
>> mind-numbing work to about 15 minutes a day (correction work on
>> exams is much worse :-) ). But I don't know how much this proposal
>> will affect the automatic generation of documentation. For the rest I
>> don't think this will affect the code a lot.
>>
>>
>>
>> Nicky
>>
>>
>>
>>
>>
>> On 15 September 2012 11:59, Ralf Gommers <ralf.gommers@gmail.com> wrote:
>>>
>>>
>>> On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas
>>> <vanderplas@astro.washington.edu> wrote:
>>>>
>>>> On 09/14/2012 01:49 PM, Ralf Gommers wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Sep 14, 2012 at 12:48 AM, <josef.pktd@gmail.com> wrote:
>>>>>
>>>>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest <vanforeest@gmail.com>
>>>>> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Now that I understand github (Thanks to Ralf for his explanations in
>>>>> > Dutch) and got some simple stuff out of the way in distributions.py I
>>>>> > would like to tackle a somewhat harder issue. The function argsreduce
>>>>> > is, as far as I can see, too generic. I did some tests to see whether
>>>>> > its most generic output, as described by its docstring, is actually
>>>>> > swallowed by the callers of argsreduce, but this appears not to be the
>>>>> > case.
>>>>>
>>>>> being generic is not a disadvantage (per se) if it's fast
>>>>>
>>>>> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665
>>>>> (and being a one-liner is not a disadvantage either)
>>>>>
>>>>> Josef
>>>>>
>>>>> >
>>>>> > My motivation to simplify the code in distributions.py (and clean it
>>>>> > up) is partly to make it simpler to understand for myself, but
>>>>> > also for others. Since github makes code browsing a much nicer
>>>>> > experience, perhaps more people will take a look at what's under the
>>>>> > hood. But then the code should also be accessible and clean. Are there
>>>>> > any reasons not to pursue this path and to focus instead on more
>>>>> > important problems of the stats library?
>>>>
>>>>
>>>> Not sure that argsreduce is the best place to start (see Josef's reply),
>>>> but there should be things that can be done to make the code easier to read.
>>>> For example, this code is used in ~10 methods of rv_continuous:
>>>>
>>>>         loc,scale=map(kwds.get,['loc','scale'])
>>>>         args, loc, scale = self._fix_loc_scale(args, loc, scale)
>>>>         x,loc,scale = map(asarray,(x,loc,scale))
>>>>         args = tuple(map(asarray,args))
>>>>
>>>> Some refactoring may be in order. The same is true of the rest of the
>>>> implementation of many of those methods. Some are exactly the same except
>>>> for calls to the corresponding underscored method (example: logsf() and
>>>> logcdf() are identical except for calls to _logsf() and _logcdf(), and one
>>>> nonsensical multiplication).
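To make the refactoring idea concrete, here is a minimal sketch of how the repeated argument handling and the near-identical public methods could share one body. Everything in it is hypothetical: prepare_args, ToyExpon and _standardized are names invented for illustration, the defaults loc=0 and scale=1 are a simplification rather than a copy of what _fix_loc_scale does, and the support checks and bad-argument masking of the real methods are left out.

```python
import numpy as np


def prepare_args(x, args, kwds):
    """Hypothetical helper: pull loc/scale out of kwds and convert to arrays.

    Stands in for the four lines repeated across the public methods. The
    default handling (loc=0, scale=1) is a simplification for this sketch.
    """
    loc = kwds.get('loc')
    scale = kwds.get('scale')
    loc = 0.0 if loc is None else loc
    scale = 1.0 if scale is None else scale
    x, loc, scale = map(np.asarray, (x, loc, scale))
    args = tuple(map(np.asarray, args))
    return x, args, loc, scale


class ToyExpon:
    """Toy standard-exponential distribution, only to exercise the helper."""

    def _logcdf(self, y):
        # log(1 - exp(-y)) for the standardized variable y > 0
        return np.log1p(-np.exp(-y))

    def _logsf(self, y):
        # log(exp(-y)) for the standardized variable y
        return -y

    def _standardized(self, name, x, *args, **kwds):
        # Shared body: normalise the arguments once, standardize x, then
        # call the underscored method selected by name.
        x, args, loc, scale = prepare_args(x, args, kwds)
        y = (x - loc) / scale
        return getattr(self, name)(y, *args)

    def logcdf(self, x, *args, **kwds):
        return self._standardized('_logcdf', x, *args, **kwds)

    def logsf(self, x, *args, **kwds):
        return self._standardized('_logsf', x, *args, **kwds)
```

With this shape, logcdf() and logsf() differ only in the name they pass on, so a difference like the stray multiplication would be hard to introduce by accident.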
>>>>
>>>> Ralf
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> SciPy-Dev mailing list
>>>> SciPy-Dev@scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>>
>>>> I would say that the most important improvement needed in distributions is
>>>> in the documentation.
>>>>
>>>> A new user would look at the doc string of, say, scipy.stats.norm, and
>>>> have no idea how to proceed.  Here's the current example from the docstring
>>>> of scipy.stats.norm:
>>>>
>>>> Examples
>>>> --------
>>>> >>> from scipy.stats import norm
>>>> >>> numargs = norm.numargs
>>>> >>> [  ] = [0.9,] * numargs
>>>> >>> rv = norm()
>>>>
>>>> >>> x = np.linspace(0, np.minimum(rv.dist.b, 3))
>>>> >>> h = plt.plot(x, rv.pdf(x))
>>>>
>>>> I don't even know what that means... and it doesn't run.  Also, what
>>>> is b?  How would I enter mu and sigma to make a normal distribution?  It's
>>>> all pretty opaque.
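For comparison, a hand-written example for norm might look something like the sketch below. The values of mu and sigma are made up for illustration; the point is only that the mean and standard deviation go in through the loc and scale keywords. (As for b: it is the upper end of the distribution's support, which is infinite for the normal distribution.)

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 0.5          # illustrative values, not from the thread

# loc shifts the distribution and scale stretches it; for the normal
# distribution these are the mean and the standard deviation.
rv = norm(loc=mu, scale=sigma)

# Evaluate the density on a small grid centred on the mean.
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 7)
density = rv.pdf(x)

# Half of the probability mass lies below the mean.
prob_below_mu = rv.cdf(mu)

# Draw a few random variates; random_state makes the draw repeatable.
sample = rv.rvs(size=5, random_state=0)
```

The same keywords also work without freezing, e.g. norm.pdf(x, loc=mu, scale=sigma).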
>>>
>>>
>>> True, the examples are confusing. The reason is that they're generated from
>>> a template, and it's pretty much impossible to get clear and concise
>>> examples that way. It would be better to write custom examples for the
>>> most-used distributions, and refer to those from the others.
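One way to combine the two, sketched very roughly: keep a generic template, fill in a hand-written example where one exists, and otherwise point to a reference distribution. The names GENERIC_DOC, CUSTOM_EXAMPLES, FALLBACK and build_doc are all hypothetical, and the template text is a toy, not the one distributions.py actually uses.

```python
# Generic part shared by every distribution (toy text for illustration).
GENERIC_DOC = """\
A %(name)s continuous random variable.

Methods
-------
pdf(x, loc=0, scale=1)
cdf(x, loc=0, scale=1)

Examples
--------
%(examples)s
"""

# Hand-written examples for the most-used distributions (illustrative text).
CUSTOM_EXAMPLES = {
    'norm': (">>> from scipy.stats import norm\n"
             ">>> rv = norm(loc=2.0, scale=0.5)\n"
             ">>> p = rv.cdf(2.0)  # probability of a value below the mean"),
}

# Used when a distribution has no hand-written example of its own.
FALLBACK = "See the examples in the docstring of scipy.stats.%s."


def build_doc(name, reference='norm'):
    """Fill the generic template with a custom or a fallback example."""
    examples = CUSTOM_EXAMPLES.get(name, FALLBACK % reference)
    return GENERIC_DOC % {'name': name, 'examples': examples}
```

Here build_doc('norm') carries the specific example, while build_doc('gamma') only refers the reader to norm.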
>>>
>>> Ralf
>>>
>>>
>>>