[SciPy-user] Docstring standards for NumPy and SciPy

Travis Oliphant oliphant at ee.byu.edu
Tue Jan 16 17:33:34 CST 2007

Edward Loper wrote:

>[I sent this 5 days ago, but it's been held because I was not
>subscribed -- so I decided to just go ahead & subscribe and resend
>it.  Apologies if it ends up being a dup.]
>
I'm ccing to the users list, but the discussion has been taking place on
the developers list, so I'm addressing it there.

>I'm glad to hear that you're making a push towards using standardized
>markup in docstrings -- I think this is a worthy goal.  I wanted to
>respond to a few points that have come up, though.
>
>First, I'd pretty strongly recommend against inventing your own
>markup language.  It increases the barrier for contributions, makes
>life more difficult for tools, and takes up that much more brain
>space that could be devoted to better things.
>
I'm not really convinced by this argument.   I agree we shouldn't be
flippant about introducing new markup and that is not what I've
proposed.   But we already have to learn multiple markups.  For example,
Moin Moin describes tables one way and restructured Text another.
Basically, I've said that the existing markups do not seem geared toward
mathematical description (as latex must be hacked on to them).  In
addition, I don't like the look of existing markup docstrings ---
especially in the parameter section.  That's where my biggest problem
really lies.   What should be clear ends up with all kinds of
unnecessary back-ticks.

I also don't really like the extra ":" at the beginning of a paragraph
to denote a section.  I could live with the underline though.

In the end, none of the markup languages seem to have been designed with
the scientific user community in mind and so I'm not feeling a
particular need to cram my brain into what they came up with.

sort of markup.  Why should they all be changed (in essentially
unnecessary ways because a computer program could be written to change
them and they will look "messier" in the end) just to satisfy a
particular markup language that was designed without inquiring about our
needs?

This is not a case of Not Invented Here.  I'm very happy to use an
existing markup standard.  In fact, I basically like restructured Text.
There are only a few issues I have with it.  Ideally, we can get those
issues resolved in epydoc itself, so that we can use a slightly
modified form of restructured Text in the docstrings.

I'm all for simplifying things, but there is a limit to how much I'm
willing to give up when we can easily automate the conversions to what
epydoc currently expects.

>Plus, it's
>surprisingly hard to do right, even if you're translating from your
>markup to an existing one -- there are just too many corner cases to
>consider.  I know Travis has reservations about the amount of 'line
>noise,' but believe me, there are good reasons why that 'line noise'
>is there, and the authors of ReST have done a *very* good job at
>keeping it to a minimum.
>
>
Well that is debatable in the specific instances of parameter
descriptions.  The extra back-ticks are annoying to say the least, and
unnecessary.

>Given the expressive power that's needed for scipy docs, I would
>recommend using ReST.  Epytext is a much simpler markup language, and
>most likely won't be expressive enough.  (e.g., it has no support for
>tables.)
>
>Whatever markup language you settle on, be sure to indicate it by
>setting module-level __docformat__ variables, as described in PEP
>258.  __docformat__ should be a string containing the name of the
>module's markup language. The name of the markup language may
>optionally be followed by a language code (such as en for English).
>Conventionally, the definition of the __docformat__ variable
>immediately follows the module's docstring.  E.g.:
>
>   __docformat__ = 'restructuredtext'
>
>Other standard values include 'plaintext' and 'epytext'.
>
>
SciPy is big enough that I see no reason we cannot define a slightly
modified form of restructured Text (i.e. one that uses MoinMoin tables,
gets rid of back-ticks in parameter lists, understands math ($<expr>$),
and has a specific layout for certain sections).

>As for extending ReST and/or epydoc to support any specializations
>you want to make, I don't think it'll be that hard.  E.g., adding
>'input' and 'output' as aliases for 'parameters' and 'returns' is
>pretty simple.  And adding support for generating latex-math should
>be pretty straight-forward.  I think concerns about the markup for
>marking latex-math are perhaps exaggerated, given that the *contents*
>of latex-math expressions are quite likely to look like line-noise to
>the uninitiated. :)  I've patched my local version of docutils to
>support inline math with `x=12`:math: and block math with:
>
>.. math:: F(x,y;w) = \langle w, \Phi(x,y) \rangle
>
>And I've been pretty happy with how well it reads.  And for people
>who aren't latex gurus, it may be more obvious what's going on if
>they see :math:`..big latex expr..` than if they just see
>$..big latex expr..$.
>
>If you really think that's too line-noise-like, then you could set
>the default role to be math, so `x=12` would render as math.  But
>then you'd need to explicitly mark crossreferences, so I doubt that
>would be a win overall.
>
>[Alan Isaac]
>
>>Must items (e.g., parameters) in a consolidated field be
>>marked as interpreted text (with back ticks)?
>>    Yes.  It does seem redundant, so I will ask why.
>
>I wouldn't mind changing this to work both with & without the
>backticks around parameter names.  At the time when I implemented it,
>I just checked what the standard practice within docutils for writing
>consolidated fields was, and wrote a parser for that.
>
>
Allowing us not to have backticks in parameter names would help me like
using restructured Text quite a bit.

I see no reason why parameter lists cannot be handled specially.  After
all, it is the most important part of a docstring.
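
The automated conversion mentioned above could look roughly like the
following.  This is only a minimal sketch: it assumes a dedented docstring
with a ":Parameters:" field whose items are written as "name : type" lines
at one indentation level, with descriptions indented further.  The function
name backtick_parameters and the regular expression are hypothetical and
are not part of epydoc or docutils.

```python
import re

# Hypothetical helper, not part of epydoc or docutils.
# Matches an indented "name : ..." line whose name is a bare identifier.
_PARAM_LINE = re.compile(r"^(\s+)([A-Za-z_]\w*)(\s*:.*)$")


def backtick_parameters(docstring, header=":Parameters:"):
    """Wrap bare parameter names under `header` in back-ticks."""
    out = []
    in_params = False
    param_indent = None
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == header:
            in_params = True
            param_indent = None
        elif in_params and stripped and not line[0].isspace():
            # A non-blank, non-indented line ends the parameter section.
            in_params = False
        elif in_params:
            m = _PARAM_LINE.match(line)
            if m:
                indent = m.group(1)
                if param_indent is None:
                    # The first matching line fixes the item indentation.
                    param_indent = len(indent)
                # Deeper lines are descriptions and are left alone.
                if len(indent) == param_indent:
                    line = f"{indent}`{m.group(2)}`{m.group(3)}"
        out.append(line)
    return "\n".join(out)
```

With something like this run as a preprocessing step, the docstrings in the
source could stay free of back-ticks while the documentation tool still
receives the marked-up form.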

>>Is table support adequate in reST?
>
>See <http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#tables>
>
>If ReST table support isn't expressive enough for you, then you must
>be using some pretty complex tables. :)
>
>
Moin Moin uses a different way to describe tables.   :-(
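
To illustrate how mechanical the difference is, here is a rough sketch that
turns simple Moin Moin rows of the form ||a||b|| into a restructured Text
grid table.  It assumes every row has the same number of cells and uses no
Moin Moin cell attributes (spans, alignment, colors); the function name
moin_table_to_rest is hypothetical.

```python
def moin_table_to_rest(lines):
    """Convert rows like '||a||b||' into a reST grid table (no header row)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not (line.startswith("||") and line.endswith("||")):
            raise ValueError(f"not a MoinMoin table row: {line!r}")
        # Drop the outer delimiters, then split on the inner ones.
        rows.append([cell.strip() for cell in line[2:-2].split("||")])
    if not rows:
        raise ValueError("no table rows given")
    ncols = len(rows[0])
    if any(len(r) != ncols for r in rows):
        raise ValueError("rows have different numbers of cells")
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in rows) for i in range(ncols)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    out = [border]
    for r in rows:
        cells = "|".join(f" {c.ljust(w)} " for c, w in zip(r, widths))
        out.append("|" + cells + "|")
        out.append(border)
    return "\n".join(out)


print(moin_table_to_rest(["||name||type||", "||x||int||"]))
```

The grid form is what the tools want to read; the Moin Moin form is shorter
to type, which is the trade-off under discussion.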

>[Alan Isaac]
>
>>    math, so we could inline `f(x)=x^2` rather than
>>    :latex-math:`f(x)=x^2`.
>
>As I noted above, this would mean you'd have to explicitly mark
>crossreferences to python objects with some tag -- rst can't read
>your mind to know whether `foo` refers to a math expression or a
>variable.
>
>>It may be worth asking whether
>>    epydoc developers would be willing to pass $f(x)=x^2$
>>    as latex-math.
>
>Overall, I'm reluctant to make changes to the markup language(s)
>themselves that aren't supported by the markup language's own
>extension facilities.
>
That understandable reluctance is why we need to make changes to the
standard for SciPy docstrings.  Math support is critical and it just
isn't built into restructured Text as well as it could be.  Having to
write :latex-math:`<expr>` for in-line math is silly when $<expr>$ has
been the way to define latex math for a long time.
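
A preprocessing pass of this kind might be sketched as follows.  It assumes
in-line math never spans more than one line and that a literal dollar sign
is written as \$; the role name latex-math is simply the one used in this
thread, and the function name dollars_to_role is hypothetical.

```python
import re

# Matches $...$ on a single line.  A backslash-escaped \$ is not
# treated as a delimiter.
_INLINE_MATH = re.compile(r"(?<!\\)\$(.+?)(?<!\\)\$")


def dollars_to_role(text, role="latex-math"):
    """Rewrite $expr$ as :role:`expr` so a reST-based tool can parse it."""
    # A function replacement keeps backslashes in the LaTeX source intact.
    return _INLINE_MATH.sub(lambda m: f":{role}:`{m.group(1)}`", text)


print(dollars_to_role("where $f(x)=x^2$ is the objective"))
```

The sketch does not check that the resulting role is surrounded by
whitespace or punctuation, which restructured Text requires for inline
markup to be recognized, so a real converter would need more care.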

In summary, my biggest issues with just straight restructured Text are

1) back-ticks in parameter lists.
2) the way math is included
3) doesn't understand Moin Moin tables
4) doesn't seem to have special processing for standard section headers

I also don't really like the way bold, italics, and courier are
handled.  My favorite is now *bold*, /italics/, and `fixed-width`.

I like the {{{ code }}} for code blocks that Moin Moin uses, but that's
not a big deal to me.  I can live with the :: that restructured-Text uses.

It seems like we are going to have to customize (hack) epydoc to do what
we want anyway.  Why, then, can we not "tweak" reST a little bit too?

-Travis