[SciPy-Dev] docstring standard: parameter shape description

Joe Harrington jh@physics.ucf....
Mon Jan 28 15:21:56 CST 2013


On Sun, Jan 27, 2013 at 2:51 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
> Hi,
>
> When merging the doc wiki edits there were a large number of changes to the
> shape description of parameters/returns. This is not yet described in the
> docstring standard
> (https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt), and
> currently is done in various ways:
>
> param1 : ndarray, shape (N,)

I think the format should be the same in all cases: start with the class,
then give the shape, and solve the general problem.

Initially, I agreed with Josef about being terse, but that form is hard
to read, and a newcomer might wonder what the numbers in parentheses
are.  The word "shape" does not add an extra line, and the comma reads
naturally as an appositive in English.

So, I prefer:

param1 : ndarray, shape XXXXX

For XXXXX, we need to specify:

ranges of allowed numbers of dimensions
ranges of allowed sizes within each dimension
low- or high-side unconstrained sizes in either case

We should accept the output of .shape, and define some range
conventions.  Of course, there will be pathological cases, particularly
in specialist packages that adopt the numpy doc standard, where nothing
but text will adequately describe the allowed dimensions ("If there are
three dimensions, then the second dimension must...").  A "(see text)"
should be allowed after the shape spec.

So, this is my counterproposal for inclusion in the standard:

-------------------------------------------------------------------------------
param1 : ndarray, shape <shapespec> [(see text)]
as in
param1 : ndarray, shape (2, 2+, dim(any), 4-, 4-6, any) (see text)

in <shapespec>:
  the spec reads from the slowest-varying to the fastest-varying dimension
  a number means exactly that number of items on that axis
  a number followed by a "+" ("-") means that number or more (fewer) items
  a-b means between a and b items, INCLUSIVE
  "any" means any number of items on that axis
  dim(dimspec) means the conventions above apply to the number of dimensions instead of the number of items

The example describes an array whose dimensions, from slowest- to
fastest-varying, have sizes:
2
2 or more
(0 or more axes can be inserted here)
0 to 4
4 to 6
any size, including absent (use 1+ to require a dimension)
-------------------------------------------------------------------------------
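To check that this notation can be parsed mechanically, here is a minimal Python sketch that tests a .shape tuple against a shapespec. The names parse_range, parse_spec and shape_matches are hypothetical, not part of numpy or numpydoc. It handles only the tokens defined above (N, N+, N-, a-b, any, dim(...)); symbolic sizes such as N and the free-text "(see text)" are out of scope. Following the description of the example, it assumes that a bare "any" axis may also be absent.

```python
import re
from functools import lru_cache
from typing import List, Optional, Tuple

# One parsed element: (kind, low, high, optional).
# kind "items": one axis whose size must lie in [low, high].
# kind "dim":   a run of low..high axes of any size.
# high=None means unbounded; optional=True lets the axis be absent.
Element = Tuple[str, int, Optional[int], bool]


def parse_range(tok: str) -> Tuple[int, Optional[int]]:
    """Parse 'N', 'N+', 'N-', 'a-b' or 'any' into an inclusive (low, high)."""
    tok = tok.strip()
    if tok == "any":
        return 0, None
    m = re.fullmatch(r"(\d+)([+-]?)", tok)
    if m:
        n, suffix = int(m.group(1)), m.group(2)
        if suffix == "+":
            return n, None          # n or more
        if suffix == "-":
            return 0, n             # n or fewer
        return n, n                 # exactly n
    m = re.fullmatch(r"(\d+)-(\d+)", tok)
    if m:
        return int(m.group(1)), int(m.group(2))  # inclusive on both ends
    raise ValueError(f"unrecognized range: {tok!r}")


def parse_spec(spec: str) -> List[Element]:
    """Split '(2, 2+, dim(any), ...)' into elements, left to right."""
    body = spec.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    elements: List[Element] = []
    for tok in body.split(","):
        tok = tok.strip()
        if not tok:                 # tolerate the trailing comma in '(3,)'
            continue
        m = re.fullmatch(r"dim\((.*)\)", tok)
        if m:
            lo, hi = parse_range(m.group(1))
            elements.append(("dim", lo, hi, False))
        else:
            lo, hi = parse_range(tok)
            # Assumption from the example: a bare 'any' axis may be absent.
            elements.append(("items", lo, hi, tok == "any"))
    return elements


def shape_matches(shape: Tuple[int, ...], spec: str) -> bool:
    """Return True if a .shape tuple satisfies the shapespec."""
    elements = parse_spec(spec)

    @lru_cache(maxsize=None)         # memoize (element, axis) positions
    def match(i: int, j: int) -> bool:
        # i indexes elements, j indexes axes of `shape`.
        if i == len(elements):
            return j == len(shape)
        kind, lo, hi, optional = elements[i]
        if kind == "items":
            size_ok = (j < len(shape) and lo <= shape[j]
                       and (hi is None or shape[j] <= hi))
            if size_ok and match(i + 1, j + 1):
                return True
            return optional and match(i + 1, j)   # skip an absent 'any' axis
        # kind == "dim": try every allowed count k of inserted axes
        remaining = len(shape) - j
        top = remaining if hi is None else min(hi, remaining)
        return any(match(i + 1, j + k) for k in range(lo, top + 1))

    return match(0, 0)


spec = "(2, 2+, dim(any), 4-, 4-6, any)"
print(shape_matches((2, 3, 4, 5), spec))           # True
print(shape_matches((2, 3, 7, 7, 4, 5, 9), spec))  # True: dim(any) absorbs two axes
print(shape_matches((2, 1, 4, 5), spec))           # False: second axis needs 2 or more
```

Backtracking is needed because dim(...) and an absent "any" make the number of axes each element consumes variable.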

I thought of basing the ranges on the Python indexing spec, but I find
it potentially confusing.  Is there a reason to propagate the Python
weirdness that the ending index is one more than the index of the final
item?  That behavior is useful in programming (you don't have to write
"-1" all the time), but it induces errors in many people, not just
beginners.  Still, if we did this, the example would look like:

(2, 2:, dim(:), :5, 4:7, :)

I don't think it is wise to use the indexing spec with a changed meaning
for the ending index.  Either we should adopt the indexing spec, or we
should adopt another spec that looks different enough from the indexing
spec to avoid confusion.
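The off-by-one can be made concrete with a small sketch. The helper to_slice_notation below is hypothetical and not part of any library; it rewrites one token of the inclusive notation proposed above into slice text. Every inclusive upper bound b must become the exclusive stop b + 1, the same half-open convention Python's range() uses.

```python
import re


def to_slice_notation(tok: str) -> str:
    """Rewrite one proposed range token (inclusive ends) as slice text (exclusive stop)."""
    tok = tok.strip()
    if tok == "any":
        return ":"
    m = re.fullmatch(r"dim\((.*)\)", tok)
    if m:
        return f"dim({to_slice_notation(m.group(1))})"
    m = re.fullmatch(r"(\d+)\+", tok)
    if m:
        return f"{m.group(1)}:"                       # n or more  -> n:
    m = re.fullmatch(r"(\d+)-", tok)
    if m:
        return f":{int(m.group(1)) + 1}"              # n or fewer -> :n+1
    m = re.fullmatch(r"(\d+)-(\d+)", tok)
    if m:
        return f"{m.group(1)}:{int(m.group(2)) + 1}"  # a to b     -> a:b+1
    if re.fullmatch(r"\d+", tok):
        return tok                                    # exact size is unchanged
    raise ValueError(f"unrecognized range: {tok!r}")


# The inclusive range 4-6 covers the same sizes as range(4, 7):
assert list(range(4, 6 + 1)) == [4, 5, 6]
print(to_slice_notation("4-6"))   # 4:7
```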

Remember that the docs need to be clear to beginners and help them not
make errors.

Thoughts?

--jh--

