[SciPy-dev] Guidelines for documenting parameter types
Neil Crighton
neilcrighton@gmail....
Sat Aug 16 09:38:54 CDT 2008
A few of us participating in the doc marathon
(http://sd-2116.dedibox.fr/pydocweb/wiki/Front%20Page/) have some
questions about documenting parameter types, and I thought it would be
good to get others' opinions. If we can agree on some guidelines,
perhaps they could be incorporated into the docstring standard
(http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines#docstring-standard)?
I don't mind what we end up deciding on, but I think it's a good idea
to address these situations in the guidelines so new people know what
to do, and can feel comfortable about cleaning up someone else's
docstring to match the guidelines (if necessary). Maybe some of these
are pedantic, but I think they'll help to give the docs a more unified
feel and make sure it's always clear what parameter types are meant.
(1) When we mention types in the parameters, we are mostly using the
following abbreviations:
    integer : int
    float : float
    boolean : bool
    complex : complex
    list : list
    tuple : tuple
i.e. the same as the names of the Python built-in types. It would be
nice to say in the guidelines that these should be followed where
possible.
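
As an illustration of point (1), a docstring that follows these
abbreviations might look like the sketch below. The function
linear_steps and its parameters are made up for this example; they are
not part of NumPy.

```python
def linear_steps(start, stop, num, endpoint=True):
    """
    Return evenly spaced values between two numbers.

    Hypothetical helper, used only to show the type abbreviations.

    Parameters
    ----------
    start : float
        First value.
    stop : float
        End of the interval.
    num : int
        Number of values to return; must be at least 2.
    endpoint : bool
        If True, `stop` is the last value returned.

    Returns
    -------
    steps : list
        The generated values.
    """
    if num < 2:
        raise ValueError("num must be at least 2")
    # With the endpoint included there are num - 1 gaps between values.
    divisor = num - 1 if endpoint else num
    step = (stop - start) / divisor
    return [start + i * step for i in range(num)]
```

The types written after each colon are exactly the names one would
type at the Python prompt (int, float, bool, list).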
(2) Often it's useful to state the type of an input or returned array.
If we want to say the array returned by np.all is of type bool, what
should we say? Possibilities used so far (shown here for int) are
    int array
    array of int
    array of ints
I prefer the plural form, 'array of ints' (so 'array of bools' for
np.all), because it is also suitable for tuples and lists ('tuple of
ints', or 'list of dtypes'). 'int tuple' is just bad :).
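
To make point (2) concrete, here is a small sketch of NumPy calls
whose results would be described with the plural wording. The
variable names are only for illustration.

```python
import numpy as np

a = np.array([[1, 0], [2, 3]])

# np.all along an axis returns what we would call an 'array of bools'.
all_nonzero = np.all(a, axis=0)

# np.argsort returns indices: an 'array of ints'.
order = np.argsort([3, 1, 2])

# np.shape returns a 'tuple of ints'.
shape = np.shape(a)

print(all_nonzero.dtype, order.dtype, shape)
```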
(3) Many functions accept either sequences or scalars as input, and
then return arrays if the input was a sequence, or an array scalar if
the input was a scalar. For example:
>>> a = np.sin(np.pi/2)
>>> type(a)
<type 'numpy.float64'>
>>> a = np.sin([np.pi/2,-np.pi/2])
>>> type(a)
<type 'numpy.ndarray'>
There was some discussion about the best way to handle this:
http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.arcsin/#discussion-sec
http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.arctan/#discussion-sec
http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.greater_equal/#discussion-sec
Stefan proposed that for these functions we just refer to the input
parameter type as array_like, and the return type as ndarray, since
these are both described as including scalars in the glossary,
http://sd-2116.dedibox.fr/pydocweb/doc/numpy.doc.reference.glossary/.
I think this is a good rule. (Note that at least one proofed docstring
breaks this rule:
http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.umath.greater/)
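
A docstring written under the rule in point (3) might look like the
sketch below. The wrapper function double is hypothetical; it only
calls the np.multiply ufunc so that it shows the same
scalar-in/scalar-out behaviour as np.sin above.

```python
import numpy as np

def double(x):
    """
    Multiply the input by two, element-wise.

    Hypothetical wrapper, used only to show the suggested wording.

    Parameters
    ----------
    x : array_like
        Input values. A scalar is also accepted.

    Returns
    -------
    y : ndarray
        Doubled values. An array scalar is returned for scalar input.
    """
    return np.multiply(x, 2)

print(type(double([1.5, 2.5])))  # an ndarray for sequence input
print(type(double(1.5)))         # an array scalar for scalar input
```

Only array_like and ndarray appear as the types; the scalar case is
left to the description text and the glossary.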
(4) Sometimes we need to specify more than one kind of type. For
example, the shape parameter of zeros can be either an int or a
sequence of ints (but is not array_like, since it doesn't accept
nested sequences). How should we write this? Some possibilities are:
    int or sequence of ints
    {int, sequence of ints}
I much prefer 'int or sequence of ints' as to me it's clearer and
looks nicer. Also the curly brackets are used when a parameter can
assume one of a set of fixed values (e.g. the kind keyword of argsort,
which can be one of {'quicksort','mergesort','heapsort'}), so I think
it is confusing to also use them in this case.
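
The two notations in point (4) can sit side by side in one docstring,
as in the sketch below. The function make_grid and its fill keyword
are invented for this example; only np.zeros and np.ones are real
NumPy calls.

```python
import numpy as np

def make_grid(shape, fill='zeros'):
    """
    Return a new array of the given shape.

    Hypothetical helper, used only to contrast the two notations.

    Parameters
    ----------
    shape : int or sequence of ints
        Shape of the new array, e.g. ``3`` or ``(2, 3)``.
    fill : {'zeros', 'ones'}
        Initial value of every element.

    Returns
    -------
    out : ndarray
    """
    if fill == 'zeros':
        return np.zeros(shape)
    if fill == 'ones':
        return np.ones(shape)
    raise ValueError("fill must be 'zeros' or 'ones'")

print(make_grid(3).shape)
print(make_grid((2, 3), fill='ones').shape)
```

Here 'or' joins alternative types, while the curly brackets are kept
for a fixed set of allowed values.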
(5) For keyword arguments, the default value is often None. In this
case we've been omitting None from the parameter types. However,
sometimes None is a valid input type but is not the default (e.g. axis
keyword for argsort). In this case I think it's a good idea to list
None explicitly among the parameter types.
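
For the argsort case in point (5), the sketch below shows why None
deserves a mention: it is accepted, but it behaves differently from
the default. The docstring fragment in the comment is only a
suggestion for the wording.

```python
import numpy as np

a = np.array([[3, 1], [2, 0]])

# Default: sort along the last axis.
by_row = np.argsort(a)

# axis=None is valid but is not the default: the flattened array is sorted.
flat = np.argsort(a, axis=None)

print(by_row.tolist())
print(flat.tolist())

# Suggested wording for the parameter:
#
#     axis : int or None
#         Axis along which to sort. If None, the flattened array is used.
```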
I've posted to both the scipy-dev and numpy lists - I wasn't sure
which is best for this.
Neil