[Numpy-discussion] generalized_inverse

David M. Cooke cookedm at physics.mcmaster.ca
Sun Jul 16 03:26:49 CDT 2006


On Jul 16, 2006, at 00:21 , Travis Oliphant wrote:

> Victoria G. Laidler wrote:
>> Jonathan Taylor wrote:
>>
>>> pseudoinverse
>>>
>>> it's the same name matlab uses:
>>>
>>> http://www.mathworks.com/access/helpdesk/help/techdoc/ref/pinv.html
>>>
>>
>> Thanks for the explanation.
>>
>> I'm puzzled by the naming choice, however. Standard best practice in
>> writing software is to give understandable names, to improve readability
>> and code maintenance. Obscure abbreviations like "pinv" pretty much went
>> out with the FORTRAN 9-character limit for variable names. It's very
>> unusual to see them in new software nowadays, and it always looks
>> unprofessional to me.
>>
> I appreciate this feedback.  It's a question that comes up occasionally,
> so I'll at least give my opinion on the matter which may shed some light
> on it.
>
> I disagree with the general "long-name" concept when it comes to
> "very-common" operations.  It's easy to take an idea and
> over-generalize it for the sake of consistency.  I've seen too many
> codes where very long names actually get in the way of code readability.

How are pseudoinverse and inverse "very common"? (Especially given  
that one of the arguments for not having a .I attribute for inverse  
on matrices is that that's usually the wrong way to go about solving  
equations.)
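
To make the parenthetical concrete, here is a minimal sketch comparing
numpy.linalg.solve with forming the inverse explicitly. The matrix and
right-hand side are made-up values, not anything from this thread. On a
small, well-conditioned system both routes agree; solve simply never
builds the inverse.

```python
import numpy as np

# Made-up, well-conditioned 3x3 system for illustration only.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Usually preferred: solve A x = b directly, without forming the inverse.
x_solve = np.linalg.solve(A, b)

# Also works, but builds the full inverse first and then multiplies.
x_inv = np.linalg.inv(A) @ b

# Residual norms: how well each answer satisfies A x = b.
residual_solve = np.linalg.norm(A @ x_solve - b)
residual_inv = np.linalg.norm(A @ x_inv - b)
```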

> Someone reading code will have to know what an operation actually is to
> understand it.  A name like "generalized_inverse" doesn't convey any
> intrinsic meaning to the non-practitioner anyway.  You always have to
> "know" what the function is "really" doing.  All that's needed is a
> "unique" name.  I've found that long names are harder to remember
> (there's more opportunity for confusion about how much of the full name
> was actually used and how any words were combined).

As has been argued before, short names have their own problems with  
remembering what they are. I also find that when reading code with  
short names, I go slower, because I have to stop and think what that  
short name is (particularly bad are short names that drop vowels,  
like lstsq -- I can't pronounce that!). I'm not very good at creating  
hash tables in my head from short names to long ones.


The currently exported names in numpy.linalg are solve, inv,  
cholesky, eigvals, eigvalsh, eig, eigh, svd, pinv, det, lstsq, and  
norm. Of these, 'lstsq' is the worst offender, IMHO (superfluous  
dropped vowels). 'inv' and 'pinv' are the next, then the 'eig*' names.

'least_squares' would be better than 'lstsq'.

'inverse' is not much longer than 'inv', and is more descriptive. I  
don't think 'pinv' is common enough to need a short name;  
'pseudoinverse' would be better (not all generalized inverses are  
pseudoinverses).
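
A small sketch of that last distinction, using a made-up singular matrix.
Here "generalized inverse" is taken to mean any G with A G A = A, and
"pseudoinverse" the Moore-Penrose one that numpy.linalg.pinv returns; the
matrix G below is a hand-picked example, not the output of any library
routine.

```python
import numpy as np

# Made-up singular matrix (rank 1).
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# Moore-Penrose pseudoinverse as computed by NumPy.
A_pinv = np.linalg.pinv(A)

# Hand-picked generalized inverse: it satisfies A @ G @ A == A ...
G = np.array([[1.0, 0.0],
              [0.0, 5.0]])
is_generalized_inverse = np.allclose(A @ G @ A, A)

# ... but fails another Moore-Penrose condition, G @ A @ G == G,
# so it is not the pseudoinverse.
satisfies_second_condition = np.allclose(G @ A @ G, G)
```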

Give me these three and I'll be happy :-)
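
In the meantime, anyone who wants the longer spellings can bind them on
the user side. A minimal sketch: least_squares, inverse and pseudoinverse
below are hypothetical aliases of my own choosing, not names that
numpy.linalg exports, and the data is made up.

```python
import numpy as np

# Hypothetical long-name aliases; numpy.linalg exports only the short names.
least_squares = np.linalg.lstsq
inverse = np.linalg.inv
pseudoinverse = np.linalg.pinv

# Made-up data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
design = np.column_stack([x, np.ones_like(x)])

# lstsq returns (solution, residuals, rank, singular values).
coeffs, residuals, rank, singular_values = least_squares(design, y, rcond=None)
```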

Personally, I'd prefer 'eigenvalues' and 'eigen' instead of 'eigvals'  
and 'eig', but I can live with the current names.

'det' is fine, as it's used in mathematical notation. 'cholesky' is  
also fine, as it's a word at least. I'd have to look at the docstring  
to find how to use it, but that would be the same for  
"cholesky_decomposition".

[btw, I'm ok with numpy.dft now: the names there make sense, because  
they're constructed logically. Once you know the scheme, you can see  
right away that 'irfftn' is 'inverse real FFT, n-dimensional'.]
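
A quick sketch of reading the scheme off the names, assuming a NumPy in
which these functions are exposed under numpy.fft rather than numpy.dft.
The input array is made up.

```python
import numpy as np

# Made-up real-valued 2-D input.
x = np.arange(24, dtype=float).reshape(4, 6)

# rfftn: real FFT, n-dimensional. Only half of the last axis is stored.
X = np.fft.rfftn(x, axes=(0, 1))

# irfftn: inverse real FFT, n-dimensional. Pass the original shape so the
# length of the last axis is recovered unambiguously.
x_back = np.fft.irfftn(X, s=x.shape, axes=(0, 1))
```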

>
> A particularly ludicrous case, for example, was the fact that the very
> common SVD (whose acronym everybody doing linear algebra uses) was named
> in LinearAlgebra (an unnecessarily long module name to begin with) with
> the horribly long and unsightly name of singular_value_decomposition.
> I suppose this was done just for the sake of "code readability."

I agree; that's stupid.

> It's not that we're concerned with MATLAB compatibility.  But, frankly
> I've never heard that the short names MATLAB uses for some very common
> operations are a liability.   So, when a common operation has a short,
> easily-remembered name that is in common usage, why not use it?
>
> That's basically the underlying philosophy.   NumPy has too many very
> basic operations to try and create very_long_names for them.
>
> I know there are differing opinions out there.  I can understand that.
> That's why I suspect that many codes I will want to use will be written
> with easy_to_understand_but_very_long names and I'll grin and bear the
> extra horizontal space that it takes up in my code.

-- 
|>|\/|<
/------------------------------------------------------------------\
|David M. Cooke              http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca




