[Numpy-discussion] A disconnected numarray rant
Alexander Schmolck
a.schmolck at gmx.net
Tue Oct 12 02:40:55 CDT 2004
Hi,
I'm taking a one-month break from computers (i.e. I will be completely
off-line), and I have to catch a train in an hour. I've recently bitten
the bullet and made a matrix class I've been using for some time work with
numarray, and I've written down a number of things that occurred to me while
doing it, including some things which I think are bugs in numarray. I
thought at least posting the bugs would be a useful service; the rest is very
raw and essentially unedited cut-and-paste of these notes (sorry about that),
and I hope it doesn't contain anything particularly offensive.
P.S. I've just dumped the code for the matrix class (nummat) at
http://www.dcs.ex.ac.uk/~aschmolc/Stuff/
'as
The following are my notes:
Things that fairly clearly seem to be bugs:
- numarray.Int32 etc. can't be pickled
- ``a = array(1+0j); a.imag = a.real * 10`` => IndexError
- array(0, type=Float64) + 1e3000 => `inf` with the right error modes,
but array(0, type=Float32) + 1e3000 => `OverflowError`
- numarray.array(10)/numarray.array(0) => 0
- numarray.array(10000000000000L) => array(1316134912)
- numarray.where(0,1,0) => array([0])
- l = [1,2,3]; numarray.put(l,numarray.array([1,2,0]),[0,0,0]); l => [1, 2, 3]
(the list is silently left unchanged), whereas
a = array([1,2,3]); numarray.put(a,numarray.array([1,2,0]),[0,0,0]); a => array([0, 0, 0])
- repr(numarray.array([],typecode='i')) (etc. etc.) => "numarray.array([])"
- getattr(array([1,2,3]), '_aligned') => SystemError
- obscure: numarray.where(0, matrix(568, convert_scalars=True),2) =>
ValueError (tries __len__ which fails, as len(array(568)) also fails)
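The `array(10000000000000L)` result above is exactly what silently keeping the low 32 bits of the value produces. A small pure-Python check follows, assuming (as the result suggests) that the integer was stored as a signed 32-bit Int32. `wrap_int32` is a hypothetical helper, not part of numarray:

```python
def wrap_int32(x):
    """Keep the low 32 bits of x and reinterpret them as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

big = 10000000000000
print(big // 2**32, big % 2**32)  # 2328 1316134912: 2328 full wraps, then the remainder
print(wrap_int32(big))            # 1316134912, the value numarray reported
```

The high bits are simply dropped; no overflow error is raised.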
Numeric incompatibilities (that are either undocumented or bug-like)
- numarray.array('a', typecode='O') => TypeError (object arrays)
- for extra fun, try numarray.array(1, type=numarray.Object) => RuntimeError
(something entirely different)
- nonzero is completely incompatible
- shape(None) etc. no longer works (IMHO a bug)
- cross_correlate & average missing
- left_shift et al missing
- numarray.sqrt(a,a) is None (*not* the result, as it used to be)
- Numeric-style put behavior seems unavailable (without numarray.numeric),
e.g. num.put(a, [0,1,2,3], [10,20]) (fewer values than indices), and
put(array([[ 0., 1., 2.], [ 3., 4., 5.]]), [1, 4], [10,40]) (flat indices
into a 2-d array) fails
- boolean testing (not even bool(array(0)) works; I'm not sure this is good)
- Generally different handling of rank-0 arrays; e.g.
``type(num.array(1.0) + 0) is float``. One potentially very nasty gotcha is
in-place operations (e.g. a**=2), which have totally different semantics for
Python scalars and rank-0 arrays (rebinding the name versus mutating a
possibly shared object). Unlike AttributeErrors on ``a.shape``, this can
lead to nasty bugs in corner cases (e.g. when a reduction only infrequently
yields a scalar ``a``). I think this should be mentioned in a gotchas
section. Other possible entries would be the need to use .copy() to **save**
memory when slicing (a slice is a view that keeps the whole parent array
alive), and 1xN or Nx1 matrices versus vectors (people are not used to
thinking properly about rank from mathematical training or Matlab exposure).
- asarray downcasts arrays (e.g.: asarray(array([1.,2.,3.]),'i'))
- numarray.ones(-5) => MemoryError (ValueError would be nicer)
- numarray.ones(2.0), numarray.ones([2]) fail (cf. numarray.range(2.0))
- diagonal (test case):
    b=num.array([[1,2,3,4],[5,6,7,8]]*2)
    assert eq(num.diagonal(b), [1,6,3,8])
    assert eq(num.diagonal(b, -1), [5,2,7])
    c = num.array([b,b])
    assert eq(num.diagonal(c,1), [[2,7,4], [2,7,4]])
- no a.toscalar() !!!
- matrixmultiply in the docs
- what's the point of swapaxes (i.e. why not have a generalized in-place
transpose?)
- what's the point of innerproduct?
- indexing by a list is different from indexing by tuple (I haven't had time
to look closely at the docs whether that's intentional)
- doesn't know about Numeric's bizarre '\x0b' typecode
- numarray.sqrt.reduce([]) raises (sensibly) TypeError, not ValueError
- len(array(1)) or array(1)[0] won't work anymore (understandable, but
should be documented)
- (should maximum and minimum reduce to -inf and inf?)
- <built-in method reduce of _BinaryUFunc object at 0x82dfc9c> is not
a very helpful repr; should be possible to get to the ufunc itself
- as in Numeric numarray.maximum.reduce(numarray.array([0,-0.])) => -0.0
- __array__ protocol no longer supported (how can a non-derived class convert
itself efficiently to an array?)
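The in-place gotcha from the rank-0 item above can be reproduced without numarray at all. The sketch below uses a hypothetical stand-in class, `Rank0`, that mutates itself in `__ipow__` the way an in-place array operation does. A Python float has no `__ipow__`, so `a **= 2` falls back to `a = a ** 2` and merely rebinds the name, leaving any alias untouched:

```python
class Rank0:
    """Hypothetical stand-in for a rank-0 array: a mutable box around one value."""

    def __init__(self, value):
        self.value = value

    def __ipow__(self, exponent):
        # In-place semantics: modify this object and return it, so every
        # name bound to the same object sees the change.
        self.value **= exponent
        return self

    def __repr__(self):
        return "Rank0(%r)" % (self.value,)


def square_inplace(a):
    b = a      # a second reference, e.g. a value cached elsewhere
    a **= 2    # rebinds `a` for a float, mutates the shared object for Rank0
    return a, b

print(square_inplace(3.0))          # (9.0, 3.0): the alias keeps the old value
print(square_inplace(Rank0(3.0)))   # (Rank0(9.0), Rank0(9.0)): the alias changed too
```

Code that only occasionally receives a scalar instead of a rank-0 array therefore behaves differently only in those rare cases.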
Documentation Gotchas
- p. 34 IMO row vector is used incorrectly; row and column vectors are really
matrices (i.e. have rank 2) so ``array([[1,2,3]])`` would be a row vector
- No proper explanation of differences between Numeric and numarray, or of
how the numarray.numeric module differs from Numeric proper (e.g. argmin)
- No migration and best-practice advice (e.g. there should be a standard way
for packages which work with both numarray and numeric as backends to let
the user choose his preference; how about setting an environment var NumPy
or something?)
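The environment-variable idea could look roughly like the following sketch. `import_backend` is a hypothetical helper, and the variable name `NUMPY` is an assumption following the suggestion above, not an existing convention; only the standard `os` and `importlib` modules are used:

```python
import importlib
import os

def import_backend(default_order=("numarray", "Numeric"), env_var="NUMPY"):
    """Return the array module named by env_var, else the first importable default.

    The environment variable name NUMPY is a hypothetical convention.
    """
    preferred = os.environ.get(env_var)
    order = [preferred] if preferred else []
    order += [name for name in default_order if name != preferred]
    for name in order:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue                # not installed: try the next candidate
    raise ImportError("none of %s could be imported" % ", ".join(order))

# In a package that supports both backends:
#     num = import_backend()
# and the user picks one with e.g.  NUMPY=Numeric python script.py
```

A preference naming a package that isn't installed quietly falls back to the defaults; a stricter variant might raise instead.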
Waffle
------
- there *really* ought to be an array equality function (with optional
tolerance); it's quite difficult for a normal user to get right (NaNs,
zero-size arrays etc.) and it's often required, especially for testing
- rank-preserving reduction would be nice as an option, e.g. to subtract
out or divide by the reduced portion (which currently won't work for
columns without adding a unit dimension by hand).
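To show the corner cases such an equality function has to handle, here is a minimal pure-Python sketch working on nested lists rather than numarray arrays. The name `array_equal` and its `rtol`/`atol`/`nan_equal` parameters are hypothetical. It compares shapes first, so zero-size inputs of different shapes differ, treats NaNs as equal only on request, and lets infinities match only exactly:

```python
import math

def _shape(x):
    """Shape of a nested list/tuple; a scalar has shape ()."""
    if isinstance(x, (list, tuple)):
        inner = {_shape(item) for item in x}
        if len(inner) > 1:
            raise ValueError("ragged nesting")
        return (len(x),) + (inner.pop() if inner else ())
    return ()

def _flat(x):
    """Yield the scalars of a nested list/tuple in row-major order."""
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from _flat(item)
    else:
        yield x

def array_equal(a, b, rtol=0.0, atol=0.0, nan_equal=True):
    """Elementwise equality of two nested sequences with optional tolerance."""
    if _shape(a) != _shape(b):
        return False                    # e.g. [] (shape (0,)) vs [[]] (shape (1, 0))
    for x, y in zip(_flat(a), _flat(b)):
        if x == y:                      # exact match, including equal infinities
            continue
        if math.isnan(x) or math.isnan(y):
            if nan_equal and math.isnan(x) and math.isnan(y):
                continue
            return False
        if math.isinf(x) or math.isinf(y):
            return False                # inf - inf is nan, so no tolerance test
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True

print(array_equal([[1.0, float("nan")]], [[1.0, float("nan")]]))  # True
print(array_equal([], [[]]))                                      # False
```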
Design
The (AFAICS) benefit-free but downside-rich introduction of `type`
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Is there any reason that Typecode objects that compare as desired to the
relevant strings ("i", "d") wouldn't have done? Now there is an explosion
and confusion of interfaces -- some numpy code will now only accept
type(code)s as "typecode" keyword parameter (even in numarray! see
numarray.mlab!) and other stuff.
Never mind that type already is a highly overused word in the python world.
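A minimal sketch of the alternative suggested above: a type object that also compares equal to its one-letter code string, so a single `typecode` argument could accept either form. `Typecode`, `Int32`, `Float64` and `normalize` here are hypothetical illustrations, not numarray's actual classes. The hash is taken from the code string so that the equality and hash stay consistent:

```python
class Typecode:
    """Hypothetical type object that also compares equal to its one-letter code."""

    def __init__(self, name, code):
        self.name = name
        self.code = code

    def __eq__(self, other):
        if isinstance(other, Typecode):
            return self.code == other.code
        if isinstance(other, str):
            return other == self.code   # Int32 == "i" is True
        return NotImplemented

    def __hash__(self):
        # Hash like the code string, so Int32 and "i" are interchangeable dict keys.
        return hash(self.code)

    def __repr__(self):
        return self.name

Int32 = Typecode("Int32", "i")
Float64 = Typecode("Float64", "d")
_KNOWN = (Int32, Float64)

def normalize(typecode):
    """Accept either a type object or a code string; return the type object."""
    for t in _KNOWN:
        if t == typecode:
            return t
    raise ValueError("unknown typecode %r" % (typecode,))

print(normalize("d"), normalize(Float64))   # Float64 Float64
```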
The big method bloat.
'''''''''''''''''''''
As the introduction of the Numeric manual says, there were "good reasons" for
"very few array methods" -- now there are **56** public methods and 8 public
attributes (public == not starting with '_'). Of those 56 methods about 11
are accessors, and of the rest about half are redundant or worse: they
either also exist as numarray functions (argmin, argmax, diagonal, ...), or
they really ought to be functions (mean, stddev), or they are quite confusing
(``a.min`` and ``a.max`` behave quite differently from ``a.argmin`` and
``a.argmax``, never mind ``numarray.minimum``), or they are simply utterly
pointless (``a.nelements`` == ``a.size``).
- argmin, argmax: what's wrong with numarray.argmin, numarray.argmax??? Why
do argmin/argmax and max/min have completely different interfaces??? If
there really is a need for these (there isn't), then if anything a.min and
a.max should be called a.flatmin and a.flatmax
- diagonal, mean, nelements, nonzero, ...
- perversely, the **only** function that I can think of that could have
sensibly become a method hasn't: ``put`` (it used to work only on arrays
under Numeric, and not without reason, so making it a method would have
been sensible; numarray.put of course also "works" on non-arrays, it just
doesn't do anything with them)
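For reference, the Numeric-style `put` semantics that these notes miss can be sketched in pure Python on a flat list. The semantics are flat indices into the row-major data, values repeated cyclically when there are fewer values than indices, and modification in place. `put` here is a hypothetical helper, not numarray's or Numeric's function:

```python
def put(target, indices, values):
    """Hypothetical Numeric-style put: set target[i] for each flat index, in place.

    `target` is a flat, row-major, mutable sequence; `values` is repeated
    cyclically when it is shorter than `indices`.
    """
    if len(values) == 0:
        raise ValueError("values must not be empty")
    for n, index in enumerate(indices):
        target[index] = values[n % len(values)]

a = list(range(6))           # row-major data of a 2x3 array [[0, 1, 2], [3, 4, 5]]
put(a, [1, 4], [10, 40])     # flat index 4 is row 1, column 1
print(a)                     # [0, 10, 2, 3, 40, 5]

b = [0, 0, 0, 0]
put(b, [0, 1, 2, 3], [10, 20])
print(b)                     # [10, 20, 10, 20]
```

Because the sketch mutates its argument directly, an immutable input such as a tuple fails loudly with a TypeError instead of being silently ignored. That is the kind of in-place, object-bound operation that arguably fits a method best.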
Test Code
'''''''''
numtest.py doesn't inspire full confidence: it's about 1000 lines of actual
code, but it doesn't seem that clearly structured and AFAICT contains not a
single loop (and that despite the diversity of shapes, types etc. that exist
in numarray -- why not try something slightly more systematic?).
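As a sketch of the more systematic style suggested above, the test below loops over a grid of shapes and element types with `itertools.product` instead of hand-writing each case. The grid includes rank-0, zero-size and rank-3 cases. To stay self-contained, it exercises a pair of hypothetical pure-Python helpers (`flatten`, `reshape`) rather than numarray itself. The same loop structure would apply to numarray functions:

```python
import itertools
import math

def flatten(nested):
    """Row-major flattening of a nested list (a scalar becomes a 1-element list)."""
    if isinstance(nested, list):
        return [x for item in nested for x in flatten(item)]
    return [nested]

def reshape(flat, shape):
    """Build a nested list with the given shape from row-major data."""
    if not shape:
        return flat[0]                       # rank 0: a bare scalar
    step = len(flat) // shape[0] if shape[0] else 0
    return [reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]

# Include rank-0, zero-size and higher-rank shapes, not just the easy cases.
SHAPES = [(), (0,), (3,), (2, 0), (0, 3), (2, 3), (2, 1, 4)]
KINDS = [int, float, complex]

def check_roundtrip():
    """Return every (shape, kind) combination whose flatten/reshape round trip fails."""
    failures = []
    for shape, kind in itertools.product(SHAPES, KINDS):
        data = [kind(i) for i in range(math.prod(shape))]
        if flatten(reshape(data, shape)) != data:
            failures.append((shape, kind.__name__))
    return failures

print(check_roundtrip())   # []
```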