[Numpy-discussion] Rank-0 arrays - reprise
Nathaniel Smith
njs@pobox....
Sun Jan 6 10:52:13 CST 2013
On Sun, Jan 6, 2013 at 10:35 AM, Dag Sverre Seljebotn
<d.s.seljebotn@astro.uio.no> wrote:
> I should have been more precise: I like the proposal, but also believe
> the additional complexity introduced have significant costs that must be
> considered.
>
> a) Making += behave differently for readonly arrays should be
> carefully considered. If I have a 10 GB read-only array, I prefer an
> error to a copy for +=. (One could use an ISSCALAR flag instead that
> only affected +=...)
Yes, definitely we would need to nail down the exact semantics here.
My feeling is that we should start by seeing if we can come up
with a set of coherent rules for read-only arrays that does what we
want before we add an ACT_LIKE_OLD_SCALARS flag, but either way is
viable. (Or we could start with a PRETEND_TO_BE_SCALAR flag and then
gradually migrate away from it.)
> b) Things seems simpler since "indexing away the last index" is no
> longer a special case, it is always true for a.ndim > 0 that "a[i]" is a
> new array such that
>
> a[i].ndim == a.ndim - 1
>
> But in exchange, a new special-case is introduced since READONLY is only
> set when ndim becomes 0, so it doesn't really help with the learning
> curve IMO.
Yes, indexing with a scalar (as opposed to slicing or fancy-indexing)
remains a special case just like now. And not just because the result
is read-only -- it also returns a copy, not a view.
I don't think the comparison to the a[i] special-case is very useful,
really. Scalar indexing and the wacky one-dimensional indexing thing
where a[i] -> a[i, ...] (unless a is one-dimensional) would still be
different in general, even aside from the READONLY part, because the
one-dimensional indexing thing only applies to one-dimensional
indexes. For a 3-d array,
a[i, j]
gives an error; it's not the same as a[i, j, ...]. And while I
understand why numpy does what it does for len() and __getitem__(int)
on multi-dimensional arrays (it's to make multi-dimensional arrays act
more like list-of-lists), this is IMO a confusing special case that we
might be better off without, and in any case shouldn't be used as a
guide for how to make the rest of the indexing system work.
> In some ways I believe the "scalar-indexing" special case is simpler for
> newcomers to understand, and is what people already assume, and that a
> "readonly-indexing" special case is more complicated. It's dangerous to
> have a library which people only use correctly by accident, so to speak,
> it's much better if what people think they see is how things are.
This is all true, but current scalars *are* read-only arrays, just
weird ones, with some limitations, that people don't realize are
there.
Heck, you can even reshape scalars:
In [10]: a = np.float64(0)
In [11]: a.reshape((1, 1))
Out[11]: array([[ 0.]])
And resizing is allowed... but silently does nothing:
In [12]: a.resize((1, 1))
In [13]: a
Out[13]: 0.0
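The same point as a small script, for anyone who wants to poke at it. This is a sketch against current numpy scalars; the names s, r and assigned are made up for the example, and the flags value is printed rather than assumed:

```python
import numpy as np

s = np.float64(0)

# Array-like attributes are present on the scalar.
print(s.ndim, s.shape)

# Reshaping works and hands back a real ndarray.
r = s.reshape((1, 1))
print(type(r).__name__, r.shape)

# The scalar reports its own flags, including writeability.
print(s.flags.writeable)

# Item assignment is rejected, so the value cannot be changed in place.
try:
    s[()] = 1.0
    assigned = True
except TypeError:
    assigned = False
print(assigned)
```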
> (With respect to arr[5] returning a good old Python scalar for floats
> and ints -- Travis' example from 2002 is division, and at least that
> example is much less serious now with the introduction of the //
> operator in Python.)
I thought Travis's example was (in current numpy terms):
In [1]: a = np.array([-1.0, 1.0])
# Pretend that np.sum() returns a float, which uses Python's arithmetic:
In [2]: 1 / float(np.sum(a))
ZeroDivisionError: float division by zero
# It actually returns a numpy scalar, which uses numpy's arithmetic:
In [3]: 1 / np.sum(a)
/home/njs/.user-python2.7-64bit/bin/ipython:1: RuntimeWarning: divide by zero encountered in double_scalars
#!/home/njs/.user-python2.7-64bit/bin/python
Out[3]: inf
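A self-contained version of the same comparison, as a sketch. It uses np.errstate to silence the warning instead of relying on the session's default settings; the names python_result and numpy_result are only for illustration:

```python
import numpy as np

a = np.array([-1.0, 1.0])
total = np.sum(a)   # a numpy scalar holding 0.0

# Python float arithmetic: division by zero raises.
python_result = None
try:
    1 / float(total)
except ZeroDivisionError as exc:
    python_result = exc

# numpy scalar arithmetic: division by zero follows the IEEE rules and
# gives inf; the warning is suppressed here with np.errstate.
with np.errstate(divide="ignore"):
    numpy_result = 1 / total

print(repr(python_result))
print(numpy_result)
```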
Anyway, you still need to return some sort of special object for
anything that's not part of Python's type system (structured arrays,
custom dtypes like enumerated values, etc.). So returning good-old
Python scalars (GOPS?) for floats/ints/bools actually introduces a new
special case.
>> One could argue about structured datatypes, but maybe then it should be
>> a datatype property whether its mutable or not, and even then the
>> element should probably be a copy (though I did not check what happens
>> here right now).
>
> Elements from arrays with structured dtypes are already mutable (*and*,
> at least until recently, could still be used as dict keys...). This was
> discussed on the list a couple of months back I think.
Yeah, this is another weird wart we could fix up in the process...
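For reference, a short sketch of the mutability in question, using a made-up two-field dtype (the field names x and y are arbitrary):

```python
import numpy as np

pts = np.zeros(2, dtype=[("x", "f8"), ("y", "i4")])

elem = pts[0]    # an np.void element, not a plain Python object
elem["x"] = 5.0  # assigning to a field of the element...

print(type(elem).__name__)
print(pts["x"][0])   # ...writes through to the parent array
```

So unlike a float or int element, this one behaves like a view rather than a copy.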
-n