FW: [Numpy-discussion] Bug: extremely misleading array behavior

eric jones eric at enthought.com
Tue Jun 11 22:28:03 CDT 2002


> "eric jones" <eric at enthought.com> writes:
> 
> 
> > I think the consistency with Python is less of an issue than it seems.
> > I wasn't aware that add.reduce(x) would generate the same results as
> > the Python version of reduce(add,x) until Perry pointed it out to me.
> > There are some inconsistencies between Python the language and Numeric
> > because of the needs of the Numeric community.  For instance, slices
> > create views instead of copies as in Python.  This was a correct break
> > with consistency in a very utilized area of Python because of
> > efficiency.
> 
> Ahh, a loaded example ;) I always thought that Numeric's view-slicing
> is a fairly problematic deviation from standard Python behavior and I'm
> not entirely sure why it needs to be done that way.
> 
> Couldn't one have both consistency *and* efficiency by implementing a
> copy-on-demand scheme (which is what matlab does, if I'm not entirely
> mistaken; a real copy only gets created if either the original or the
> 'copy' is modified)?

Well, slices creating copies is definitely a bad idea (which is what I
have heard proposed before) -- finite difference calculations (and
others) would be very slow with this approach.  Your copy-on-demand
suggestion might work, though.  Its implementation would be more
complex, but I don't think it would require cooperation from the Python
core.  It could be handled in the ufunc code.  It would also require
extension modules to make copies before they modified any values.
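To make the extra machinery concrete, here is a very rough, purely
hypothetical sketch in Python of what a copy-on-demand slice might look
like (the class and all of its details are made up for illustration; a
real version would live in C and would also have to catch writes made
through the parent array):

import Numeric

class COWSlice:
    # Shares the parent's data until the first write through the slice.
    def __init__(self, parent, index):
        self.parent = parent
        self.index = index
        self.copy = None                    # real copy made lazily

    def data(self):
        if self.copy is not None:
            return self.copy
        return self.parent[self.index]      # still a view, no copy yet

    def __getitem__(self, i):
        return self.data()[i]

    def __setitem__(self, i, value):
        if self.copy is None:
            # first write: pay for the copy now
            self.copy = Numeric.array(self.parent[self.index])
        self.copy[i] = value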

Copy-on-demand doesn't really fit with Python's "assignments are
references" approach to things, though, does it?  Using foo = bar in
Python and then changing an element of foo will also change bar.  So, I
guess there would have to be a distinction made here.  That adds a
little more complexity.
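For what it's worth, here are the three behaviors side by side (the
comments just restate what happens today):

import Numeric

a = Numeric.array([1, 2, 3, 4])
b = a            # plain assignment: b is just another name for a
b[0] = 99        # ...so a[0] is now 99 as well

lst = [1, 2, 3, 4]
s = lst[1:3]     # Python list slice: s is an independent copy
s[0] = 99        # lst is unchanged

v = a[1:3]       # Numeric slice: v is a view onto a's data
v[0] = 99        # a[1] changes too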

Personally, I like being able to pass views around because it allows for
efficient implementations.  The option to pass arrays into extension
functions and edit them in-place is very nice.  Copy-on-demand might
allow for equal efficiency -- I'm not sure.

I haven't found the current behavior very problematic in practice and
haven't seen it become a major stumbling block for new users.  I'm happy
with the status quo on this.  But, if copy-on-demand is truly efficient
and didn't make extension writing a nightmare, I wouldn't complain about
the change either.  I have a feeling the implementers of numarray would,
though. :-)  And talk about having to modify legacy code...

> The current behavior seems not just problematic because it breaks
> consistency and hence user expectations, it also breaks code that is
> written with more pythonic sequences in mind (in a potentially hard to
> track down manner) and is, IMHO, generally undesirable and error-prone,
> for pretty much the same reasons that dynamic scope and global
> variables are generally undesirable and error-prone -- one can
> unwittingly create intricate interactions between remote parts of a
> program that can be very difficult to track down.
> 
> Obviously there *are* cases where one really wants a (partial) view of
> an existing array. It would seem to me, however, that these cases are
> exceedingly rare (in all my Numeric code I'm only aware of one instance
> where I actually want the aliasing behavior, so that I can manipulate a
> large array by manipulating its views and vice versa).  Thus rather
> than being the default behavior, I'd rather see those cases
> accommodated by a special syntax that makes it explicit that an alias
> is desired and that care must be taken when modifying either the
> original or the view (e.g. one possible syntax would be
> ``aliased_vector = m.view[:,1]``).  Again I think the current behavior
> is somewhat analogous to having variables declared in global (or
> dynamic) scope by default, which is not only error-prone, it also masks
> those cases where global (or dynamic) scope *is* actually desired and
> necessary.
> 
> It might be that the problems associated with a copy-on-demand scheme
> outweigh the error-proneness and the interface breakage that the
> deviation from standard python slicing behavior causes, but otherwise
> copying on slicing would be a backwards incompatibility in numarray I'd
> rather like to see (especially since one could easily add a view
> attribute to Numeric, for forwards-compatibility). I would also suspect
> that this would make it *a lot* easier to get numarray (or parts of it)
> into the core, but this is just a guess.

I think the two things Guido wants for inclusion of numarray are a
consensus from our community on what we want, and (more importantly) a
comprehensible code base. :-)  If Numeric satisfied this 2nd condition,
it might already be slated for inclusion...  The 1st is never easy with
such varied opinions -- I've about concluded that Konrad and I are
anti-particles :-) -- but I hope it will happen.

> 
> >
> > I don't see choosing axis=-1 as a break with Python --
> > multi-dimensional arrays are inherently different and used
> > differently than lists of lists in Python.  Further, reduce() is a
> > "corner" of the Python language that has been superseded by list
> > comprehensions.  Choosing an alternative
> 
> Guido might nowadays think that adding reduce was a mistake, so in that
> sense it might be a "corner" of the python language (although some
> people, including me, still rather like using reduce), but I can't see
> how you can generally replace reduce with anything but a loop. Could
> you give an example?

You're right.  You can't do it without a loop.  List comprehensions only
supersede filter and map, since they always return a list.
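Just to spell out the distinction with a toy example:

from operator import add

x = [1, 2, 3, 4]

total = reduce(add, x)     # folds the sequence down to a single value: 10

# A list comprehension always returns a list, so it can stand in for
# map and filter but not for the fold above -- the alternative is a loop:
total = 0
for item in x:
    total = total + item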

I think reduce is here to stay.  And, like you, I would actually be
disappointed to see it go (I like lambda too...).  The point is that I
wouldn't choose the definition of sum() or product() based on the
behavior of Python's reduce operator.  Hmmm.  So I guess that is the
key -- it's really these *function* interfaces that I disagree with.

So, how about add.reduce() keeps axis=0 to match the behavior of Python,
but sum() and friends default to axis=-1 to match the rest of the
library functions?  It does break consistency across the library, so I
think it is sub-optimal.  However, the distinction is reasonably clear
and much less likely to cause confusion.  It also allows FFT and future
modules (wavelets or whatever) to operate across the fastest axis by
default while conforming to an intuitive standard.  take() and friends
would also become axis=-1 for consistency with all other functions.
Would this be a reasonable compromise?
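For concreteness, on a small 2-d array the split would look like this
(the sum() line shows the *proposed* default, not the current behavior):

from Numeric import array, add, sum

a = array([[1, 2, 3],
           [4, 5, 6]])

add.reduce(a)    # axis=0, matching Python's reduce(add, a):  [5 7 9]
sum(a)           # under the proposal, axis=-1 by default:    [ 6 15]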

eric

> 
> 
> alex
> 
> --
> Alexander Schmolck     Postgraduate Research Student
>                        Department of Computer Science
>                        University of Exeter
> A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/




