[Numpy-discussion] Behavior of .base
Charles R Harris
Mon Oct 1 09:12:55 CDT 2012
On Mon, Oct 1, 2012 at 6:20 AM, Nathaniel Smith <firstname.lastname@example.org> wrote:
> On Sun, Sep 30, 2012 at 8:59 PM, Travis Oliphant <email@example.com> wrote:
> > Hey all,
> > In a github-discussion with Gael and Nathaniel, we came up with a
> > proposal for .base that we should put before this list. Traditionally,
> > .base has always pointed to None for arrays that owned their own memory and
> > to the "most immediate" array object parent for arrays that did not own
> > their own memory. There was a long-standing issue related to running out
> > of stack space that this behavior created.
> To be *completely* accurate, I'd say that they've always pointed to
> some object that owned the underlying memory. Usually that's an
> ndarray, but sometimes that's a thing exposing the buffer interface,
> sometimes it's a thing exposing __array_interface__, sometimes it's a
> mmap object, sometimes it's some random ad hoc C-level wrapper
> object, etc.
>  e.g.
> > Recently this behavior was altered so that .base always points to "the
> > original" object holding the memory (something exposing the buffer
> > interface). This created some problems for users who relied on the fact
> > that most of the time .base pointed to an instance of an array object.
> > The proposal here is to change the behavior of .base for arrays that
> > don't own their own memory so that the .base attribute of an array points
> > to "the most original object" that is still an instance of the type of the
> > array. This would go into the 1.7.0 release so as to correct the
> > issues reported.
> > What are reactions to this proposal?
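For plain ndarrays, the effect of the proposed rule can be sketched as follows. This is a minimal illustration, assuming NumPy 1.7 or later, where a view of a view reports the original owning array as its .base. The variable names are illustrative only, and the sketch says nothing about subclasses or non-ndarray owners.

```python
import numpy as np

owner = np.arange(10)       # owns its memory
view = owner[2:]            # view of owner
view_of_view = view[::2]    # view of a view

# An array that owns its data has no base.
assert owner.base is None

# A direct view points at the array it was sliced from.
assert view.base is owner

# Assumption: NumPy >= 1.7, where the chain is collapsed so that the
# base is the original ndarray rather than the intermediate view.
assert view_of_view.base is owner
```

Under the older "most immediate parent" behavior, the last check would instead have found the intermediate view.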
> As a band-aid to avoid breaking some code in 1.7, it seems reasonable
> to me. I was actually considering proposing basically the same idea.
> But it's only a band-aid; the larger problem is that we don't *know*
> what semantics people are relying on for "base" (and probably aren't
> implementing the ones people think we are, either before or after this change).
> As an example of how messy this is: do you know whether Gael's code
> will still work, after we make this fix, if someone uses as_strided()
> on a (view of a) memmap array?
> Answer: as_strided() creates an ndarray view on an ad-hoc object with
> __array_interface__ attribute, and this dummy object ends up as the
> returned ndarray's .base. According to the proposed rule, the .base
> chain collapsing will stop at this point. So it isn't true that an
> array that is ultimately backed by mmap will have a .memmap() array as
> its .base. However, if you read stride_tricks.py, it turns out the
> dummy object as_strided makes does happen to use the name ".base" for
> its attribute holding the original array, so Gael's code will work
> correctly in this case iff he keeps the .base walking code in place
> (which would otherwise serve no purpose after Travis' change).
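The kind of .base walking described above could look roughly like the sketch below. The helper name walk_base is hypothetical and not part of NumPy. It simply follows .base attributes until one is missing or None, which is why it also passes through the dummy object that as_strided creates.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def walk_base(obj):
    """Return [obj, obj.base, obj.base.base, ...] until .base is missing or None.

    Hypothetical helper for illustration; it relies only on attribute lookup,
    so it also follows non-ndarray objects that happen to expose ".base".
    """
    chain = [obj]
    while True:
        parent = getattr(chain[-1], "base", None)
        if parent is None:
            return chain
        chain.append(parent)


a = np.arange(6)
# Strided view over the first three elements, using a's own strides.
s = as_strided(a, shape=(3,), strides=a.strides)

chain = walk_base(s)
# The original array is reachable somewhere along the chain.
found_original = any(obj is a for obj in chain)
```

Code that depends on such a walk is relying on exactly the implementation details this thread warns about, so the objects found between s and a should not be treated as stable.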
> Anyway, my point is: If we have to carefully analyze interactions
> between code in numpy.lib.stride_tricks, numpy.core.memmap, and a
> third-party library, just to figure out which sorts of
> reference-counting changes are correct in the core ndarray object,
> then we have a problem. This is horrible cross-coupling, the sort of
> thing that, if allowed to proliferate, makes it impossible to ever
> know whether code is correct or not.
> So even if we put in a band-aid for 1.7, we really don't want to be
> guaranteeing this kind of stuff forever, and should aggressively
> encourage people to stop using .base in these ways. The mmap thing
> should really switch to something more reliable and less tightly
> coupled to the rest of the code all over numpy, like I described here:
> How can we discourage people from doing this in the future? Can we
> make .base write-only from the Python level (with suitable deprecation
> period)? Rename it to ._base (likewise) so that it's still possible to
> peek under the covers but we remind people that it's really an
> implementation detail with poorly defined semantics that might change?
Well said. This reminds me of the fellow who used genetic programming to
design an algorithm for a signal processing chip and discovered that the
result was making use of some stray capacitance present on the chip. Here
users such as Gael are the genetic programmers and .base is the stray
capacitance. I lean toward the ._base idea, but I think this needs to be
addressed in detail.