[Numpy-discussion] immutable numpy arrays

Geoffrey Irving irving@naml...
Fri Dec 19 13:50:15 CST 2008


On Thu, Dec 18, 2008 at 1:00 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Thu, Dec 18, 2008 at 10:01, Geoffrey Irving <irving@naml.us> wrote:
>> On Wed, Dec 17, 2008 at 4:28 PM, Robert Kern <robert.kern@gmail.com> wrote:
>
>>> It just seems to me to be another complication that does not provide
>>> any guarantees. You say "Currently numpy arrays are either writable or
>>> unwritable, but unwritable arrays can still be changed through other
>>> copies." Adding an immutable flag would just change that to "Currently
>>> numpy arrays are either mutable or immutable, but immutable arrays can
>>> still be changed through other copies." Basically, the writable flag
>>> is intended to indicate your use case. It can be circumvented, but the
>>> same methods of circumvention can be applied to any set of flags.
>>
>> The point of an immutable array would be that it _can't_ be changed
>> through other copies except through broken C code (or the ctypes /
>> __array_interface__ equivalents), so it's not correct to say that it's
>> the same as unwriteable.  It's the same distinction as C++ const vs.
>> Java final.  Immutability is already a common notion in python, e.g.,
>> list vs. tuple and set vs. frozenset, and it's unfortunate that numpy
>> doesn't have an equivalent.
>>
>> However, if you agree that even _with_ the guarantee it's not a useful
>> concept, I'm happy to drop it.
>
> What I'm trying to suggest is that most code already treats the
> writeable flag like I think you want the immutable flag to be treated.
> I'm not sure what you think is missing.

After further consideration, I'll withdraw the immutability flag
request.  I think most of what I'm looking for can be implemented with
inheritance, though not in a completely satisfactory manner.  Here are
the details:

My main use case is interacting with a system that deals with
immutable arrays without having to introduce unnecessary copying.  The
system makes heavy use of dependency analysis internally to cache/save
computation, and may segfault if an array it thinks is immutable
changes (e.g. if the array describes the topology of a mesh).  It
should be impossible for normal python scripting to cause such a
segfault.

Say I have a function "get_array" which returns an array from this
system which is guaranteed immutable, and a function "set_array" which
stores an array.  It is safe to skip the copy if I do something like

    set_array(get_array())

However, set_array can't distinguish this from

    a = get_array().copy()
    b = a[:]
    a.flags.writeable = 0
    set_array(a)
    b[0] = 3

The difference between writeable and immutable is that an immutable
flag could only be set at creation: freezing an array afterwards, as
above, would be invalid since the array may have already leaked.
This is rather convoluted code, though, and it's the only example I
can come up with that would be fixed with just an immutability flag.
Therefore, the immutability flag is a bad idea.
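
As a minimal sketch of the leak described above (using a plain
np.arange array as a stand-in for get_array().copy(), which is an
assumption made only for illustration), clearing the writeable flag on
"a" does not affect a view that was taken earlier:

```python
import numpy as np

a = np.arange(5)            # stand-in for get_array().copy()
b = a[:]                    # view sharing a's buffer, taken while a is writeable
a.flags.writeable = False   # lock a; b keeps its own writeable flag

try:
    a[0] = 3                # writing through a is rejected
    write_through_a = True
except ValueError:
    write_through_a = False

b[0] = 3                    # writing through b succeeds and changes what a sees
```

After this runs, a[0] is 3 even though "a" itself is flagged as
unwriteable.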

A more interesting and likely example is

    set_array(2 * get_array())

In this case, set_array() will receive an unwriteable array with
reference count 1 (it owns the only reference).  However, that is
indistinguishable from

    a = 2 * get_array()
    set_array(a[:])
    a[0] = 3
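
To make the consequence concrete, here is a rough sketch with
hypothetical stand-ins for the system (the Store class, its "copy"
option, and this get_array are illustrative assumptions, not the real
interface).  Without a copy, the later write shows up in the stored
array; with a copy it does not:

```python
import numpy as np

def get_array():
    # Hypothetical stand-in: hands out a read-only array.
    src = np.arange(4)
    src.flags.writeable = False
    return src

class Store:
    """Hypothetical stand-in for the system that caches arrays."""

    def __init__(self, copy):
        self.copy = copy
        self.value = None

    def set_array(self, arr):
        # Either take a private copy or keep the caller's buffer as-is.
        self.value = arr.copy() if self.copy else arr

def run(copy):
    store = Store(copy)
    a = 2 * get_array()      # result of the arithmetic
    store.set_array(a[:])    # hand over a view of it
    a[0] = 3                 # later write through the original name
    return int(store.value[0])

aliased = run(copy=False)    # stored data changed behind the store's back
copied = run(copy=True)      # stored data unaffected
```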

One way to solve this is to make a derived array class which is always
immutable and propagates immutability and unwritability during
arithmetic.  This would safely avoid the overhead in all examples
above, and is straightforward to implement.  Unfortunately, it adds
unnecessary copying in legitimate code that wants to modify results:

    a = 2 * get_array()
    a[0] = 2 # exception!
    set_array(a)
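
A derived class along these lines could be sketched roughly as
follows.  The name FrozenArray is made up for illustration, and the
sketch relies on the __array_ufunc__ hook available in current numpy;
it only handles ufuncs that return a single array, and it does not
stop someone from switching the writeable flag back on, so treat it as
an outline rather than a safe implementation:

```python
import numpy as np

class FrozenArray(np.ndarray):
    """Hypothetical ndarray subclass whose instances are read-only."""

    def __new__(cls, data):
        # Copy so that no writeable alias of the buffer exists elsewhere,
        # then lock the buffer before exposing it as a FrozenArray view.
        base = np.array(data, copy=True)
        base.flags.writeable = False
        return base.view(cls)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if "out" in kwargs:
            # Keep the sketch simple: no in-place / out= support.
            return NotImplemented
        # Run the ufunc on plain ndarray views of the inputs.
        plain = [x.view(np.ndarray) if isinstance(x, FrozenArray) else x
                 for x in inputs]
        result = getattr(ufunc, method)(*plain, **kwargs)
        if isinstance(result, np.ndarray):
            # Propagate immutability to the arithmetic result.
            result.flags.writeable = False
            result = result.view(FrozenArray)
        return result

a = FrozenArray([1, 2, 3])
b = 2 * a                    # still a FrozenArray, still read-only

try:
    b[0] = 2                 # the "exception!" case from the snippet above
    assigned = True
except ValueError:
    assigned = False
```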

Getting rid of all unnecessary copies in that code would require tracking
leaks and allowing set_array to either freeze "a" or change it to
copy-on-write.  That might end up too complicated or magical to be
practical, though.  In particular, it couldn't be implemented in a
completely safe manner using inheritance.

In any case, I think the benefit would be tiny enough that I should
drop it and stick to copies unless someone else expresses interest.

Thanks,
Geoffrey