[Numpy-discussion] Unexpected behavior with numpy array
Sun Feb 3 23:47:28 CST 2008
Damian Eads wrote:
> Robert Kern wrote:
>> Damian Eads wrote:
>>> Here's another question: is there any way to construct a numpy array and
>>> specify the buffer address where it should store its values? I ask
>>> because I would like to construct numpy arrays that work on buffers that
>>> come from mmap.
>> Can you clarify that a little? By "buffer" do you mean a Python buffer() object?
> Yes, I mean the .data field of a numpy array, which is a buffer object,
> and points to the memory where an array's values are stored.
Actually, the .data field is always constructed by ndarray; it is never provided
*to* ndarray even if you construct the ndarray from a buffer object. The buffer
object's information is interpreted to construct the ndarray object and then the
original buffer object is ignored. The .data attribute will be constructed
"on-the-fly" when it is requested.
In : from numpy import *
In : s = 'aaaa'
In : b = buffer(s)
In : a = frombuffer(b, dtype=int32)
In : a.data is b
Out: False
In : d1 = a.data
In : d2 = a.data
In : d1 is d2
Out: False
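For readers on Python 3, where buffer() no longer exists, here is a rough equivalent of the session above. It is only a sketch: it assumes a current numpy, where .data returns a memoryview, and the variable names are mine.

```python
import numpy as np

raw = b"aaaa"                       # 4 bytes, enough for one int32
a = np.frombuffer(raw, dtype=np.int32)

# Each access to .data builds a fresh memoryview over the same memory.
d1 = a.data
d2 = a.data
same_object = d1 is d2
is_original = a.data is raw

# The two views are distinct objects but expose identical bytes.
same_bytes = d1.tobytes() == d2.tobytes() == raw
```

The point is the same as before: the object passed in is only used to locate the memory, and .data is rebuilt on request.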
>> By "mmap" do you mean Python's mmap in the standard library?
> I actually was referring to the C Standard Library's mmap. My intention
> was to use a pointer returned by C-mmap as the ".data" buffer to store
> array values.
>> numpy has a memmap class which subclasses ndarray to wrap a mmapped file. It
>> handles the opening and mmapping of the file itself, but it could be subclassed
>> to override this behavior to take an already opened mmap object.
> This may satisfy my needs. I'm going to look into it and get back to you.
>> In general, if you have a buffer() object, you can make an array from it using
>> numpy.frombuffer(). This will be a standard ndarray and won't have the
>> conveniences of syncing to disk that the memmap class provides.
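As a small illustration of the frombuffer() route, the sketch below wraps a standard-library mmap object. It assumes Python 3 and an anonymous mapping (fileno -1) purely to stay self-contained; a file-backed mapping should behave the same way, but that is not shown here.

```python
import mmap
import numpy as np

# Anonymous mapping large enough for 4 float64 values (32 bytes).
mm = mmap.mmap(-1, 4 * 8)

# frombuffer() does not copy: the array reads and writes the mapped pages.
arr = np.frombuffer(mm, dtype=np.float64)
arr[:] = [1.0, 2.0, 3.0, 4.0]

owns_data = bool(arr.flags.owndata)

# Read the first 8 bytes back through the mmap object itself.
first_value = float(np.frombuffer(mm[:8], dtype=np.float64)[0])

# The mapping cannot be closed while the array still exports its buffer,
# so drop the array first.
del arr
mm.close()
```

Note that nothing here flushes or syncs to disk for you; that convenience is what the memmap class adds.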
> This is good to know because there have been a few situations when this
> would have been very useful.
> Suppose I do something like (in Python):
> import ctypes
> mylib = ctypes.CDLL('libmylib.so')
> y = mylib.get_float_array_from_c_function()
> which returns a float* as a Python int, and then I do
> nelems = mylib.get_float_array_num_elems()
> x = numpy.frombuffer(ctypes.c_buffer(y), 'float', nelems)
> This gives me an ndarray x with its (.data) buffer pointing to the
> memory address given by y. When the ndarray x is no longer referenced
> (even as another array's base), does numpy attempt to free the memory
> pointed to by y? In other words, does numpy always deallocate the
> (.data) buffer in the __del__ method? Or, does frombuffer set a flag
> telling it not to?
By default, frombuffer() creates an array that is flagged as not owning the
data. That means it will not delete the data memory when the ndarray object is
destroyed.
In : import ctypes
In : ca = (ctypes.c_int*8)()
In : a = frombuffer(ca, int)
In : a
Out: array([0, 0, 0, 0, 0, 0, 0, 0])
In : a.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
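A short sketch of what "not owning the data" means in practice, assuming Python 3 and a fixed-width int32 so the ctypes and numpy element sizes agree on every platform:

```python
import ctypes
import numpy as np

# A ctypes array obeys the buffer protocol, so frombuffer() can wrap it.
ca = (ctypes.c_int32 * 8)()
a = np.frombuffer(ca, dtype=np.int32)

# Both names refer to the same memory: writes through one show up in the other.
ca[0] = 7
a[1] = 9

# The ndarray borrowed the memory; the ctypes array remains the owner.
owns_data = bool(a.flags.owndata)
```

Because the ndarray only borrows the memory, whoever created the ctypes array stays responsible for keeping it alive.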
>> If you don't have a buffer() object, but just have a pointer allocated from some
>> C code, then you *could* fake an object which exposes the __array_interface__
>> attribute to describe the memory. The numpy.asarray() constructor will use that to
>> make an ndarray object that uses the specified memory. This is advanced stuff
>> and difficult to get right because of memory ownership and object lifetime
>> issues.
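To make the __array_interface__ idea concrete, here is a minimal sketch. PointerWrapper is a hypothetical class of my own, not something numpy provides, and a ctypes array stands in for memory that would really come from C code. The keepalive argument is one assumed way to tie the allocation's lifetime to the wrapper.

```python
import ctypes
import numpy as np


class PointerWrapper:
    """Hypothetical helper describing raw float64 memory to numpy."""

    def __init__(self, address, count, keepalive):
        self._address = address
        self._count = count
        # Hold a reference to whatever owns the memory so it outlives us.
        self._keepalive = keepalive

    @property
    def __array_interface__(self):
        return {
            "version": 3,
            "shape": (self._count,),
            "typestr": np.dtype(np.float64).str,  # e.g. '<f8'
            "data": (self._address, False),       # (pointer, read-only flag)
        }


# Stand-in for a C allocation: 4 doubles owned by a ctypes array.
storage = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
wrapper = PointerWrapper(ctypes.addressof(storage), 4, storage)

# asarray() builds an ndarray directly on top of the described memory.
x = np.asarray(wrapper)
x[0] = 10.0
```

With a real C pointer there is no ctypes array to keep alive, so the wrapper (or something it references) has to arrange for the C-side free at the right moment; that is the hard part.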
> Allocating memory in C code would be very useful for me. If I were to
> use such a numpy.asarray() function (seems the frombuffer you mentioned
> would also work as described above),
Yes, if you can create the buffer object or something that obeys the buffer
protocol. ctypes arrays work fine; ctypes pointers don't.
> it makes sense for the C code to be
> responsible for deallocating the memory, not numpy. I understand that I
> would need to ensure that the deallocation happens only when the
> containing ndarray is no longer referenced anywhere in Python
> (hopefully, ndarray's finalization code does not need access to the
> .data buffer).
My experience has been that this is fairly difficult to do. If you have
*complete* control of the ndarray object over its entire lifetime, then this is
reasonable. If you don't, then you are going to run into (nondeterministic!)
segfaulting bugs eventually. For example, if you are only using it as a
temporary inside a function and never return it, this is fine. You will also
need to be very careful about constructing views from the ndarray; these will
need to be controlled, too. You will have a bug if you delete myarray but return
reversed_array=myarray[::-1], for example.
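The sketch below shows why views matter, using memory that numpy itself owns so that it is safe to run. The comments describe what would go wrong if the memory belonged to C code instead; that failure is deliberately not reproduced.

```python
import numpy as np

myarray = np.arange(5)
reversed_array = myarray[::-1]      # a view, not a copy

shares = bool(np.shares_memory(myarray, reversed_array))
base_is_original = reversed_array.base is myarray

# Deleting the name is harmless here: the view holds a reference to its
# base, so numpy keeps the memory alive.
del myarray
still_readable = reversed_array.tolist()

# If the memory had been allocated and later freed by C code, numpy would
# have no way to know, and reversed_array would point at freed memory.
```

So every view derived from an externally owned array has to be tracked just as carefully as the array itself.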
I see that you are using ctypes. Be sure to take a look at the .ctypes attribute
on ndarrays. This allows you to get a ctypes pointer object from an array. This
might help you use numpy to allocate the memory and pass that in to your C
functions.
In : a.ctypes.data_as(ctypes.POINTER(ctypes.c_int))
Out: <ctypes.LP_c_long object at 0x1c7c800>
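A sketch of that direction, with numpy owning the memory and C code only filling it. ctypes.memmove stands in for a real C function here, since the library from earlier in the thread is not available; any function taking an int32_t* and a length would be called the same way.

```python
import ctypes
import numpy as np

# numpy allocates and owns the memory.
a = np.zeros(4, dtype=np.int32)

# Typed pointer to the array's first element, suitable for a C function.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_int32))

# Stand-in for "a C function that fills the buffer".
src = (ctypes.c_int32 * 4)(1, 2, 3, 4)
ctypes.memmove(ptr, src, a.nbytes)

filled = a.tolist()
```

This sidesteps the ownership problem: the memory is freed when the last ndarray referring to it goes away, and the C side never has to free anything. The pointer is only valid while the array is alive.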
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco