[Numpy-discussion] NumPy re-factoring project
Charles R Harris
Sat Jun 12 14:41:37 CDT 2010
On Sat, Jun 12, 2010 at 1:35 PM, Charles R Harris <firstname.lastname@example.org> wrote:
> On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn <
> email@example.com> wrote:
>> Christopher Barker wrote:
>> > David Cournapeau wrote:
>> >>> In the core C numpy library there would be new "numpy_array" struct
>> >>> with attributes
>> >>> numpy_array->buffer
>> >> Anything non trivial will require memory allocation and object
>> >> ownership conventions.
>> > I totally agree -- I've been thinking for a while about a core array
>> > data structure that you could use from C/C++, and would interact well
>> > with numpy and Python -- it would be even better if it WAS numpy.
>> > I was thinking that at the root of it would be a "data_block" object
>> > (the buffer in the above), that would have a reference counting system.
>> > It would be its own system, but hopefully be able to link to Python's
>> > easily when used with Python.
>> I think taking PEP 3118, stripping out the Python-specific things, and then
>> adding memory management conventions would be a good starting point.
>> Just a simple header file/struct definition and specification, which
>> could in time become a de facto way of exporting multidimensional array
>> data between C libraries, between Fortran and C and so on (Kurt Smith's
>> fwrap could easily be adapted to support it). The C-NumPy would then be a
>> library on top of this spec (mainly ufuncs operating on such structs).
>> The memory management conventions need some thought, as you say, because
>> of slices -- but a central memory allocator is not good enough because one
>> would often be accessing memory that's allocated with other purposes in
>> mind (and should not be deallocated, or deallocated in a special way). So
>> refcounts + deallocator callback seems reasonable.
>> (Not that I'm involved in this, just my 2 cents.)
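
To make the "refcounts + deallocator callback" idea concrete, here is a minimal sketch in C. Every name in it (data_block, block_new, block_wrap, block_incref, block_decref, block_note_release) is made up for illustration and is not an existing NumPy or PEP 3118 API. It assumes single-threaded use and a plain malloc/free default, and only shows how a block can either own heap memory or wrap memory that must be released in some other way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; none of this is an existing NumPy API. */
typedef struct data_block data_block;
typedef void (*block_dealloc)(data_block *blk);

struct data_block {
    void *data;            /* start of the memory region */
    size_t nbytes;         /* size of the region in bytes */
    size_t refcount;       /* number of current owners */
    block_dealloc dealloc; /* releases data; NULL means "not ours to free" */
    void *user;            /* opaque context passed along for dealloc */
};

/* Default deallocator for memory obtained from malloc/calloc. */
void block_free_heap(data_block *blk) { free(blk->data); }

/* Demo deallocator: stands in for munmap() or a library-specific release.
   It only counts calls through the int that blk->user points to. */
void block_note_release(data_block *blk) { ++*(int *)blk->user; }

/* Wrap memory allocated elsewhere; dealloc decides how it is released. */
data_block *block_wrap(void *data, size_t nbytes,
                       block_dealloc dealloc, void *user) {
    data_block *blk = malloc(sizeof *blk);
    if (blk == NULL) return NULL;
    blk->data = data;
    blk->nbytes = nbytes;
    blk->refcount = 1;
    blk->dealloc = dealloc;
    blk->user = user;
    return blk;
}

/* Allocate a block that owns nbytes of zeroed heap memory. */
data_block *block_new(size_t nbytes) {
    void *data = calloc(1, nbytes ? nbytes : 1);
    if (data == NULL) return NULL;
    data_block *blk = block_wrap(data, nbytes, block_free_heap, NULL);
    if (blk == NULL) free(data);
    return blk;
}

/* Take another reference, e.g. when a slice starts sharing the block. */
data_block *block_incref(data_block *blk) {
    blk->refcount++;
    return blk;
}

/* Drop a reference; returns 1 if this call destroyed the block. */
int block_decref(data_block *blk) {
    assert(blk->refcount > 0);
    if (--blk->refcount > 0) return 0;
    if (blk->dealloc != NULL) blk->dealloc(blk);
    free(blk);
    return 1;
}
```

A slice would call block_incref on its parent's block and block_decref when it goes away, so the memory outlives whichever array happened to create it.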
> This is more the way I see things, except I would divide the bottom layer
> into two parts, views and memory. The memory can come from many places --
> memmaps, user supplied buffers, etc. -- but we should provide a simple
> reference counted allocator for the default. The views correspond more to
> PEP 3118 and simply provide data types, dimensions, and strides, much as
> arrays do now. However, I would confine the data types to those available in
> C with a bit of extra information as to precision, because. Object arrays
> would be a special case of pointer arrays (void pointer arrays?) and
> structured arrays/Unicode might be a special case of char arrays. The more
> complicated dtypes would then be built on top of those. Some things just
> won't be portable, pointers in particular, but such is life.
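
As a rough illustration of the view layer described above, the following C sketch stores only a data type, dimensions, and strides on top of memory that is owned elsewhere. The names (array_view, view_dtype, view_kind, view_2d, view_at, view_slice) and the VIEW_MAXDIMS limit are assumptions made for this example, not anything from NumPy or PEP 3118. The data type is reduced to a C type family plus an item size, which is one possible reading of "C types with a bit of extra information as to precision".

```c
#include <assert.h>
#include <stddef.h>

#define VIEW_MAXDIMS 8 /* arbitrary limit chosen for this sketch */

/* Hypothetical type description: a C type family plus its size in bytes. */
typedef enum { VT_INT, VT_UINT, VT_FLOAT, VT_POINTER, VT_CHAR } view_kind;

typedef struct {
    view_kind kind;  /* which family of C type the elements belong to */
    size_t itemsize; /* precision: bytes per element */
} view_dtype;

typedef struct {
    char *data;                      /* first element; memory is owned elsewhere */
    view_dtype dtype;
    int ndim;
    size_t shape[VIEW_MAXDIMS];      /* extent along each axis */
    ptrdiff_t strides[VIEW_MAXDIMS]; /* step in bytes along each axis */
} array_view;

/* Build a C-contiguous 2-D view over existing memory. */
array_view view_2d(void *data, view_dtype dt, size_t rows, size_t cols) {
    array_view v = {0};
    v.data = data;
    v.dtype = dt;
    v.ndim = 2;
    v.shape[0] = rows;
    v.shape[1] = cols;
    v.strides[0] = (ptrdiff_t)(cols * dt.itemsize);
    v.strides[1] = (ptrdiff_t)dt.itemsize;
    return v;
}

/* Address of the element at the given index (one entry per axis). */
void *view_at(const array_view *v, const size_t *index) {
    char *p = v->data;
    for (int d = 0; d < v->ndim; d++) {
        assert(index[d] < v->shape[d]);
        p += (ptrdiff_t)index[d] * v->strides[d];
    }
    return p;
}

/* New view taking `count` elements along `axis`, from `start`, every `step`.
   No data is copied; only the pointer, shape, and stride change. */
array_view view_slice(const array_view *v, int axis,
                      size_t start, size_t step, size_t count) {
    assert(axis >= 0 && axis < v->ndim && step > 0);
    assert(count == 0 || start + (count - 1) * step < v->shape[axis]);
    array_view s = *v;
    s.data += (ptrdiff_t)start * v->strides[axis];
    s.shape[axis] = count;
    s.strides[axis] = v->strides[axis] * (ptrdiff_t)step;
    return s;
}
```

Because a slice is just another view on the same bytes, writing through it changes the original array, which is why the memory layer underneath needs its own ownership rules.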
> As to languages, I think we should stay with C. C++ has much to offer for
> this sort of thing but would be quite a big jump and maybe not as universal
> as C. FORTRAN is harder to come by than C and older versions didn't have
> such things as unsigned integers. I really haven't used FORTRAN since the 77
> version, so haven't much idea what the modern version looks like, but I do
> suspect we have more C programmers than FORTRAN programmers, and adding a
> language translation on top of a design refactoring is just going to
> complicate things.
Oh, and we should have iterators for the views. So the base would be memory
+ views + iterators.
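
For what an iterator over a view might look like, here is a small C sketch that walks a strided view in C order (last axis fastest) using an odometer-style coordinate counter. The names (strided_view, view_iter, iter_init, iter_next) and the IT_MAXDIMS limit are invented for this example; it is one simple design under those assumptions, not a description of NumPy's existing iterator.

```c
#include <assert.h>
#include <stddef.h>

#define IT_MAXDIMS 8 /* arbitrary limit chosen for this sketch */

/* Hypothetical minimal view: just a pointer, a shape, and byte strides. */
typedef struct {
    char *data;
    int ndim;
    size_t shape[IT_MAXDIMS];
    ptrdiff_t strides[IT_MAXDIMS];
} strided_view;

typedef struct {
    const strided_view *view;
    size_t coord[IT_MAXDIMS]; /* current index along each axis */
    char *ptr;                /* address of the current element */
    size_t remaining;         /* elements not yet returned */
} view_iter;

/* Position the iterator at the first element of the view. */
void iter_init(view_iter *it, const strided_view *v) {
    assert(v->ndim >= 0 && v->ndim <= IT_MAXDIMS);
    it->view = v;
    it->ptr = v->data;
    it->remaining = 1;
    for (int d = 0; d < v->ndim; d++) {
        it->coord[d] = 0;
        it->remaining *= v->shape[d];
    }
}

/* Return the address of the next element, or NULL when exhausted. */
void *iter_next(view_iter *it) {
    if (it->remaining == 0) return NULL;
    void *cur = it->ptr;
    it->remaining--;
    /* Advance like an odometer, starting from the last axis. */
    for (int d = it->view->ndim - 1; d >= 0; d--) {
        if (++it->coord[d] < it->view->shape[d]) {
            it->ptr += it->view->strides[d];
            break;
        }
        /* This axis wrapped: rewind it and carry into the next outer axis. */
        it->ptr -= (ptrdiff_t)(it->view->shape[d] - 1) * it->view->strides[d];
        it->coord[d] = 0;
    }
    return cur;
}
```

The same loop works for contiguous arrays, slices, and transposes, since all of them differ only in shape and strides.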