[Numpy-discussion] Latest Array-Interface PEP
oliphant at ee.byu.edu
Thu Jan 11 23:36:09 CST 2007
Torgil Svensson wrote:
> On 1/11/07, Charles R Harris <charlesr.harris at gmail.com> wrote:
>> On 1/11/07, Torgil Svensson <torgil.svensson at gmail.com> wrote:
>>> Sure. I'm not objecting to the memory model; what I mean is that data
>>> access between modules has a wider scope than just a memory model.
>>> Maybe I'm completely out of scope here, but I thought this was worth
>>> considering for the inter-module data-sharing scope.
>> This is where separating the memory block from the API starts to show
>> advantages. OTOH, we should try to keep this all as simple and basic as
>> possible. Trying to design for every potential use will lead to over-design;
>> it is a fine line to walk.
> I agree. I'm trying to look after a use case of my own here where I
> have a huge array (won't fit in memory) with data that is very easy to
> compress (the compressed form easily fits in memory). OTOH, I have as yet
> no need to share this between modules, but a simple data access API opens
> up a variety of options.
I think this is a good idea generally. I think the PIL would be much
more open to this kind of API because the memory model of the PIL is
different from ours. On the other hand, I think it would be a shame
not to provide a basic N-d array memory model like NumPy has, because it is
used so often.
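
A minimal sketch of the compressible-array use case behind a simple
data-access API. Everything here is a hypothetical illustration and not
part of the PEP or of NumPy: the RunLengthArray class, its get_item
method, and the choice of run-length encoding are all assumptions made
for the example. The point is only that a consumer can ask for a shape
and for single items without the data ever being expanded in memory.

```python
from bisect import bisect_right
from itertools import groupby


class RunLengthArray:
    """Hypothetical 1-d array stored as runs of repeated values."""

    def __init__(self, values):
        self.run_values = []   # one entry per run
        self.run_starts = []   # logical index at which each run begins
        self.length = 0
        # `values` may be any iterable, so the uncompressed data
        # never has to exist as a whole in memory.
        for value, group in groupby(values):
            self.run_values.append(value)
            self.run_starts.append(self.length)
            self.length += sum(1 for _ in group)

    @property
    def shape(self):
        # Logical-level information only; no strides, no memory layout.
        return (self.length,)

    def get_item(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Find the last run that starts at or before `index`.
        run = bisect_right(self.run_starts, index) - 1
        return self.run_values[run]


data = RunLengthArray([0] * 1000 + [7] * 500 + [0] * 2000)
print(data.shape, len(data.run_values), data.get_item(1200))
```

Here 3500 logical elements are held as three runs, and a lookup costs
one binary search over the run starts.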
> In my mindset, I can slice and dice my huge array and the
> implementation behind the data access API will choose between having
> the views represented internally as intervals or lists of indexes.
> So I'm +1 for having all information concerning nd-array access on a
> logical level (shapes) in one API and letting the memory-layout details
> (strides, FORTRAN, C etc) live in another API. A module that wants
> to try to skip the API overhead (numpy) can always do something like:
I had originally thought to separate these out into multiple calls
anyway. Perhaps we could propose the same thing. Have a full struct
interface as one option and a multiple-call interface like you propose
as the other.
> if (memory_interface)
>     ... use memory_interface->strides ... etc
> else
>     ... use array_interface->get_item_from_index() ... etc
> I'm guessing that most of the modules trying to access an array will
> choose to go through numpy for fast operations.
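
The dispatch in the quoted pseudocode could be sketched roughly as
follows. This is an illustration under stated assumptions, not the
proposed interface: MemoryInterface, DenseArray, ConstantArray and
total are hypothetical names, get_item_from_index is borrowed from the
pseudocode above, and the fast path assumes a contiguous C-order block
of native doubles.

```python
import struct
from collections import namedtuple
from itertools import product

ITEMSIZE = struct.calcsize("d")

# Hypothetical description of a raw memory block: bytes plus strides.
MemoryInterface = namedtuple("MemoryInterface", "buffer strides")


class DenseArray:
    """Offers both the logical API and a memory description."""

    def __init__(self, shape, values):
        rows, cols = shape
        self.shape = shape
        buffer = struct.pack("%dd" % (rows * cols), *values)
        self.memory_interface = MemoryInterface(
            buffer, (cols * ITEMSIZE, ITEMSIZE))

    def get_item_from_index(self, index):
        i, j = index
        s0, s1 = self.memory_interface.strides
        offset = i * s0 + j * s1
        return struct.unpack_from("d", self.memory_interface.buffer, offset)[0]


class ConstantArray:
    """Has no memory block at all; only the logical API is available."""

    memory_interface = None

    def __init__(self, shape, value):
        self.shape = shape
        self.value = value

    def get_item_from_index(self, index):
        return self.value


def total(array):
    mem = array.memory_interface
    if mem is not None:
        # Fast path: decode the whole block in one call.
        count = len(mem.buffer) // ITEMSIZE
        return sum(struct.unpack("%dd" % count, mem.buffer))
    # Fallback: one API call per element.
    indices = product(*(range(n) for n in array.shape))
    return sum(array.get_item_from_index(index) for index in indices)


print(total(DenseArray((2, 3), [1, 2, 3, 4, 5, 6])))
print(total(ConstantArray((2, 3), 2.5)))
```

The consumer is written once; only the presence or absence of the
memory description decides which path runs.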
> Another use of an api is to do things like give an "RGB"-view of an
> image regardless of which weird image format lying below without
> having to convert the whole image in-memory and lose precision or
This is true. So at what level do we propose the API? Single-item
access for sure, but what about
Such a thing would be very useful for all kinds of large data-sets, from
images and videos to scientific data-sets.
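
As a rough sketch of the "RGB"-view idea, assuming a palette-indexed
image as the underlying format: PaletteImage, RGBView and get_item are
hypothetical names invented for this example, and real image formats
would need more care. The view converts one pixel at a time on request,
so the whole image is never converted in memory.

```python
class PaletteImage:
    """Hypothetical storage format: one palette index per pixel."""

    def __init__(self, width, height, indices, palette):
        self.width = width
        self.height = height
        self.indices = bytes(indices)   # row-major palette indices
        self.palette = palette          # list of (r, g, b) tuples


class RGBView:
    """Presents the image as a (height, width, 3) array of channels."""

    def __init__(self, image):
        self.image = image
        self.shape = (image.height, image.width, 3)

    def get_item(self, index):
        row, col, channel = index
        # Look up the stored palette index, then convert only this pixel.
        palette_index = self.image.indices[row * self.image.width + col]
        return self.image.palette[palette_index][channel]


image = PaletteImage(2, 2, [0, 1, 1, 0], [(255, 0, 0), (0, 0, 255)])
view = RGBView(image)
print(view.shape, view.get_item((0, 1, 2)))
```

A consumer that only knows about shapes and single-item access can use
the view without knowing that a palette exists underneath.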
> If we want the whole in-memory-RGB-copy we could just take the
> RGB-view, pass it to numpy and force numpy to do a copy. The module
> can then, in either case, operate on the image through numpy or return
> a numpy object to the user. (numpy is of course integrated in python
> by then)
Getting this array_interface into Python goes a long way toward making
that happen, I think.