[Numpy-discussion] Latest Array-Interface PEP

Travis Oliphant oliphant at ee.byu.edu
Thu Jan 11 23:36:09 CST 2007


Torgil Svensson wrote:
> On 1/11/07, Charles R Harris <charlesr.harris at gmail.com> wrote:
>   
>> On 1/11/07, Torgil Svensson <torgil.svensson at gmail.com> wrote:
>>     
>>> Sure. I'm not objecting to the memory model; what I mean is that data
>>> access between modules has a wider scope than just a memory model.
>>> Maybe I'm completely out of scope here, but I thought this was worth
>>> considering for the inter-module data-sharing scope.
>>>       
>>  This is where separating the memory block from the API starts to show
>> advantages. OTOH, we should try to keep this all as simple and basic as
>> possible. Trying to design for every potential use will lead to over-design;
>> it is a fine line to walk.
>>     
>
> I agree. I'm trying to look after a use case of my own here where I
> have a huge array (won't fit in memory) with data that is very easy to
> compress (easily fits in memory). OTOH, I have as yet no need to share
> this between modules, but a simple data access API opens up a variety
> of options.
>   
I think this is a good idea generally.  I think the PIL would be much 
more open to this kind of API because the memory model of the PIL is 
different from ours.  On the other hand, I think it would be a shame to 
not provide a basic N-d array memory model like NumPy has because it is 
used so often.
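
To make "basic N-d array memory model" concrete, here is a minimal sketch
of strided addressing, assuming a descriptor with a data pointer, a shape
and per-dimension byte strides. The struct and function names (ndblock,
ndblock_item) are made up for illustration and are not taken from the PEP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for an N-d block of memory; the field names
   are illustrative only and are not taken from the PEP. */
typedef struct {
    int nd;                    /* number of dimensions */
    const ptrdiff_t *shape;    /* extent of each dimension */
    const ptrdiff_t *strides;  /* bytes to step per unit index, per dimension */
    char *data;                /* address of element (0, ..., 0) */
} ndblock;

/* Return the address of the element at index[0 .. nd-1]. */
void *ndblock_item(const ndblock *b, const ptrdiff_t *index)
{
    ptrdiff_t offset = 0;
    for (int i = 0; i < b->nd; i++) {
        assert(index[i] >= 0 && index[i] < b->shape[i]);
        offset += index[i] * b->strides[i];
    }
    return b->data + offset;
}
```

The same function handles C order, Fortran order and sliced views; only
the strides passed in differ.
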
> In my mindset, I can slice and dice my huge array and the
> implementation behind the data access API will choose between having
> the views represented internally as intervals or lists of indexes.
>
> So I'm +1 for having all information concerning nd-array access on a
> logical level (shapes) in one API and let the memory-layout-details
> (strides, FORTRAN, C etc) live in another API and a module that wants
> to try to skip the api overhead (numpy) can always do something like:
>   
I had originally thought to separate these out into multiple calls 
anyway.  Perhaps we could propose the same thing:  have a full struct 
interface as one option and a multiple-call interface like you propose 
as another.
> memory_interface = array_interface->get_memory_layout();
> if (memory_interface)
> {
>    ... use memory_interface->strides ... etc
> }
> else
> {
>    ...  use array_interface->get_item_from_index() ... etc
> }
>
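A rough C sketch of the two-tier pattern quoted above, under some loud
assumptions: items are doubles, the array is 1-d, and every name in it
(array_interface, memory_layout, sum_1d and the two example providers) is
hypothetical rather than part of any proposed API. A consumer asks for the
memory layout first and falls back to per-item calls when the provider
returns NULL.

```c
#include <stddef.h>

/* All names below are hypothetical; nothing here is from the PEP. */
typedef struct {
    int nd;
    const ptrdiff_t *strides;  /* byte strides */
    char *data;
} memory_layout;

typedef struct array_interface array_interface;
struct array_interface {
    int nd;
    const ptrdiff_t *shape;
    /* Returns NULL when the data is not available as strided memory. */
    const memory_layout *(*get_memory_layout)(array_interface *self);
    /* Always available, but costs one call per item. */
    double (*get_item_from_index)(array_interface *self,
                                  const ptrdiff_t *index);
    void *state;               /* provider-private data */
};

/* Provider 1: a plain buffer that exposes its memory layout. */
const memory_layout *dense_layout(array_interface *self)
{
    return (const memory_layout *)self->state;
}

double dense_item(array_interface *self, const ptrdiff_t *index)
{
    const memory_layout *m = self->state;
    return *(const double *)(m->data + index[0] * m->strides[0]);
}

/* Provider 2: items computed on demand (item i is 0.5 * i); there is
   no memory block to expose. */
const memory_layout *no_layout(array_interface *self)
{
    (void)self;
    return NULL;
}

double computed_item(array_interface *self, const ptrdiff_t *index)
{
    (void)self;
    return 0.5 * (double)index[0];
}

/* Consumer: sum a 1-d array, taking the fast path when it exists. */
double sum_1d(array_interface *a)
{
    double total = 0.0;
    const memory_layout *m = a->get_memory_layout(a);
    if (m) {
        for (ptrdiff_t i = 0; i < a->shape[0]; i++)
            total += *(const double *)(m->data + i * m->strides[0]);
    } else {
        for (ptrdiff_t i = 0; i < a->shape[0]; i++)
            total += a->get_item_from_index(a, &i);
    }
    return total;
}
```

The consumer code is the same for both providers; only the cost per item
differs.
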
> I'm guessing that most of the modules trying to access an array will
> choose to go through numpy for fast operations.
>
> Another use of an API is to do things like give an "RGB" view of an
> image regardless of which weird image format lies below, without
> having to convert the whole image in memory and lose precision or
> memory. 
This is true.   So at what level do we propose the API?  Single-item 
access for sure, but what about

array_interface->get_block_from_slice() ?

Such a thing would be very useful for all kinds of large data-sets, from 
images and videos to scientific data-sets.
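
As a sketch of what a block-from-slice call could look like for the
compressed-array use case, assume a 1-d array of doubles stored as
run-length-encoded runs. The type rle_run, the function name and its
signature are all made up for illustration; a real proposal would need
N-d slices and arbitrary item types. The caller provides an output buffer
with room for stop - start items, and only that slice is ever expanded
into memory.

```c
#include <assert.h>
#include <stddef.h>

/* One run of a run-length-encoded 1-d array: `count` copies of `value`.
   Hypothetical storage format, used only for this sketch. */
typedef struct {
    size_t count;
    double value;
} rle_run;

/* Copy logical elements [start, stop) into `out` and return the number
   of items written.  Fewer than stop - start items are written if the
   slice runs past the end of the array. */
size_t rle_get_block_from_slice(const rle_run *runs, size_t nruns,
                                size_t start, size_t stop, double *out)
{
    assert(start <= stop);
    size_t pos = 0;      /* logical index of the current run's first item */
    size_t written = 0;
    for (size_t r = 0; r < nruns && pos < stop; r++) {
        size_t run_end = pos + runs[r].count;
        /* Overlap of this run with the requested slice, if any. */
        size_t lo = pos > start ? pos : start;
        size_t hi = run_end < stop ? run_end : stop;
        for (size_t i = lo; i < hi; i++)
            out[written++] = runs[r].value;
        pos = run_end;
    }
    return written;
}
```
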

> If we want the whole in-memory-RGB-copy we could just take the
> RGB-view, pass it to numpy and force numpy to do a copy. The module
> can then, in either case, operate on the image through numpy or return
> a numpy object to the user. (numpy is of course integrated in python
> by then)
>   
Getting this array_interface into Python goes a long way toward making 
that happen, I think.

-Travis



