[Numpy-discussion] Introduction

Perry Greenfield perry at stsci.edu
Fri Apr 12 17:44:04 CDT 2002


Scott Gilbert writes:
>      import array
>      class ScottArray:
>          def __init__(self):
>              self.ndarray_buffer   = array.array('d', [0]*100)
>              self.ndarray_shape    = (10, 10)
>              self.ndarray_stride   = (80, 8)
>              self.ndarray_itemsize = 8
>              self.ndarray_itemtype = 'Float64'
>
>      import numarray
>
>      n = numarray.numarray((10, 10), type='Float64')
>      s = ScottArray()
>
>      very_cool = numarray.add(n, s)
>
But why not (I may have some details wrong, I'm doing this
from memory, and I haven't worked on it myself in a bit):

import array
import numarray
import memory # comes with numarray
class ScottArray(numarray.NumArray):
    def __init__(self):
        # create necessary buffer obj
        buf = memory.writeable_buffer(array.array('d', [0]*100))
        numarray.NumArray.__init__(self, shape=(10, 10),
                                   type=numarray.Float64, buffer=buf)
        # _strides not settable from constructor yet, but currently
        # if you needed to set it:
        # self._strides = (80, 8)
        # But for this case it would be computed automatically from
        # the supplied shape


n = numarray.numarray((10, 10), type='Float64')
s = ScottArray()

maybe_not_quite_so_cool_but_just_as_functional = n + s
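For a contiguous 10x10 Float64 array, the strides (80, 8) follow directly from the shape and the 8-byte item size. A minimal sketch of that computation, assuming row-major (C-order) layout; the function name is hypothetical and not part of numarray:

```python
def c_contiguous_strides(shape, itemsize):
    """Return row-major (C-order) byte strides for an array of this shape."""
    strides = []
    step = itemsize  # bytes to move one element along the last axis
    # Walk the axes from last to first: the last axis varies fastest.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


print(c_contiguous_strides((10, 10), 8))   # (80, 8)
print(c_contiguous_strides((2, 3, 4), 8))  # (96, 32, 8)
```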

> This example is kind of silly.  I mean, why wouldn't I just use
> numarray for all of my array needs?  Well, that's where my world is a
> little different than yours I think.  Instead of using 'array.array()'
> above, there are times where I'll need to use 'whizbang.array()' to
> get a different PyBufferProcs supporting object.  Or where I'll need
> to work with a crazy type in one part of the code, but I'd like to
> pass it to an extension that combines your types and mine.
>
> In these cases where I need "special memory" or "special types" I
> could try and get you guys to accept a patch, but this would just
> pollute your project and probably annoy you in general.  A better
> solution is to create a general standard mechanism for implementing
> NDArray types, and let me make my own.
>
From everything I've seen so far, I don't see why you can't
just create a NumArray object directly. You can subclass it
(and use multiple inheritance if you need to subclass a different
object as well) and add whatever customized behavior you want.
You can create new kinds of objects as buffers just so long
as you satisfy the buffer interface.
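The point that any object satisfying the buffer interface can serve as storage can be sketched in modern Python. 'whizbang.array()' is Scott's placeholder, so the class below is equally hypothetical: a bytearray subclass that carries extra state, and memoryview (a standard-library type that postdates numarray) stands in for an array package consuming its buffer without copying:

```python
class WhizbangBuffer(bytearray):
    """Hypothetical 'special memory' type: a bytearray that also records a tag.

    Because it subclasses bytearray, it exports the buffer interface, so any
    consumer that accepts buffers (here, memoryview) can use it as storage.
    """

    def __init__(self, nbytes, tag):
        super().__init__(nbytes)  # nbytes zero-filled bytes
        self.tag = tag


buf = WhizbangBuffer(10 * 10 * 8, tag="device-memory")
# Reinterpret the raw bytes as a 10x10 grid of C doubles, without copying.
grid = memoryview(buf).cast("d", (10, 10))
grid[2, 3] = 1.5  # writes into buf itself
print(grid[2, 3], grid.shape, grid.strides)  # 1.5 (10, 10) (80, 8)
```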
>
> In the above example, we could have completely different NDArray
> implementations working interoperably inside of one UFunc.  It seems
> to me that all it really takes to be an NDArray can be specified by
> a list of attributes like the one above.  (Probably need a few more
> attributes to be really general: 'ndarray_endian', etc...)  In the
> end, NDArrays are just pointers to a buffer, and descriptors for
> indexing.
>
Again, why not just create an NDArray object with the appropriate
buffer object and attributes (subclassing if necessary)?
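Scott's description (a buffer plus shape, strides, item size and item type) is close to what the modern standard-library memoryview reports for any buffer-exporting object. A sketch using the same array.array buffer as his example; the helper `describe` and its key names are hypothetical, chosen to mirror his proposed ndarray_* attributes, and the item type comes back as a struct-style code ('d') rather than 'Float64':

```python
import array


def describe(obj):
    """Collect NDArray-style descriptors from any object exporting a buffer.

    Hypothetical helper: the keys mirror Scott's proposed ndarray_* attributes.
    """
    view = memoryview(obj)
    return {
        "shape": view.shape,
        "strides": view.strides,
        "itemsize": view.itemsize,
        "itemtype": view.format,  # struct-style code, e.g. 'd' for C double
        "nbytes": view.nbytes,
    }


# Scott's buffer: 100 doubles, reinterpreted as a 10x10 grid without copying.
# memoryview only casts to or from a byte format, hence the hop through 'B'.
flat = array.array("d", [0.0] * 100)
grid = memoryview(flat).cast("B").cast("d", (10, 10))
print(describe(grid))
```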

>
> I don't believe this would have any significant effect on the
> performance of numarray.  (The efficient fast C code still gets a
> pointer to work with.)  Moreover, I'd be very willing to contribute
> patches to make this happen.
>
> If you agree, and we can flesh out what this "attribute interface"
> should be, then I can start distributing my own array module to the
> engineers where I work without too much fear that they'll be screwed
> once numarray is stable and they want to mix and match.
>
> Code always lives a lot longer than I want it to, and if I give them
> something now which doesn't work with your end product, I'll have
> done them a disservice.
>
All good in principle, but I haven't yet seen a reason to change
numarray. As far as I can tell, it provides all you need exactly
as it is. If you could give an example that demonstrated otherwise...
>
> It's all open for discussion, but I would propose that ndarray_endian
> be one of:
>
>     '>' - big endian
>     '<' - little endian
>
> This is how the standard Python struct module specifies endian, and
> I've been trying to stay consistent with the baseline when possible.
>
To tell you the truth, I'm not crazy about how the struct module
handles types or attributes. It's generally far too cryptic for
my tastes. Other than providing backward compatibility, we aren't
interested in having it emulate struct.
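For reference, the '>' and '<' codes Scott proposes are the struct module's byte-order prefixes. A short sketch of what they mean in practice, using only struct and sys; `needs_byteswap` is a hypothetical helper, not a numarray API:

```python
import struct
import sys

# The value 1 as an unsigned 16-bit integer under each byte-order prefix.
big = struct.pack(">H", 1)     # most significant byte first
little = struct.pack("<H", 1)  # least significant byte first
print(big, little)             # b'\x00\x01' b'\x01\x00'

# Decoding with the wrong prefix silently yields a different number.
print(struct.unpack("<H", big)[0])  # 256


def needs_byteswap(ndarray_endian):
    """Hypothetical helper: True if data tagged '>' or '<' differs from this host."""
    native = "<" if sys.byteorder == "little" else ">"
    return ndarray_endian != native


print(needs_byteswap(">"), needs_byteswap("<"))  # depends on the host
```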

> >
> > The above scheme is needed for our purposes because many of our data
> > files contain multiple data arrays and we need a means of creating a
> > numarray object for each one. Most of this machinery has already been
> > implemented, but we haven't released it since our I/O package (for
> > astronomical FITS files) is not yet at the point of being able to use
> > it.
> >
>
>
I could well misunderstand, but I thought that if you mmap a file
on Unix in write mode, you do not use up virtual memory as limited
by physical memory and the paging file. Your only limit becomes the
virtual address space available to the processor. If the 32-bit
address space is your problem, you are far, far better off using a
64-bit processor and operating system than trying to kludge up a
windowing memory mechanism. I could see a way of doing it for
ufuncs, but the numeric world (and I would think the DSP world as
well) needs far more than element-by-element array functionality.
Providing a usable C API for that kind of memory model would be a
nightmare. But I'm not sure whether this or the page file is your
limitation.

Perry




