[Numpy-discussion] Introduction

Perry Greenfield perry at stsci.edu
Sat Apr 13 18:43:02 CDT 2002


> Ok, here's my list:
>
> Philosophical
>
>   You have a proposal in to the Python guys to make Numarray into the
>   standard _implementation_.  I think standards like this should specify
>   an _interface_, not an implementation.
>
Sure (though there is often more to a standard than just an interface,
but certainly an implementation is generally not the standard). I'm
not sure why you think we imply the implementation is the standard.
We are waiting to rewrite the PEP when we are closer to having
the implementation ready, but we've been very open about the design
and have asked for input on it for a long time now.

> Simplicity
>
>   I can give my users a single XArray.py file, and they can be off and
>   running with something that works right then and there, and it could in
>   many ways be compatible with Numarray (with some slight modifications)
>   when they decide they want the extra functionality of extension modules
>   that you or anyone else who follows your standard provides.  But they
>   don't have to compile anything until they really need to.
>
>   Your implementation leaves me with all or nothing.  I'll have to build
>   and use numarray, or I've got an in house only solution.
>
Hard to comment on this.

> Expediency
>
>   I want to see a usable standard arise quickly.  If you maintain the
>   stance that we should all use the Numarray implementation, instead of
>   just defining a good Numarray interface, everyone has to wait for you
>   to finish things enough to get them accepted by the Python group.  Your
>   implementation is complicated, and I suspect they will have many things
>   that they will want you to change before they accept it into their
>   baseline.  (If you think my list of suggestions is annoying, wait until
>   you see theirs!)
>
I have the strong sense you misunderstand how the process works.
Guido will be driven in large part by the acceptance or non-acceptance
of the Numeric community. If they don't buy into it, it won't be
part of the standard. If it won't be used by many, it won't be part
of the standard. Yes, he will review the design and interface to see
if there should be a long term commitment by the Python maintainers
to have it in the standard library. We have sent him the design
documents, and we do keep him informed. He has given us feedback
about it. But for the most part, the judgement is going to be by
the Numeric community.

>   If a simple interface protocol is presented, along with a simple pure
>   Python module that implements it, the PEP acceptance process might move
>   along quickly, but you could take your time with implementing your code.
>
> Pragmatic
>
>   You guys aren't finished yet, and I need to give my users an array
>   module ASAP.  As such a new project, there are likely to be many bugs
>   floating around in there.  I think that when you are done, you will
>   probably have a very good library.  Moreover, I'm grateful that you are
>   making it open source.  That's very generous of you, and the fact that
>   you are tolerating this discussion is definitely appreciated.
>
>   Still, I can't put off my projects, and I can't task you to work faster.
>
>
>   However, I do think we could agree in a very short term that your design
>   for the interface is a good one.  I also think that we (or just me if you
>   like) could make a much smaller PEP that would be more readily accepted.
>   Then everyone in this community could proceed at their own pace - knowing
>   that if we followed the simple standard we would have interoperability
>   with each other.
>
I think we still don't understand what you need yet. More elaboration
on that later.

> Social
>
>   Normally I wouldn't expect you to care about any of my special issues.
>   You have your own problems to solve.  As I said above, it's generous of
>   you to even offer your source code.
>
>   However, you are (or at least were) trying to push for this to become a
>   standard.  As such, considering how to be more general and apply to a
>   wider class of problems should be on your agenda.  If it's not, then you
>   shouldn't be creating the standard.
>
Pleeease. Just because a library developer doesn't happen to meet your
needs doesn't mean it can't be part of the standard library. There
are plenty of modules in the standard library that could have been
made more general in some way, but there they are. The criterion is
whether it solves problems for a large community of users, not that
it is infinitely extensible or so on. Software development is full of
trade-offs and that includes limits to generalization. Sure we
can discuss whether things could be made more general or not. But
because you want it more general doesn't mean we just say "Sure, you
define everything!"

>   If you don't care about numarray becoming standard, I would like to try
>   my hand at submitting the slightly modified version of your design.  I
>   won't be compatible with your stuff, but hopefully others will follow
>   suit.
>
You are free to propose your own standard at any time. No one will
stop you from doing so.

> Functionality
>
>   Data Types
>
>     I have needs for other types of data that you probably have little use
>     for.  If I can't coerce you to make a minor change in specification, I
>     really don't think I could coerce you to support brand new data types
>     (complex ints is the one I've beaten to death, because I could use that
>
You are right about complex ints: we won't consider them. One
could take numarray and add them if one wanted, and have a more
extended version. But we won't do it, and we wouldn't support them
as part of what we maintain. It's one of those trade-offs.

>     one in the short term).  What happens when someone at my company wants
>     quaternions?  I suspect that you won't have direct support for those.
>     I know that numarray is supposed to be extensible, but the following
>     raises an exception:
>
>         from numarray import *
>
>         class QuaternionType(NumericType):
>             def __init__(self):
>                 NumericType.__init__(self, "Quaternion", 4*8, 0)
>
>         Quaternion = QuaternionType()  # BOOM!
>
>         q = array(shape=(10, 10), type=Quaternion)
>
>     Maybe I'm just doing something wrong, but it looks like your code
>     wants "Quaternion" to be in your (private?) typeConverters dictionary.
>
Yep, and there's a good reason for that. Just spend a few minutes
thinking about the role types play with array packages and how they
have traditionally been implemented. Generally speaking, it is
presumed that any two numeric types may be used in a binary operator.
So you, Scott, define your special type, Quaternions. You will need
to provide the module all the machinery for knowing what to do with
all the other numeric types available. You may not care, but it is
a requirement that numarray (and Numeric) know what to do. If that
doesn't fit in with your needs, then you shouldn't be trying to use
it. The problem is worse than that. You supply a Quaternion type extension
to numarray, and Bob supplies a super long int type (64 bytes!) also.
Both of you have gone to the trouble of giving numarray the means of
handling all other default numarray types. But you don't know how to
handle each other. How do you solve that problem? I don't know.
If you do, let us know. Given the requirements, adding new numeric
types is not going to allow independent extensions to work with each
other. That's fairly limiting, but that's the price that is paid
for the feature.
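
To make the pairing problem concrete, here is a minimal sketch in present-day Python. It is not numarray's actual mechanism: the registry, the type names, and the `register_type` and `result_type` helpers are all hypothetical, invented only to show why two independently written extensions end up with no rule for each other.

```python
# Hypothetical coercion registry; NOT numarray's real typeConverters machinery.
BUILTIN_TYPES = ["Int32", "Float64", "Complex64"]

# Maps an unordered pair of type names to the type of the result.
coercion_table = {}


def register_type(name, rules):
    """Register a new type together with its result type against other types."""
    for other, result in rules.items():
        coercion_table[frozenset((name, other))] = result


def result_type(a, b):
    """Look up the result type of a binary operation between types a and b."""
    if a == b:
        return a
    try:
        return coercion_table[frozenset((a, b))]
    except KeyError:
        raise TypeError("no coercion rule for %s with %s" % (a, b))


# One extension author covers every built-in type ...
register_type("Quaternion", {t: "Quaternion" for t in BUILTIN_TYPES})
# ... and so does a second author, working independently (made-up rules).
register_type("LongInt512", {"Int32": "LongInt512",
                             "Float64": "Float64",
                             "Complex64": "Complex64"})

print(result_type("Quaternion", "Float64"))   # a rule the first author supplied
try:
    result_type("Quaternion", "LongInt512")   # neither author supplied this one
except TypeError as exc:
    print("unresolved:", exc)
```

Each new type needs a rule against every other type, so the table grows with the number of pairs, and the pairs between third-party types have no obvious owner.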

>     Ok, try two:
>
>         from numarray import *
>
>         q = NDArray(shape=(10, 10), itemsize=4*8)
>
>         if q[5][5] is None:
>             print "No boom, but what can I do with it?"
>
>     Maybe this is just a documentation problem.  On the other hand, I can
>     do the following pretty readily:
>
>         import array
>         class Quat2D:
>             def __init__(self, *shape):
>                 assert len(shape) == 2
>                 self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
>                 self._shape, self._stride = tuple(shape), (4*shape[1], 4)
>                 self._itemsize = 4*8
>
>             def __getitem__(self, sub):
>                 assert isinstance(sub, tuple) and len(sub) == 2
>                 offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>                 return tuple([self._buffer[offset + i] for i in range(4)])
>
>             def __setitem__(self, sub, val):
>                 assert isinstance(sub, tuple) and len(sub) == 2
>                 offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>                 for i in range(4): self._buffer[offset + i] = val[i]
>                 return val
>
>         q = Quat2D(10, 10)
>         q[5, 5] = (1, 2, 3, 4)
>         print q[5, 5]
>
>     This isn't very general, but it is short, and it makes a good example.
>
I'm not sure what it proves. If all you need is an array to store
some kind of type, be able to index and slice it, and not provide
numeric operations, by all means use the existing array module, it
does that fine. It's more work to subclass NDArray, but it can do
it too, and gives you more capabilities (you won't be able to use
index arrays or broadcasting in the array module for example). The
extra functionality comes at some price. Sure, it isn't as simple to
extend. It's your choice if it is worth it or not. If you want
to add your large quaternion array efficiently, then the array
module is worthless. Your example shows nothing about what your
real needs for the object are.
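
As a rough illustration of that trade-off, the sketch below (present-day Python, standard library only) stores quaternions flat in the `array` module and adds two such buffers. The `add_quat_buffers` helper is hypothetical. The point is only that `array.array` offers storage and indexing but no elementwise arithmetic, so the sum runs as an interpreted loop.

```python
import array


def add_quat_buffers(a, b):
    """Elementwise sum of two flat buffers of doubles (4 doubles per quaternion)."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    # array.array defines + as concatenation, so the sum needs an explicit loop.
    return array.array('d', [x + y for x, y in zip(a, b)])


p = array.array('d', [1.0, 2.0, 3.0, 4.0] * 3)   # three quaternions
q = array.array('d', [0.5, 0.5, 0.5, 0.5] * 3)

total = add_quat_buffers(p, q)
print(list(total[:4]))      # first quaternion of the sum
print(len(p + q))           # + concatenates: 24 doubles, not 12
```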

>     If they get half of their data from calculations using Numarray, and
>     half from whatever I provide them, and then try to mix the results in
>     an extension module that has to know about separate implementations,
>     life is more complicated than it should be.
>
It's how you intend to 'mix' these that I have no clue about.

>   Operations
>
>     I'm going to have to write my own C extension modules for some high
>     performance operations.  All I need to get this done is a void* pointer,
>     the shape, stride, itemsize, itemtype, and maybe some other things to
>     get off and running.  You have a growing framework, and you have already
>     indicated that you think of your hidden variables as private.  I don't
>     think I or my users should have to understand the whole UFunc framework
>     and API just to create an extension that manipulates a pointer to an
>     array of doubles.
>
Sigh. No one said you had to understand the ufunc framework to do so.
We are working on a C API that just gives you a simple pointer (it's
actually available now, but we aren't going to tout it until we have
better documentation).

>     Arrays are simpler than UFuncs.  I consider them to be pretty separable
>     parts of your design.  If you keep it this way, and it becomes the
>     standard, it seems that I and everyone else will have to understand
>     both parts in order to create an extension module.
>
Wrong.

> Flexibility
>
>   Numarray is going to make a choice of how to implement slicing.  My guess
>   is that it will be one of "copy contiguous", "copy on write", "copy by
>   reference".  I don't know what the correct choice is, but I know that
>   someone else will need something different based on context.  Things like
>   UFuncs and other extension modules that do fast C level calculations
>   typically don't need to concern themselves with slicing behaviour.
>
And they don't.
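
For readers unfamiliar with the distinction in the quoted text, two of the slicing behaviours it names can be shown with the standard library alone, assuming present-day Python: slicing an `array.array` copies, while slicing a `memoryview` of it shares the buffer. This is only an illustration of the terms and says nothing about which choice numarray makes.

```python
import array

data = array.array('d', [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# Copying slice: the result owns its own storage.
copied = data[1:4]
copied[0] = 99.0
print(data[1])          # still 1.0

# Reference slice: a memoryview slice shares storage with the original.
view = memoryview(data)[1:4]
view[0] = 99.0
print(data[1])          # now 99.0

# A strided view skips elements without copying them.
evens = memoryview(data)[::2]
print(evens.tolist())   # elements 0, 2, 4 of the shared buffer
```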

> Design
>
>   Your implementation would be similar to having the 'pickle' module
>   require you to derive from a 'Pickleable' base class - instead of simply
>   providing __getstate__ and __setstate__ methods.
>
>   It's an artificial constraint, and those are usually bad.
>
You say. You are quite welcome to do your own implementation that
doesn't have this 'artificial' constraint. After all your text
I *still* don't understand how you intend to use the 'interface'
of the private attributes. You haven't provided any example (let
alone a compelling one) of why we should accept any object that
provides those attributes. Shouldn't the object also provide all
the public methods? Shouldn't it also provide indexing and so forth?
All in all you are talking about checking quite a few attributes
to make sure the object has the interface. And even if it does,
*why* in the world would we presume that the C functions used by
numarray would work properly with the object you provide? I
really don't have a clue as to what you are getting at here, and
without some real concrete example illustrating this point, I
don't think there is any point to continuing this discussion.
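
To illustrate how many attributes such a check would involve, here is a hypothetical sketch in present-day Python. The attribute names echo the ones mentioned in this thread (buffer, shape, stride, itemsize). The `REQUIRED_ATTRS` tuple and the `missing_attrs` helper are invented for illustration and are not part of numarray.

```python
# Hypothetical duck-typing check; the attribute list is an assumption.
REQUIRED_ATTRS = ("_buffer", "_shape", "_stride", "_itemsize",
                  "__getitem__", "__setitem__")


def missing_attrs(obj):
    """Return the names of required attributes that obj does not provide."""
    return [name for name in REQUIRED_ATTRS if not hasattr(obj, name)]


class Incomplete:
    """Has the data attributes but supports no indexing."""
    def __init__(self):
        self._buffer = bytearray(32)
        self._shape = (1,)
        self._stride = (32,)
        self._itemsize = 32


missing = missing_attrs(Incomplete())
print(missing)   # the indexing methods are absent
```

Even an object that passes such a check only promises that the names exist. It does not guarantee that the buffer layout matches what compiled code expects, which is the concern raised above.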
> >
> > All good in principle, but I haven't yet seen a reason to change
> > numarray. As far as I can tell, it provides all you need exactly
> > as it is. If you could give an example that demonstrated otherwise...
> >
>
> Maybe you're right.  I suspect you as the author will come up with the
> quick example that shows how to implement my bizarre quaternion example
> above.  I'm not sure if this makes either of us right or wrong, but if
> you're not buying any of this, then it's probably time for me to chalk
> this up to a difference in opinion and move on.
>
> Truthfully this is taking me pretty far from my original tack.  Originally
> I had simply hoped to hack a couple of things into arraymodule.c, and here
> I am now trying to get a simpler standard in place.  I'll try one last time
> to convince you with the following two statements:
>
>   - Changing such that you only require the interface is a subtle,
>     but noticeable, improvement to your otherwise very good design.
>
>   - It's not a difficult change.
>
>
> If that doesn't compel you, at least I can walk away knowing I tried.  For
> the volumes I've written, this will probably be my last pesky message if
> you really don't want to budge on this issue.
>
We're not going to budge until you show us what the hell you are talking
about.
>
> The alternative of coming up with a different specifier for records/structs
> is probably a mistake now that the struct module already has its (terse)
> format specification.  Once that is taken into consideration, following all
> the leads of the struct module makes sense to me.
>
Again, you are free to do your own, or fork our numarray and
do it the way you want. Or do your own from scratch. Or whatever.
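
For reference, this is what the struct module's format specification looks like for a small record, in present-day Python. The record layout (an int, a double, and a 4-byte string) is made up for illustration and is not taken from either proposal.

```python
import struct

# '<' = little-endian with no padding; i = 4-byte int, d = 8-byte double,
# 4s = 4-byte string.
RECORD_FORMAT = "<id4s"

print(struct.calcsize(RECORD_FORMAT))          # 16 bytes per record

packed = struct.pack(RECORD_FORMAT, 7, 2.5, b"abcd")
print(struct.unpack(RECORD_FORMAT, packed))    # (7, 2.5, b'abcd')
```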
>
[...]
> Also, just mmaping the whole file puts all of the memory use at the
> discretion of the OS.  I might have a gig or two to work with, but if mmap
> takes them all, other threads will have to contend for memory.  The system
> (application) as a whole might very well run better if I can retain some
> control over this.
>
>
> I'm not married to the windowing suggestion.  I think it's something to
> consider, but it might not be a common enough case to try and make a
> standard mechanism for.  If there isn't a way to do it without a kluge,
> then I'll drop it.  Likewise if a simple strategy can't meet anyone's real
> needs.
>
You can forget our doing it. It's out of the question for us.
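
For context, the windowing idea in the quoted text can be approximated outside numarray with the standard `mmap` module, which accepts an offset and a length. The sketch below, in present-day Python, maps one window of a scratch file at a time. The `map_window` helper and the file contents are hypothetical, and this is not something numarray provides.

```python
import mmap
import os
import tempfile

WINDOW = mmap.ALLOCATIONGRANULARITY   # offsets must be multiples of this


def map_window(fileobj, index):
    """Map only the index-th WINDOW-sized piece of the file, read-only."""
    return mmap.mmap(fileobj.fileno(), WINDOW,
                     access=mmap.ACCESS_READ, offset=index * WINDOW)


# Build a scratch file three windows long, each window filled with one byte value.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for value in (b"a", b"b", b"c"):
        tmp.write(value * WINDOW)
    path = tmp.name

first_bytes = []
with open(path, "rb") as f:
    for index in range(3):
        window = map_window(f, index)
        first_bytes.append(window[:1])   # touch only the mapped window
        window.close()

os.remove(path)
print(first_bytes)
```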
> >
> > If the 32 bit address is your problem, you are far, far better off
> > using a 64-bit processor and operating system than trying to kludge up
> > a windowing memory mechanism.
> >
>
> We don't always get to specify what platform we want to run on.  Our
> customer has other needs, and sometimes hardware support for exotic devices
> dictates what we'll be using.  Frequently it is on 64 bit Alphas, but
> sometimes the requirement is x86 Linux, or 32 bit Solaris.
>
> Finally, our most frustrating piece of legacy software was written in
> Fortran assuming you could stuff a pointer into an INT*4 and now requires
> the -taso flag to the compiler for all new code (which turns a sexy 64 bit
> Alpha into a 32 bit kluge...).
>
You may have customers with unreasonable demands. We don't have to
let them cause an incredible complication in the underlying machinery.
(And we won't). And we won't make it work on Windows 3.1 either.
We have to draw the line somewhere. Your customers will pay dearly
(and you will benefit :-).

> Also, much of our data comes on tapes.  It's not easy to memory map those.
>
Your point being?
> >
>
[...]

This doesn't seem to be going anywhere. If you can give us
a better idea of how your interface needs would be used,
at least we could respond to the specific issues. But we
don't understand and although we are considering some
changes, I'm not going to fold in your requests until
we do understand.

You may not be happy with the progress we are making either.
Sorry, I can't help that. If you need something sooner,
you'll need to do something else. Come up with your
own system and try to get it into Python. Take numarray
and do it the way you think it ought to be done and at
the rate you think it should be done. You're welcome to.
Take the array module and use that as a basis.

We'd like numarray to be part of the standard. We'd like
it to be the standard package in the Numeric community.
But if neither happened, we'd still be working on it.
We need it for our own work. Numeric doesn't give us
the capabilities that we need. We are using it for
our software development and it is being used to reduce
HST data now. We are continuing on this regardless.

Perry




