[Numpy-discussion] A possible addition to PEP 209

Chris Barker chrishbarker at home.net
Fri Sep 7 17:01:23 CDT 2001


Konrad Hinsen wrote:

> Such as? Given the overall dynamic nature of Python, I don't see any
> real opportunities outside array-specific code. What optimizations
> could be done knowing *only* that all elements of a sequence are of
> the same type, but without a particular data layout?

I remember Guido's answer to a FAQ a while ago (the current FAQ has a
more terse version) that essentially stated that compiled Python
wouldn't be much faster because of Python's dynamic nature, so that the
expression x = y*z needs to be evaluated to see what type y is, what
type z is, and what it means to multiply them, before the actual
multiplication can take place. All that checking takes a whole lot
longer than the multiplication itself. NumPy, or course, helps this
along, because once it is determined that y and z are both NumPy arrays
of the same shape and type, it can multiply all the elements in a C
loop, without any further type checking.

Where this breaks down, of course, is when you can't express what you
want to do as a set of array functions. Once you learn the tricks these
times are fairly rare (which is why NumPy is useful), but there are
times when you can't use array-math, and need to loop through the
elements of an array and operate on them. Python currently has to type
check each of those elements every time something is done on them. In
principle, if the Python VM knew that A was an array of Floats, it would
know that A[i,j,k] was a Float, and it would not have to do a check. I
think it would be easiest to get optimization in sequence-oriented
operations, such as map() and list comprehensions:

map(fun, A)

when this is called, the function bound to the "fun" name is passed to
map, as is the array bound to the name "A". The Array is known to be
homogeous. map could conceivably compile a version of fun that worked on
the type of the items in A, and then apply that function to all the
elements in A, without type checking, and looping at the speed of C.
This is the kind of optimization I am imagining. Kind of an instant
U-func.

Something similar could be done with list comprehensions.

Of course, most things that can be done with list comprehensions  and
map() can be done with array operators anyway, so the real advantage
would be a smarter compiler that could do that kind of thing inside a
bunch of nested for loops. There is at least one project heading in that
direction:

    * Psyco (Armin Rego) - a run-time specializing virtual machine that
sees
      what sorts of types are input to a function and compiles a type-
or
      value-specific version of that function on-the-fly.  I believe
Armin
      is looking at some JIT code generators in addition to or instead
of
      another virtual machine.

knowing that all the elements of an Array (or any other sequence) are
the same type could help here a lot. Once a particular function was
compiled with a given set of types, it could be called directly on all
the elements of that array (and other arrays) with no type checking.

What it comes down to is that while Python's dynamic nature is
wonderful, and very powerful and flexible, there are many, many, times
that it is not really needed, particularly inside a given small
function. The standard answer about Python optimization is that you just
need to write those few small functions in C. This is only really
practical if they are functions that operate on particular expected
inputs: essentially statically typed input (or at least not totally
general). Even then, it is a substantial effort, even for those with
extensive C experience (unlike me). It would be so much better if a Py2C
or a JIT compiler could optimize those functions for you. 

I know this is a major programming effort, and will, at best, be years
away, but I'd like to see Python move in a direction that makes it
easier to do, and allows small steps to be done at a time. I think
introducing the concept of a homogenous sequence could help a lot of
these efforts. Numeric Arrays would be just a single example. Python
already has strings, and tuples could be marked as homogenous when
created, if they were. So could lists, but since they are mutable, their
homogenaity could change at any time, so it might not be useful.

I may be wrong about all this, I really don't know a whit about writing
compilers or interpreters, or VMs, but I'm throughing around the idea,
to see what I can learn, and see if it makes any sense.

-Chris


-- 
Christopher Barker,
Ph.D.                                                           
ChrisHBarker at home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------




More information about the Numpy-discussion mailing list