[Numpy-discussion] PEP 209: Multi-dimensional Arrays

Konrad Hinsen hinsen at cnrs-orleans.fr
Wed Feb 14 12:03:20 CST 2001


> Design and Implementation

Some parts of this look a bit imprecise and I don't claim to
understand them. For example:

>     Its relation to the other types is defined when the C-extension
>     module for that type is imported.  The corresponding Python code
>     is:
>     
>     > Int32.astype[Real64] = Real64
>     
>     This says that the Real64 array-type has higher priority than the
>     Int32 array-type.

I'd choose a clearer name than "astype" for this, but that's a minor
detail. More important is how this is supposed to work. Suppose that
in Int32 you say that Real64 has higher priority, and in Real64 you
say that Int32 has higher priority. Would this raise an exception, and
if so, when?
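
To make the question concrete, here is a minimal sketch of one possible answer: a registry that rejects a contradictory declaration at the moment the second type registers. The names CoercionRegistry, declare_higher, result_type and CoercionConflict are hypothetical and are not part of the PEP. The sketch only catches direct contradictions between two types, not cycles that pass through a third one.

```python
class CoercionConflict(Exception):
    """Raised when two types each claim the other has higher priority."""


class CoercionRegistry:
    # Hypothetical helper for illustration; not part of PEP 209.
    def __init__(self):
        # Maps a type name to the set of type names declared to outrank it.
        self._higher = {}

    def declare_higher(self, low, high):
        """Record that `high` has higher priority than `low`."""
        if low in self._higher.get(high, set()):
            # The reverse declaration already exists: fail at import time.
            raise CoercionConflict(
                "%s and %s each claim the other has higher priority"
                % (low, high)
            )
        self._higher.setdefault(low, set()).add(high)

    def result_type(self, a, b):
        """Return the higher-priority type of the pair, if one was declared."""
        if a == b:
            return a
        if b in self._higher.get(a, set()):
            return b
        if a in self._higher.get(b, set()):
            return a
        raise TypeError("no coercion declared between %s and %s" % (a, b))
```

With this design the exception is raised when the second module is imported, not later when the two types first meet in an expression.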

Perhaps the coercion question should be treated in a separate PEP that
also covers standard Python types and provides a mechanism that any
type implementer can use. I can think of a number of cases where I
have wished I could properly define coercions between my own types and
others.

>     3.  Array:
>     
>     This class contains information about the array, such as shape,
>     type, endian-ness of the data, etc..  Its operators, '+', '-',

What about the data itself?

>     4.  ArrayView
> 
>     This class is similar to the Array class except that the reshape
>     and flat methods will raise exceptions, since non-contiguous

There are no reshape and flat methods in this proposal...

>     1.  Does slicing syntax default to copy or view behavior?
> 
>     The default behavior of Python is to return a copy of a sub-list
>     or tuple when slicing syntax is used, whereas Numeric 1 returns a
>     view into the array.  The choice made for Numeric 1 is apparently
>     for reasons of performance: the developers wish to avoid the

Yes, performance was the main reason. But there is another one: if
slicing returns a view, you can make a copy based on it, but if
slicing returns a copy, there's no way to make a view. So if you
change this, you must provide some other way to generate a view, and
please keep the syntax simple (there are many practical cases where a
view is required).
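
The asymmetry can be illustrated with standard Python objects instead of Numeric itself (an assumption made only to keep the example self-contained): a memoryview slice behaves like a view, and a copy can always be derived from it, whereas a list slice is already a copy and offers no route back to the original storage.

```python
from array import array

data = array("d", [0.0, 1.0, 2.0, 3.0, 4.0])

# View semantics: slicing a memoryview shares the underlying buffer.
view = memoryview(data)[1:4]
view[0] = 10.0                 # writes through to data[1]

# A copy can always be made from a view...
copy_from_view = view.tolist()
copy_from_view[1] = -1.0       # does not touch data

# ...but a slice that is already a copy cannot be turned back into a view.
items = [0.0, 1.0, 2.0, 3.0, 4.0]
copied = items[1:4]
copied[0] = 10.0               # items is unchanged
```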

>     In this case the performance penalty associated with copy behavior
>     can be minimized by implementing copy-on-write.  This scheme has

Indeed, that's what most APL implementations do.
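
A minimal sketch of the copy-on-write idea, assuming a hypothetical CowArray class built on a plain list (this is not the PEP's implementation, and it makes no claim about how any particular APL system does it): a slice borrows the parent's buffer, and whichever side writes first takes a private copy.

```python
class CowArray:
    # Hypothetical illustration of copy-on-write; not from PEP 209.
    def __init__(self, values):
        self._buf = list(values)
        self._start = 0
        self._stop = len(self._buf)
        self._shared = False   # True while another CowArray may use _buf

    def __len__(self):
        return self._stop - self._start

    def _offset(self, index):
        # Translate an index of this array into a position in the buffer.
        n = len(self)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError("index out of range")
        return self._start + index

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise NotImplementedError("this sketch handles step 1 only")
            # Slicing copies nothing: the result borrows the same buffer.
            result = CowArray.__new__(CowArray)
            result._buf = self._buf
            result._start = self._start + start
            result._stop = self._start + max(stop, start)
            result._shared = self._shared = True
            return result
        return self._buf[self._offset(index)]

    def __setitem__(self, index, value):
        offset = self._offset(index)
        if self._shared:
            # First write since sharing began: take a private copy.
            self._buf = self._buf[self._start:self._stop]
            offset -= self._start
            self._start, self._stop = 0, len(self._buf)
            self._shared = False
        self._buf[offset] = value
```

The shared flag is deliberately conservative: after a slice has detached itself, the parent still makes one copy on its next write. A real implementation would count references to the buffer to avoid that.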

>     data buffer is made.  View behavior would then be implemented by
>     an ArrayView class, whose behavior be similar to Numeric 1 arrays,

So users would have to write something like

    ArrayView(array, indices)

That looks a bit cumbersome, and the straightforward way to write the
indices (slice notation such as 1:5) is illegal outside square brackets
according to the current syntax rules.
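
This can be checked with standard Python only. The ArrayView class below is a hypothetical stand-in that wraps a list, not the class proposed in the PEP; it exists just to show what the constructor call has to look like.

```python
class ArrayView:
    # Hypothetical stand-in for the PEP's ArrayView; wraps a plain list.
    def __init__(self, base, indices):
        self._base = base
        # Resolve the slice once against the base's current length.
        self._range = range(*indices.indices(len(base)))

    def __len__(self):
        return len(self._range)

    def __getitem__(self, i):
        return self._base[self._range[i]]

    def __setitem__(self, i, value):
        self._base[self._range[i]] = value   # writes through to the base


a = [0, 1, 2, 3, 4]
v = ArrayView(a, slice(1, 4))    # legal, but verbose
v[0] = 10                        # a is now [0, 10, 2, 3, 4]

# The obvious spelling is rejected by the parser:
try:
    compile("ArrayView(a, 1:4)", "<example>", "eval")
    slice_in_call_is_legal = True
except SyntaxError:
    slice_in_call_is_legal = False
```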

>     2.  Does item syntax default to copy or view behavior?

If compatibility with lists is a criterion at all, then I'd apply it
consistently and use view semantics. Otherwise let's forget about
lists and discuss 1. and 2. from a purely array-oriented point of
view. And then I'd argue that view semantics is what is needed more
often and should thus be the default for both slicing and item
extraction.

>     3.  How is scalar coercion implemented?

The old discussion again...

>     annoying, particularly for very large arrays.  We prefer that the
>     array type trumps the python type for the same type class, namely

That is a completely arbitrary rule from any but the "large array
performance" point of view. And it's against the Principle of Least
Surprise.

Now that we have the PEP procedure for proposing any change
whatsoever, why not lobby for the addition of a float scalar type to
Python, with its own syntax for constants? That looks like the best
solution from everybody's point of view.

>     4.  How is integer division handled?
>     
>     In a future version of Python, the behavior of integer division
>     will change.  The operands will be converted to floats, so the

Has that been decided already?

>     7.  How are numerical errors handled (IEEE floating-point errors in
>         particular)?
> 
>     It is not clear to the proposers (Paul Barrett and Travis
>     Oliphant) what is the best or preferred way of handling errors.
>     Since most of the C functions that do the operation, iterate over
>     the inner-most (last) dimension of the array.  This dimension
>     could contain a thousand or more items having one or more errors
>     of differing type, such as divide-by-zero, underflow, and
>     overflow.  Additionally, keeping track of these errors may come at
>     the expense of performance.  Therefore, we suggest several
>     options:

I'd like to add another one:

e. Keep some statistics about the errors that occur during the
   operation, and if at the end the error count is > 0, raise
   an exception containing as much useful information as possible.
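
A rough sketch of option e in plain Python, assuming the hypothetical names ArrayArithmeticError and divide_elementwise. A real implementation would collect these counts inside the C loops, and the NaN placeholder stored for failed elements is also an assumption.

```python
import math


class ArrayArithmeticError(ArithmeticError):
    # Hypothetical exception carrying per-kind error counts.
    def __init__(self, counts, first_index, partial_result):
        self.counts = counts                  # e.g. {"ZeroDivisionError": 2}
        self.first_index = first_index        # index of the first failure
        self.partial_result = partial_result  # results with placeholders
        summary = ", ".join("%s: %d" % kv for kv in sorted(counts.items()))
        super().__init__("%s (first at index %d)" % (summary, first_index))


def divide_elementwise(xs, ys):
    """Divide xs by ys, reporting all errors once at the end."""
    result, counts, first_index = [], {}, None
    for i, (x, y) in enumerate(zip(xs, ys)):
        try:
            result.append(x / y)
        except ArithmeticError as exc:
            # Count the error by kind and keep going.
            kind = type(exc).__name__
            counts[kind] = counts.get(kind, 0) + 1
            if first_index is None:
                first_index = i
            result.append(math.nan)   # placeholder for the failed element
    if counts:
        raise ArrayArithmeticError(counts, first_index, result)
    return result
```

Nothing is printed; the caller decides what to do with the counts, the index of the first failure, and the partial result attached to the exception.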

I would certainly not want any Python program to *print* anything
unless I have explicitly told it to do so.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------



