FW: [Numpy-discussion] Bug: extremely misleading array behavior

eric jones eric at enthought.com
Sun Jun 9 17:19:13 CDT 2002


> > If you are proposing something like
> >
> > y = x + Float32(1.)
> >
> > it would work, but it sure leads to some awkward expressions.
> 
> Yes, that's what I am proposing. It's no worse than what we have now,
> and if writing Float32 a hundred times is too much effort, an
> abbreviation like f = Float32 helps a lot.
> 
> Anyway, following the Python credo "explicit is better than implicit",
> I'd rather write explicit type conversions than have automagical ones
> surprise me.

How about making indexing (not slicing) arrays *always* return a 0-D
array with copy instead of "view" semantics?  This is nearly equivalent
to creating a new scalar type, but without requiring major changes.  I
think it is probably even more useful for writing generic code because
the returned value will retain array behavior.  Also, the following
example

 >   a = array([1., 2.], Float)
 >   b = array([3., 4.], Float32)
 > 
 >   a[0]*b

would now return a Float array as Konrad desires because a[0] is a Float
array.  Using copy semantics would fix the unexpected behavior reported
by Larry that kicked off this discussion.  Slices are a different
animal than indexing and would (and definitely should) continue to
have view semantics.

I further believe that all Numeric functions (sum, product, etc.) should
return arrays all the time instead of implicitly converting them to
Python scalars in special cases such as reductions of 1-d arrays.
I think the only reason for the silent conversion is that Python lists
only accept integers as indices, so that:

 >>> a = [1,2,3,4]
 >>> a[array(0)]
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 TypeError: sequence index must be integer

Numeric arrays don't have this problem:

 >>> a = array([1,2,3,4])
 >>> a[array(0)]
 1

I don't think this alone is a strong enough reason for the conversion.
Getting rid of special cases is more important because it makes behavior
predictable to the novice (and expert), and it is easier to write
generic functions and be sure they will not break a year from now when
one of the special cases occurs.  
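As it happens, this is roughly where things ended up. A quick sketch in a modern NumPy (again only a stand-in for Numeric): reductions return NumPy scalars that still carry a dtype, and integer 0-d arrays now implement `__index__`, so the list-indexing TypeError shown above no longer occurs:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# A reduction over a 1-d array returns a NumPy scalar, not a bare
# Python int -- it still carries a dtype (platform-dependent width).
total = a.sum()
print(type(total).__name__, total)

# A 0-d integer array can index both NumPy arrays and plain Python
# lists, because it implements __index__.
idx = np.array(2)
print(a[idx])
print([10, 20, 30][idx])
```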

Are there other reasons why scalars are returned?

On coercion rules:

As for adding the array to a scalar value, 

  x = array([3., 4.], Float32)
  y = x + 1.

Should y be a Float or a Float32?  I like numarray's coercion rules
better (Float32).  I have run into this upcasting too many times to
count.  The explicit/implicit distinction isn't clear-cut to me here:
the user explicitly cast x to be Float32, but because of the limited
numeric types in Python, the result is upcast to a double.  Here's
another example,
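For what it's worth, numarray's rule is the one that survived: in a modern NumPy (used here only as an illustration, not as Numeric itself), a Python float scalar does not upcast a float32 array, and an explicit conversion is available when a double is actually wanted:

```python
import numpy as np

x = np.array([3., 4.], dtype=np.float32)
y = x + 1.          # the Python float scalar does not upcast the array
print(y.dtype)      # stays float32

# An explicit conversion is still available when float64 is wanted.
y64 = x.astype(np.float64) + 1.
print(y64.dtype)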

  >>> from Numeric import *
  >>> a = array((1,2,3,4), UnsignedInt8)
  >>> left_shift(a,3)
  array([ 8, 16, 24, 32],'i')

I had to stare at this for a while when I first saw it before I realized
the integer value 3 upcast the result to be type 'i'.  So, I think this
is confusing and rarely the desired behavior.  The fact that this is
inconsistent with Python's "always upcast" rule is minor for me.  The
array math operations are necessarily a different animal from scalar
operations because of the extra types supported.  Defining these
operations in a way that is most convenient for working with array data
seems OK.
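The same numarray-style rule applies to the left_shift example in a modern NumPy (again just an illustration, assuming NumPy in place of Numeric): the Python int 3 no longer upcasts the uint8 array to 'i':

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.uint8)
b = np.left_shift(a, 3)     # the Python int 3 does not upcast
print(b, b.dtype)           # values 8, 16, 24, 32 all fit in uint8
```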

On the other hand, I don't think a move from version 21 to 22 is
enough to justify such a change.  Numeric progresses pretty fast, and
users don't expect such a major shift in behavior.  I do think,
though, that the computational speed issue is going to result in
numarray and Numeric existing side-by-side for a long time.  Perhaps
we should create an "interim" Numeric version (maybe starting at 30)
that tries to be compatible with the upcoming numarray in its coercion
rules, etc.?
Advanced features such as indexing arrays with arrays, memory mapped
arrays, floating point exception behavior, etc. won't be there, but it
should help people transition their codes to work with numarray, and
also offer a speedy alternative.

A second choice would be to make SciPy's Numeric implementation the
intermediate step.  It already produces NaN's on divide-by-zero,
following numarray's rules.  The coercion modifications could also be
incorporated.

> 
> Finally, we can always lobby for inclusion of the new scalar types
> into the core interpreter, with a corresponding syntax for literals,
> but it would sure help if we could show that the system works and
> suffers only from the lack of literals.

There was a seriously considered debate about unifying Python's
numeric model into a single type to get rid of the integer-float
distinction, at last year's Python conference and in the ensuing
months.  While it didn't (and won't) happen, I'd be really surprised
if the general community would welcome us stirring yet another type
into the brew.  Can't we make 0-d arrays work as an alternative?

eric
 
> 
> Konrad.
> --
>
> ---------------------------------------------------------------------------
> Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
> Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
> Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
> 45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
> France                                   | Nederlands/Francais
> ---------------------------------------------------------------------------
> 
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
