[Numpy-discussion] Problem with concatenate and object arrays

Fernando Perez fperez.net at gmail.com
Mon Sep 4 14:18:45 CDT 2006


On 9/3/06, Robert Kern <robert.kern at gmail.com> wrote:

> > I think it's propably a bug:
> >
> >  >>> concatenate((array([]),b))
> > array([None, None], dtype=object)
>
> Well, if you can fix it without breaking anything else, then it's a bug.
>
> However, I would suggest that a rule of thumb for using object arrays is to
> always be explicit. Never rely on automatic conversion from Python containers to
> object arrays. Since Python containers are also objects, it is usually ambiguous
> what the user meant.
>
> I kind of liked numarray's choice to move the object array into a separate
> constructor. I think that gave them some flexibility to choose different syntax
> and semantics from the generic array() constructor. Since constructing object
> arrays is so different from constructing numeric arrays, I think that difference
> is warranted.

This is something that should probably be sorted out before 1.0 is
out.  IMHO, the current behavior is a bit too full of subtle pitfalls
to be a good long-term solution, and I think that predictability
trumps convenience for a good API.  I know that N.array() is already
very complex; perhaps the idea of moving all object array construction
into a separate function would be a long-term win.

I think that object arrays are actually very important for numpy: they
provide the bridge between pure numerical, Fortran-like computing and
the richer world of Python datatypes and complex objects.  But it's
important to acknowledge this bridge character: they connect into a
world where the basic assumptions of homogeneity of numpy arrays don't
apply anymore.  I'd be +1 on forcing this acknowledgement by having a
separate N.oarray constructor, accessible via the dtype flag to
N.array as well, but /without/ N.array trying to invoke it
automatically by guessing the contents of what it was fed.

The downside of this approach is that much of the code that
'magically' just works with N.array(foo) today, would now break.  I'm
becoming almost of the opinion that the code is broken already, it
just hasn't failed yet :)

Over time, I've become more and more paranoid of what constructors
(and factory-type functions that build objects) do with their inputs,
how strongly they validate them, and how explicit they require their
users to be about their intent.  While I'm a big fan of Python's
duck-typing, I've also learned (the hard way) that in larger
codebases, the place to be very strict about input validation is
object constructors.  It's easy to let garbage seep into an object by
not validating a constructor input, and to later (often MUCH later)
have your code bizarrely explode with an unrecognizable exception,
because that little piece of garbage was fed to some third-party code
which was expecting something else.  If I've understood history
correctly, some of the motivations behind Enthought's Traits are of a
similar nature.

For now I've dealt with our private problem that spurred my original
posting.  But I think this issue is worth clarifying for numpy, before
1.0 paints us into a backwards-compatibility corner with a fairly
fundamental datatype and constructor.

Regards,

f




More information about the Numpy-discussion mailing list