[SciPy-user] NumPy arrays of Python objects (it was Re: How to start with SciPy and NumPy)
Mon Jan 26 04:22:23 CST 2009
On Monday 26 January 2009, Vicent wrote:
> Hello again.
> I have a doubt, related with all what was talked in the last posts of
> the previous thread.
> I've managed to build a NumPy array whose elements (scalars?) are
> objects from a class. That class was defined by me previously.
> Each object contains several properties. For example, they have the
> property "value".
> For each object, the property "value" can contain many different
> things, for example, an integer value, a boolean value or a "float".
> SO, I think it wouldn't be possible to replace that "object"/class
> with a NumPy data-type or "struct", in case I wanted.
> My question: is that a problem? I mean, is that NumPy array going to
> be "slow" to search, and so on, because its elements are not
> "optimized" NumPy types?? Maybe this question has no sense, but,
> actually, I would like to know if there is any kind of problem with
> that kind of "mixed structure": using NumPy arrays of
> developer-defined (non-NumPy) objects.
Yes. In general, having arrays of 'object' dtype is a problem in NumPy
because you won't be able to attain the high performance that NumPy
usually delivers with other dtypes like 'float' or 'int'. This is
because many of NumPy's accelerations rely on two facts:
1. That every element of the array is of equal size (in order to allow
high memory performance on common access patterns).
2. That the hardware can perform fast operations directly on these
elements.
On today's architectures, the sorts of elements that satisfy those
conditions are mainly these types:
boolean, integer, float, complex and fixed-length strings
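To make the difference concrete, here is a small sketch (my own example, not from the original post) contrasting a homogeneous float64 array with an object-dtype array holding the same numbers:

```python
import numpy as np

# Homogeneous dtype: every element is a fixed-size float stored contiguously.
a = np.arange(1000, dtype=np.float64)

# Object dtype: the array stores only pointers to boxed Python objects.
b = np.arange(1000).astype(object)

print(a.dtype, a.itemsize)  # float64; 8 bytes per element
print(b.dtype, b.itemsize)  # object; itemsize is the pointer size, not the data size

# The same expression works on both, but the float64 version runs in a
# single optimized C loop, while the object version makes a Python-level
# multiplication call for every element.
fast = a * 2.0
slow = b * 2
```

Both conditions above hold for `a` (fixed element size, hardware float multiply) and fail for `b`, which is why the object version is much slower on large arrays.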
Another kind of array element that can benefit from NumPy's better
computational abilities is the compound object made of the above
types, commonly referred to as a 'record type'. However, in order
to preserve condition 1, these compound objects cannot vary in size
from element to element (so, your example does not fit here). Note
also that such record arrays normally lack property 2 for most
operations, so they are usually seen more as data containers than as
computational objects per se.
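As a sketch of such a record type (the field names here are my own invention), a structured dtype fixes the layout of every element while still allowing fast per-column operations:

```python
import numpy as np

# A compound ("record") dtype: each element has the same fixed layout,
# so condition 1 (equal element size) is preserved.
dt = np.dtype([('value', np.float64), ('flag', np.bool_), ('count', np.int32)])
records = np.zeros(3, dtype=dt)
records['value'] = [1.5, 2.5, 3.5]
records['flag'] = [True, False, True]

# Individual fields are homogeneous arrays, so column-wise operations
# still run at C speed:
total = records['value'].sum()  # 7.5
```

Note that 'value' must hold the same type in every element here, which is exactly why a field that is sometimes int, sometimes bool, and sometimes float cannot be expressed this way.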
So, you have two options here:
- If you want to stick with collections of classes with attributes that
can be general Python objects, then try to use Python containers for
your case. You will find that, in general, they are better suited for
doing most of your desired operations.
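For instance (a minimal sketch, with a hypothetical Item class standing in for your own), plain Python containers cope with mixed-type attributes directly:

```python
# A hypothetical class like the one described: each object's "value"
# may be an int, a bool, or a float.
class Item:
    def __init__(self, value):
        self.value = value

items = [Item(3), Item(True), Item(2.5)]

# Plain Python lists and dicts handle the mixed types naturally:
floats = [it.value for it in items if isinstance(it.value, float)]
by_type = {type(it.value).__name__: it.value for it in items}
```

Lists, dicts, and sets impose no homogeneity requirement, so searching and grouping by type or attribute stays straightforward.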
- If you need extreme computational speed, then you need to change your
data schema (and perhaps the way your brain works too) and start to
think in terms of homogeneous NumPy arrays as your building blocks.
This is why people wanted you to be more explicit in describing your
situation: they tried to see whether NumPy arrays could be used as the
basic building blocks for your data schema or not. My advice here is
to try first with regular Python containers. If you are not satisfied
with the speed or memory consumption, then try to restate your problem
in terms of arrays and use NumPy to accelerate it (and to consume far
less memory too).
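One common way to restate such a problem (a sketch under attribute names I have assumed) is to replace the single array of objects with one homogeneous array per attribute:

```python
import numpy as np

# Instead of one object array of Item instances, keep one homogeneous
# array per attribute (a "structure of arrays" layout).
values = np.array([1.5, 2.5, 3.5, 4.5])
flags = np.array([True, False, True, False])

# Operations now vectorize instead of looping over Python objects:
selected = values[flags]   # values where the corresponding flag is True
mean_sel = selected.mean()
```

With this layout, searches and reductions run over contiguous homogeneous memory, which is where NumPy's speed and memory savings come from.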
Hope that helps,