[SciPy-user] NumPy arrays of Python objects (it was Re: How to start with SciPy and NumPy)

Francesc Alted faltet@pytables....
Mon Jan 26 04:22:23 CST 2009


Ei, Vicent,

On Monday 26 January 2009, Vicent wrote:
> Hello again.
>
> I have a question related to what was discussed in the last posts of
> the previous thread.
>
> I've managed to build a NumPy array whose elements (scalars?) are
> objects from a class. That class was defined by me previously.
>
> Each object contains several properties. For example, they have the
> property "value".
>
> For each object, the property "value" can contain many different
> things, for example, an integer value, a boolean value or a "float".
> So, I think it wouldn't be possible to replace that "object"/class
> with a NumPy data-type or "struct", even if I wanted to.
>
> My question: is that a problem? I mean, is that NumPy array going to
> be "slow" to search, and so on, because its elements are not
> "optimized" NumPy types? Maybe this question makes no sense, but I
> would really like to know whether there is any problem with that
> kind of "mixed structure": using NumPy arrays of developer-defined
> (non-NumPy) objects.

Yes.  In general, arrays of 'object' dtype are a problem in NumPy 
because you won't be able to reach the high performance that NumPy 
usually achieves with other dtypes like 'float' or 'int'.  This is 
because many of NumPy's accelerations rely on two facts:

1. Every element of the array is the same size (which allows high 
memory performance for common access patterns).

2. The hardware provides instructions that can operate on these 
elements quickly.

On today's architectures, the kinds of elements that satisfy both 
conditions are mainly these types:

boolean, integer, float, complex and fixed-length strings
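To make the contrast concrete, here is a small sketch (not from the original mail) comparing a native-dtype array with the same data stored as Python objects:

```python
import numpy as np

# The same data as a native float array versus an array of
# Python objects.
a = np.arange(1000, dtype=np.float64)  # fixed-size elements, fast paths
b = a.astype(object)                   # each element is a Python object

print(a.dtype, a.itemsize)  # float64 8

# Vectorized math works on both, but only the native dtype uses
# hardware-level loops; the object array falls back to calling
# Python-level operations element by element.
print((a * 2).dtype)  # float64
print((b * 2).dtype)  # object
```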

Another kind of array element that can benefit from NumPy's 
computational abilities is a compound object made of the above types, 
commonly referred to as a 'record type'.  However, in order to 
preserve condition 1, these compound objects cannot vary in size from 
element to element (so your example does not fit here).  In addition, 
such record arrays normally lack property 2 for most operations, so 
they are usually seen more as data containers than as computational 
objects per se.
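A short illustration of such a record type (the field names here are made up for the example): each element is a compound of fixed-size fields, so condition 1 still holds, and each field on its own is an ordinary homogeneous array.

```python
import numpy as np

# A record dtype: every element packs the same fixed-size fields.
dt = np.dtype([('id', np.int32), ('value', np.float64), ('flag', np.bool_)])
arr = np.zeros(3, dtype=dt)
arr['id'] = [1, 2, 3]
arr['value'] = [0.5, 1.5, 2.5]

# Accessing a field returns a homogeneous array, which does get
# the fast per-field operations.
print(arr['value'].sum())  # 4.5
print(arr.itemsize)        # every record is the same size
```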

So, you have two options here:

- If you want to stick with collections of classes whose attributes 
can be general Python objects, then use Python containers for your 
case.  You will find that, in general, they are better suited to most 
of the operations you want to do.
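A minimal sketch of this first option, with a made-up `Item` class standing in for your own: a plain Python list handles attributes of varying type naturally.

```python
# Hypothetical class with a "value" attribute that can hold any type.
class Item:
    def __init__(self, value):
        self.value = value

items = [Item(1), Item(True), Item(3.14)]

# Ordinary Python idioms (comprehensions, filters) work directly on
# the heterogeneous attributes.
floats = [it for it in items if isinstance(it.value, float)]
print(len(floats))  # 1
```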

- If you need extreme computational speed, then you need to change 
your data schema (and perhaps the way your brain works too) and start 
thinking in terms of homogeneous NumPy arrays as your building blocks.
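One common way to restate such a schema (a sketch, with invented attribute names): instead of one array of mixed-type objects, keep one homogeneous NumPy array per attribute, so whole-array operations run at C speed.

```python
import numpy as np

# One homogeneous array per attribute ("structure of arrays" layout),
# instead of one object array of per-item instances.
values = np.array([0.5, 1.5, 2.5, 3.5])       # float attribute
counts = np.array([10, 20, 30, 40])           # integer attribute
active = np.array([True, False, True, True])  # boolean attribute

# Whole-array operations now use NumPy's fast paths,
# e.g. a filtered sum over the active items:
total = values[active].sum()
print(total)  # 6.5
```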

This is why people asked you to be more explicit in describing your 
situation: they were trying to see whether NumPy arrays could be used 
as the basic building blocks for your data schema or not.  My advice 
is to try regular Python containers first.  If you are not satisfied 
with the speed or memory consumption, then restate your problem in 
terms of arrays and use NumPy to accelerate it (and to consume far 
less memory too).

Hope that helps,

-- 
Francesc Alted
