[Numpy-discussion] Can I add rows and columns to recarray?

Mon Dec 6 12:26:59 CST 2010

On 12/5/10 7:56 PM, Wai Yip Tung wrote:
> I'm fairly new to numpy and I'm trying to figure out the right way to do
> things. Continuing on my question about using recarray as a relation.

Note that recarrays (or structured arrays; AFAIK the only difference is 
attribute access, and I don't use recarrays) are far more static than 
a database table. So you may really want to use a database, or maybe 
PyTables. Or maybe even just stick with lists.

But if you are keeping things in memory, you should be able to do what you want.

> In [339]: arr = np.array([
>      .....:     (1, 2.2, 0.0),
>      .....:     (3, 4.5, 0.0)
>      .....:     ],
>      .....:     dtype=[
>      .....:         ('unit',int),
>      .....:         ('price',float),
>      .....:         ('amount',float),
>      .....:     ]
>      .....: )
> In [340]: data = arr.view(recarray)
> One of the most common thing I want to do is to append rows to data.

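NumPy arrays do not naturally support appending, as you have discovered. 
Each array is a fixed-size block of memory, so "appending" generally means 
allocating a new, larger array and copying the data over.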

> I think concatenate() might be the method.


> But I get a problem:

> In [342]: np.concatenate((data0,[1,9.0,9.0]))
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> c:\Python26\Lib\site-packages\numpy\<ipython console>  in<module>()
> TypeError: expected a readable buffer object

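concatenate expects a sequence of arrays to join. If you pass in something 
that can easily be turned into an array, it will work, but numpy has no 
way to know that [1, 9.0, 9.0] is meant to be a single record of your 
structured dtype rather than, say, three separate numbers, so it can't 
join it to the structured array. So you need to construct the second 
array explicitly, with the same dtype: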

a2 = np.array([(3, 5.5, 3.0)], dtype=arr.dtype)  # one record, same dtype as arr
arr = np.concatenate((arr, a2))
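To see the difference between the bare list and a proper one-record array, here is a small self-contained sketch using the dtype from your example. The names plain, new_row and bigger are just for illustration; note that concatenate returns a new array rather than growing the old one.

```python
import numpy as np

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])

# A bare list becomes an ordinary 1-d array of three floats,
# not one record of arr's structured dtype.
plain = np.asarray([1, 9.0, 9.0])

# Building the row with arr.dtype gives a one-element structured array
# that concatenate can join to arr. The result is a new, longer array;
# arr itself is unchanged.
new_row = np.array([(1, 9.0, 9.0)], dtype=arr.dtype)
bigger = np.concatenate((arr, new_row))

# If you want attribute access again, take a recarray view of the result.
data = bigger.view(np.recarray)
```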

> In [343]: data.amount = data.unit * data.price


> But sometimes it may require me to add a new column not already exist,
> e.g.:
> In [344]: data.discount_price = data.price * 0.9
> How can I add a new column?

You can't. What you need to do is create a new array with a new dtype 
that includes the new field.

The trick is that numpy only supports homogeneous arrays: every item is 
the same data type. So when you create a struct array like the one above, 
numpy does not define it as a 2-d table, but rather as a 1-d array, each 
element of which is a structure.
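A quick way to see this is to inspect the shape of the array from your example. This is a minimal sketch; first and prices are illustrative names.

```python
import numpy as np

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])

# The structured array is 1-d: one element per row, not a 2 x 3 table.
print(arr.shape)  # (2,)
print(arr.ndim)   # 1

# Each element is a single record holding all three fields.
first = arr[0]

# Indexing by field name gives a 1-d "column" view across all records,
# so writing to it changes arr itself.
prices = arr['price']
prices[1] = 5.0
```

Because a field lookup returns a view, you can treat arr['price'] like a column for reading and writing; what you can't do is add a new field in place.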

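So you need to do something like:

# build the new dtype: the existing fields plus the new one
dt2 = np.dtype(data.dtype.descr + [('discount_price', float)])

# create a new array
data2 = np.zeros(len(data), dtype=dt2)

# fill the array, copying each existing field:
for field_name in data.dtype.names:
    data2[field_name] = data[field_name]

# now some calculations:
data2['discount_price'] = data2['price'] * 0.9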

I don't know of a way to avoid that loop when filling the array.
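Current numpy versions ship a helper, numpy.lib.recfunctions.append_fields, that builds the new array and copies the old fields for you (check that your numpy version has it). A sketch assuming the dtype from your example:

```python
import numpy as np
import numpy.lib.recfunctions as rfn

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])

# append_fields returns a NEW array with the extra field; arr is untouched.
# usemask=False asks for a plain ndarray instead of a masked array
# (asrecarray=True would give a recarray instead).
data2 = rfn.append_fields(arr, 'discount_price', arr['price'] * 0.9,
                          usemask=False)
```

It still allocates and copies under the hood, so it is a convenience, not a way around the static layout.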

Better yet, anticipate your needs and create the array with all the 
fields you need in the first place.
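For example, if discount_price is part of the dtype from the start, the attribute-style assignment from your session works on a recarray view, because the field already exists. A minimal sketch assuming your original fields plus discount_price:

```python
import numpy as np

# All four fields declared up front, with discount_price initialised to 0.0.
dt2 = np.dtype([('unit', int), ('price', float),
                ('amount', float), ('discount_price', float)])
arr = np.array([(1, 2.2, 0.0, 0.0), (3, 4.5, 0.0, 0.0)], dtype=dt2)

# data is a view, so assigning through it fills arr's fields.
data = arr.view(np.recarray)
data.amount = data.unit * data.price
data.discount_price = data.price * 0.9
```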

You can see that ndarrays are pretty static: struct arrays can be 
useful for data storage, but are not very suitable when things are changing.

You could write a class that wraps an ndarray and supports what you 
need better; it could be a pretty useful general-purpose class, too. 
I've got one that handles the appending part, but nothing for adding new 
fields.
Here's appending with my class:

import accumulator  # the module enclosed with this message

data3 = accumulator.accumulator(dtype=dt2)
data3.append((1, 2.2, 0.0, 0.0))
data3.append((3, 4.5, 0.0, 0.0))
data3.append((2, 1.2, 0.0, 0.0))
data3.append((5, 4.2, 0.0, 0.0))
print(repr(data3))

# convert to regular array for calculations:
data3 = np.array(data3)

# now some calculations:
data3['discount_price'] = data3['price'] * 0.9

You wouldn't have to convert to a regular array, except that I haven't 
written the code to support field access yet -- I don't think it would 
be too hard, though.

I've enclosed some test code, and my accumulator class, in case you find 
it useful.
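For anyone reading without the attachment, here is a minimal hypothetical sketch of the same idea. It is NOT the enclosed accumulator class; GrowableArray, append, to_array and initial_capacity are made-up names. It keeps an over-allocated buffer, doubles it when full (so appends are cheap on average), and exposes only the filled part, including field access by name.

```python
import numpy as np


class GrowableArray:
    """Hypothetical appendable structured array (not the enclosed class)."""

    def __init__(self, dtype, initial_capacity=4):
        self._buffer = np.zeros(initial_capacity, dtype=dtype)
        self._length = 0

    def append(self, row):
        # When the buffer is full, allocate one twice as large and copy
        # the filled part over.
        if self._length == len(self._buffer):
            bigger = np.zeros(max(1, 2 * len(self._buffer)),
                              dtype=self._buffer.dtype)
            bigger[:self._length] = self._buffer[:self._length]
            self._buffer = bigger
        self._buffer[self._length] = row
        self._length += 1

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Indices and field names apply only to the filled part (a view).
        return self._buffer[:self._length][index]

    def __setitem__(self, index, value):
        self._buffer[:self._length][index] = value

    def to_array(self):
        # Return a regular ndarray copy of just the filled rows.
        return self._buffer[:self._length].copy()


dt2 = np.dtype([('unit', int), ('price', float),
                ('amount', float), ('discount_price', float)])
data3 = GrowableArray(dt2, initial_capacity=1)
for row in [(1, 2.2, 0.0, 0.0), (3, 4.5, 0.0, 0.0),
            (2, 1.2, 0.0, 0.0), (5, 4.2, 0.0, 0.0)]:
    data3.append(row)

data3['discount_price'] = data3['price'] * 0.9
result = data3.to_array()
```

Doubling keeps the number of reallocations logarithmic in the number of rows; the cost is up to half the buffer sitting unused.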


