[Numpy-discussion] Using ndarray for 2-dimensional, heterogeneous data

N. Volbers mithrandir42 at web.de
Thu Feb 9 22:02:02 CST 2006

```
Hello everyone,

I am rethinking the design of my evaluation software, but I am not
quite sure whether I am making the right decision, so let me state my problem:

I am writing a simple evaluation program to read scientific (ASCII) data
and plot it via both gnuplot and matplotlib. The data is typically very
simple: numbers arranged in columns. Before numpy, I stored this data
as a list of one-dimensional Numeric arrays, one per column, e.g.:

a = [array([1, 2, 3, 4]), array([2.3, 17.2, 19.1, 22.2])]

This layout made it very easy to add, remove or rearrange columns,
because these were simple list operations. It also had the nice side
effect of allowing different data types for different columns. However,
row access was awkward, and I had to write my own iterator object for it.

When I read about heterogeneous arrays in numpy, I started a new
implementation that stores the same data as above like this:

# each tuple is one row (record): (col1, col2)
b = numpy.array(list(zip([1, 2, 3, 4], [2.3, 17.2, 19.1, 22.2])),
                dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})

Row operations are much easier now, because I can use numpy's built-in
indexing. However, column operations (adding, removing or rearranging
columns) require creating a new array based on the old one.

Now I am wondering whether such an array has other drawbacks that I
am not aware of. For example, is it possible to mask values in such an array?

And is it slower to get a certain column using b['col1'] than it
would be using a homogeneous array c and the notation c[:,0]?

Does anyone else use such a data layout, and can you report any problems with it?

Best regards,

Niklas Volbers.

```
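For the original list-of-columns layout, a custom row iterator may not be needed: Python's built-in `zip` walks the columns in lockstep and yields one tuple per row, while column changes stay plain list operations. A minimal sketch, assuming current NumPy stands in for the Numeric arrays of the post; the names `columns` and `rows` are illustrative only:

```python
import numpy as np

# One 1-D array per column, each with its own dtype (as in the Numeric layout)
columns = [np.array([1, 2, 3, 4], dtype=np.int16),
           np.array([2.3, 17.2, 19.1, 22.2], dtype=np.float32)]

# zip(*columns) transposes the columns into row tuples: (col1, col2), ...
rows = list(zip(*columns))

# Column operations remain simple list operations
columns.append(columns[1] * 2)   # add a derived column at the end
del columns[0]                   # remove the first column
```

The trade-off is that each `rows` entry is a tuple of scalars built on the fly, so row-wise work does not benefit from vectorized NumPy operations.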
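On the question of column operations: current NumPy ships helpers in `numpy.lib.recfunctions` (not part of the 2006 release discussed here) that add or drop fields of a structured array. As the post notes, each call builds a new array with a new dtype rather than modifying the original. A minimal sketch, assuming the two-field layout from the post; the derived column `col3` is hypothetical:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

b = np.array(list(zip([1, 2, 3, 4], [2.3, 17.2, 19.1, 22.2])),
             dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})

# Adding a column returns a new array with an extended dtype; b is unchanged.
# usemask=False asks for a plain ndarray instead of a masked array.
b2 = rfn.append_fields(b, 'col3', b['col2'] * 2, usemask=False)

# Removing a column likewise returns a new array
b3 = rfn.drop_fields(b2, ['col1'], usemask=False)
```

Because every such call copies all records, frequent column reshuffling is cheaper in the list-of-columns layout, while row access is cheaper in the structured one.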
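On masking: in current NumPy, `numpy.ma` accepts structured arrays, and the mask then mirrors the record layout with one boolean per field in each row, so a single value in one column can be hidden. A minimal sketch, assuming the two-field layout from the post; which value is masked is arbitrary:

```python
import numpy as np

b = np.array(list(zip([1, 2, 3, 4], [2.3, 17.2, 19.1, 22.2])),
             dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})

# One (col1, col2) pair of mask flags per row; only 'col2' in row 1 (17.2) is masked
m = np.ma.masked_array(b, mask=[(False, False), (False, True),
                                (False, False), (False, False)])

# Field access on the masked array carries that field's mask along
col2 = m['col2']
col2_mean = col2.mean()   # averages 2.3, 19.1 and 22.2 only
```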
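On the speed question: in current NumPy, neither `b['col1']` nor `c[:,0]` copies data; both return strided views into the parent array. The sketch below, assuming the packed two-field dtype from the post and a float64 array `c` as the homogeneous alternative, inspects the strides to show that in both cases stepping to the next element skips a whole row. Whether arithmetic on such a view is measurably slower than on a contiguous column depends on array size and cache behaviour, so timing the real data with `timeit` is the safer answer.

```python
import numpy as np

b = np.array(list(zip([1, 2, 3, 4], [2.3, 17.2, 19.1, 22.2])),
             dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})

# Homogeneous alternative: rows x columns, one dtype (float64 here)
c = np.array([[1, 2.3], [2, 17.2], [3, 19.1], [4, 22.2]])

col_b = b['col1']   # field access: a view into b, no copy
col_c = c[:, 0]     # column slice: also a view into c

# Both views step over whole rows in memory
print(col_b.strides, b.itemsize)   # (6,) 6  (2-byte int + 4-byte float, packed)
print(col_c.strides, c.itemsize)   # (16,) 8
```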