[Numpy-discussion] Using ndarray for 2-dimensional, heterogeneous data

N. Volbers mithrandir42 at web.de
Thu Feb 9 22:02:02 CST 2006


Hello everyone,

I am re-thinking the design of my evaluation software, but I am not 
quite sure if I am making the right decision, so let me state my problem:

I am writing a simple evaluation program to read scientific (ASCII) data 
and plot it both via gnuplot and matplotlib. The data is typically very 
simple: numbers arranged in columns. Before numpy I was using Numeric 
arrays to store this data in a list of 1-dimensional arrays, e.g.:

 a =  [ array([1,2,3,4]), array([2.3,17.2,19.1,22.2]) ]

This layout made it very easy to add, remove or rearrange columns, 
because these were simple list operations. It also had the nice effect 
of allowing different data types for different columns. However, row access 
was hard and I had to use my own iterator object to do so.
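
For illustration, a minimal sketch of this column-list layout, assuming
current numpy in place of Numeric. The name `cols` and the use of zip()
for row access are assumptions made for this example; they are not the
custom iterator mentioned above.

```python
import numpy as np

# Column-oriented layout: a plain list of 1-D arrays, one per column.
# Each column keeps its own dtype (integers and floats here).
cols = [np.array([1, 2, 3, 4]), np.array([2.3, 17.2, 19.1, 22.2])]

# Column operations are ordinary list operations.
cols.append(cols[1] * 2.0)            # add a derived column
cols[0], cols[1] = cols[1], cols[0]   # swap the first two columns
del cols[2]                           # remove the derived column again

# Row access: zip() walks all columns in parallel, one row per step.
rows = list(zip(*cols))
print(rows[0])
```

zip() only covers read access; writing to a row still means indexing
every column separately.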

When I read about heterogeneous arrays in numpy I started a new 
implementation which would store the same data as above like this:

 b = numpy.array( [(1,2.3), (2,17.2), (3,19.1), (4,22.2)],
                  dtype={'names':['col1','col2'], 'formats': ['i2','f4']})

Row operations are much easier now, because I can use numpy's intrinsic 
capabilities. However, column operations (adding, removing or 
rearranging columns) require creating a new array based on the old one.
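
A rough sketch of both sides of this trade-off, assuming a current numpy
release. The field name 'col3' and the variable names are made up for
the example. Note that numpy expects one tuple per row when building
such an array.

```python
import numpy as np

# One tuple per row; every field (column) has its own type.
b = np.array([(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
             dtype=[('col1', 'i2'), ('col2', 'f4')])

# Row operations use normal indexing.
first_row = b[0]              # a single record
subset = b[b['col1'] > 2]     # rows selected by a boolean condition

# Adding a column: build a wider dtype, allocate a new array,
# then copy the existing fields over one by one.
new_dtype = b.dtype.descr + [('col3', 'f8')]   # 'col3' is hypothetical
b2 = np.empty(b.shape, dtype=new_dtype)
for name in b.dtype.names:
    b2[name] = b[name]
b2['col3'] = b['col2'] * 2.0
```

Every such column change copies all the data, which is the cost that
the list-of-columns layout avoids.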

Now I am wondering whether such an array has other drawbacks that I 
am not aware of. For example, is it possible to mask values in such an array?
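
One possible approach to masking, sketched here under the assumption of
a current numpy release with the numpy.ma module: mask a single field
rather than the whole record array. The threshold 18.0 is arbitrary and
only serves the example.

```python
import numpy as np
import numpy.ma as ma

b = np.array([(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
             dtype=[('col1', 'i2'), ('col2', 'f4')])

# Mask every col2 value above the (arbitrary) threshold.
col2 = ma.masked_greater(b['col2'], 18.0)

# Masked entries are skipped by reductions such as mean().
mean_unmasked = col2.mean()
print(col2.mask, mean_unmasked)
```

This only covers per-field masks; masking complete records is left out
of the sketch.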

And is it slower to get a certain column by using b['col1'] than it 
would be with a homogeneous array c and the notation c[:,0]?
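
A small sketch for comparing the two access forms, assuming a current
numpy release. In such a release both expressions return strided views
rather than copies. The array size and repeat count are arbitrary, and
the timings depend on the machine, so no result is quoted here.

```python
import timeit
import numpy as np

n = 100_000
b = np.zeros(n, dtype=[('col1', 'i2'), ('col2', 'f4')])
c = np.zeros((n, 2), dtype='f4')

field = b['col1']    # view into b; stride is the size of one record
column = c[:, 0]     # view into c; stride is the size of one row

# Time only the access itself, not any computation on the result.
t_field = timeit.timeit(lambda: b['col1'], number=10_000)
t_column = timeit.timeit(lambda: c[:, 0], number=10_000)
print(f"b['col1']: {t_field:.4f} s   c[:, 0]: {t_column:.4f} s")
```

Since neither form copies data, any difference is more likely to show
up in later arithmetic on the strided view than in the access itself.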

Does anyone else use such a data layout and can report on problems with it?

Best regards,

Niklas Volbers.