[Numpy-discussion] complex numpy.ndarray dtypes

Francesc Alted faltet@pytables....
Thu Oct 2 03:28:23 CDT 2008


On Thursday 02 October 2008, John Gu wrote:
> Hello,
>
> I am using numpy in conjunction with pyTables.  The data that I read
> in from pyTables seem to have the following dtype:
>
> p = hdf5.root.myTable.read()
>
> p.__class__
> <type 'numpy.ndarray'>
>
> p[0].__class__
> <type 'numpy.void'>
>
> p.dtype
> dtype([('time', '<f4'), ('obs1', '<f4'), ('obs2', '<f8'), ('obs3',
> '<f4')])
>
> p.shape
> (61230,)
>
> The manner in which I access a particular column is p['time'] or
> p['obs1']. I have a couple of questions regarding this data
> structure: 1) how do I restructure the array into a 61230 x 4 array
> that can be indexed using [r,c] notation?

In your example, the table (a record array in NumPy jargon) is 
inhomogeneous: all fields are 'f4' except 'obs2', which is 'f8'.  In 
that case, you can obtain a homogeneous array by converting every row 
to a common dtype, with something like:

In [44]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','<f4'),
('obs2','<f8')])

In [45]: b = numpy.array([(val['obs1'], val['obs2']) for val in a], 
dtype='<f4')

In [46]: b
Out[46]:
array([[ 1.,  2.],
       [ 3.,  4.]], dtype=float32)
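
As a rough sketch of the same conversion without the Python-level loop over rows, one could stack the fields column by column. This assumes every field can safely be cast to a common dtype ('<f4' here, which loses precision for the 'f8' field); the variable names are only illustrative.

```python
import numpy as np

# Structured array with mixed field types, as in the example above.
a = np.array([(1, 2), (3, 4)], dtype=[('obs1', '<f4'), ('obs2', '<f8')])

# Cast each named field to float32 and place the fields side by side.
# The result is a new (copied) 2-D array indexable as b[r, c].
b = np.column_stack([a[name].astype('<f4') for name in a.dtype.names])

print(b.shape)
print(b[1, 0])
```

This still copies the data, but the per-field operations run inside NumPy rather than once per row.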

If your table is homogeneous (all fields share the same dtype), there 
is a simpler way:

In [41]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','<f4'),
('obs2','<f4')])

In [42]: d = a.view(('<f4',2))

In [43]: d
Out[43]:
array([[ 1.,  2.],
       [ 3.,  4.]], dtype=float32)
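
A small sketch of what the view implies, assuming a contiguous 1-D array whose fields all have the same dtype and no padding between them: the 2-D result shares memory with the original, so writing through one is visible in the other. Here the view is spelled as a plain 'f4' view followed by a reshape, which should give the same layout as the call above.

```python
import numpy as np

# Homogeneous structured array: both fields are float32.
a = np.array([(1, 2), (3, 4)], dtype=[('obs1', '<f4'), ('obs2', '<f4')])

# Reinterpret the same buffer as plain float32 values, one row per record.
d = a.view('<f4').reshape(len(a), len(a.dtype.names))

# No data was copied: modifying the view changes the original record.
d[0, 1] = 10.0
print(a['obs2'][0])
print(np.shares_memory(a, d))
```

If you need an independent array, call d.copy() on the result.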

which is faster, since the view reuses the existing buffer instead of 
copying it:

In [68]: timeit d = a.view(('<f4',2))
100000 loops, best of 3: 11.5 µs per loop

In [69]: timeit b=numpy.array([(val['obs1'], val['obs2']) for val in a], 
dtype='<f4')
10000 loops, best of 3: 39.8 µs per loop

> 2) What kind of dtype is 
> pyTables using?  How do I create a similar array that can be indexed
> by a named column?  I tried various ways:
>
> a = array([[1,2],[3,4]],
> dtype=dtype([('obs1','<f4'),('obs2','<f4')]))
> ---------------------------------------------------------------------
>------ <type 'exceptions.TypeError'>             Traceback (most
> recent call last)
>
> p:\AsiaDesk\johngu\projects\deltaForce\<ipython console> in
> <module>()
>
> <type 'exceptions.TypeError'>: expected a readable buffer object

Yes, the error message is too terse in this case.  The record array 
constructor needs to know where each record starts and ends, and it 
does so by mapping tuples to records.  So your example must be 
rewritten with a list of tuples instead of a list of lists:

In [70]: a = numpy.array([(1,2),(3,4)], dtype=[('obs1','<f4'),
('obs2','<f4')])

In [71]: a
Out[71]:
array([(1.0, 2.0), (3.0, 4.0)],
      dtype=[('obs1', '<f4'), ('obs2', '<f4')])
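
If your data already lives in a list of lists (or in a plain 2-D array), a minimal sketch of the conversion is to turn each row into a tuple first. The names rows and dt below are just placeholders for this illustration.

```python
import numpy as np

# Field names and types for the records.
dt = np.dtype([('obs1', '<f4'), ('obs2', '<f4')])

# Row data as a list of lists, which the constructor does not map to records.
rows = [[1, 2], [3, 4]]

# Converting each row to a tuple marks where a record starts and ends.
a = np.array([tuple(row) for row in rows], dtype=dt)

print(a['obs1'])   # column access by field name
print(a[1])        # a single record
```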

Have a look at:

http://www.scipy.org/RecordArrays

for more info on record arrays.

> I did find some documentation about array type descriptors when
> reading from files... it seems like these array types are specific to
> arrays created when reading from some sort of file / buffer?  Any
> help is appreciated.  Thanks!

I'm not sure what you are asking here.  At any rate, it might be 
useful to have a look at the complex dtype examples in:

http://www.scipy.org/Numpy_Example_List#head-f9175c69cccd74b9e4ee92e2a060af27c7447b76

Hope that helps,

-- 
Francesc Alted

