[Numpy-discussion] Behavior from a change in dtype?

Christopher Barker Chris.Barker@noaa....
Tue Sep 8 11:53:16 CDT 2009


Skipper Seabold wrote:
> Hmm, okay, well I came across this in trying to create a recarray like
> data2 below, so I guess I should just combine the two questions.

The key to understanding this is to understand what is going on under the 
hood in numpy. Travis O. gave a nice intro in an Enthought webcast a few 
months ago -- I'm not sure if those are recorded and up on the web, but 
it's worth a look. It was also discussed in the advanced numpy tutorial 
at SciPy this year -- and that is up on the web:

http://www.archive.org/details/scipy09_advancedTutorialDay1_1


Anyway, here is my minimal attempt to clarify:

> import numpy as np
> 
> data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])

Here we are using a standard array constructor -- it will look at the 
data you are passing in (a mixture of python floats and ints), and 
decide that they can best be represented by a numpy array of float64s.

numpy arrays are essentially a pointer to a block of memory, and a bunch 
of attributes that describe how the bytes pointed to are to be 
interpreted. In this case, there are 9 C doubles, representing a 3x3 
array of doubles.
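
To make the "block of memory plus attributes" picture concrete, here is a small sketch that just inspects the attributes of that same array. The stride and byte counts assume the default C-contiguous layout that np.array() produces here:

```python
import numpy as np

data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])

# The Python ints were promoted, so every element is a C double.
print(data.dtype)    # float64
print(data.shape)    # (3, 3)

# Strides: bytes to step to the next row / next column.
print(data.strides)  # (24, 8)

# 9 doubles * 8 bytes each.
print(data.nbytes)   # 72
```

The shape, strides, and dtype are the "interpretation"; the 72 bytes are the data.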

> dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])

(NOTE: I'm on a big-endian machine, so I've used:
dt = np.dtype([('var1', '>f8'), ('var2', '>i8'), ('var3', '>i8')])
)

This is a data type descriptor that is analogous to a C struct, 
containing a float64 and two int64s.
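
As a rough illustration of the struct analogy, the dtype can be asked for its field names, total size, and per-field byte offsets. This sketch uses native byte order ('f8', 'i8') rather than the explicit '<' or '>' prefixes above, which is my assumption so that it runs the same on any machine:

```python
import numpy as np

# Native-endian version of the dtype discussed above (an assumption;
# the post uses explicit '<' or '>' byte-order prefixes).
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

print(dt.names)     # ('var1', 'var2', 'var3')
print(dt.itemsize)  # 24

# dt.fields maps each name to (field dtype, byte offset).
offsets = {name: dt.fields[name][1] for name in dt.names}
print(offsets)      # {'var1': 0, 'var2': 8, 'var3': 16}
```

So one "struct" is 24 bytes, which happens to be exactly one row of the 3x3 float64 array.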

> # Doesn't work, raises TypeError: expected a readable buffer object
> data2 = data2.view(np.recarray)
> data2.astype(dt)

I don't understand that error either. Recarrays are about adding 
the ability to access parts of a structured array by name, but you still 
need the dtype to specify the types and names. This does seem to work 
(though it may not be giving the results you expect):

In [19]: data2 = data.copy()
In [20]: data2 = data2.view(np.recarray)
In [21]: data2 = data2.view(dtype=dt)

or, indeed in the opposite order:

In [24]: data2 = data.copy()
In [25]: data2 = data2.view(dtype=dt)
In [26]: data2 = data2.view(np.recarray)


So you've done two operations, one is to change the dtype -- the 
interpretation of the bytes in the data buffer, and one is to make this 
a recarray, which allows you to access the "fields" by name:

In [31]: data2['var1']
Out[31]:
array([[ 10.75],
        [ 10.39],
        [ 18.18]])

> # Works without error (?) with unexpected result
> data3 = data3.view(np.recarray)
> data3.dtype = dt

That all depends what you expect! I used "view" above, 'cause I think 
there is less magic, though it's the same thing. I suppose changing the 
dtype in place like that is a tiny bit more efficient -- if you use 
.view() , you are creating a new array pointing to the same data, rather 
than changing the array in place.
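
A quick way to check the "new array, same data" claim is something like the following sketch. It again assumes a native-endian dtype, and uses np.shares_memory() to confirm that no bytes were copied:

```python
import numpy as np

# Native-endian dtype (an assumption; see the byte-order note above).
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])
data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])

viewed = data.view(dt)

print(viewed is data)                  # False
print(np.shares_memory(data, viewed))  # True

# Each 24-byte row became one record, so the last axis shrinks to 1.
print(viewed.shape)                    # (3, 1)

# A write through the original shows up in the view.
data[0, 0] = 99.5
print(viewed['var1'][0, 0])            # 99.5
```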

But anyway, the dtype describes how the bytes in the memory block are to 
be interpreted. Changing it by assigning the attribute or using .view() 
changes the interpretation, but does not change the bytes themselves at 
all. So in this case, you are taking the 8 bytes representing a float64 
of value 1.0, and interpreting those bytes as an 8 byte int -- which is 
going to give you garbage, essentially.
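
Here is a small sketch of what that "garbage" looks like for a single value, contrasted with astype(), which converts the value instead of reinterpreting the bytes. It uses native byte order, so the result should not depend on the machine:

```python
import numpy as np

one = np.array([1.0])

# Reinterpret the same 8 bytes as an int64: no conversion happens.
as_int = one.view(np.int64)
print(as_int[0])            # 4607182418800017408
print(hex(int(as_int[0])))  # 0x3ff0000000000000

# astype() converts the value and returns a new array.
converted = one.astype(np.int64)
print(converted[0])         # 1
```

The big number is just the IEEE 754 bit pattern of 1.0 read as an integer.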

> # One correct (though IMHO) unintuitive way
> data = np.rec.fromarrays(data.swapaxes(1,0), dtype=dt)

This is using the np.rec.fromarrays constructor to build a new record 
array with the dtype you want. The data is being converted and copied, 
so it won't change the original at all.
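
A sketch of that, checking that the values really are converted and that the original array is left alone. The native-endian dtype and the use of list(data.T) to hand over one array per column are my choices for the example, not something from the original code:

```python
import numpy as np

# Native-endian dtype (an assumption; see the byte-order note above).
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])
data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])

# One 1-d array per field: the columns of data.
rec = np.rec.fromarrays(list(data.T), dtype=dt)

print(rec.var2)                     # [1 0 0]
print(rec.var2.dtype)               # int64
print(np.shares_memory(data, rec))  # False
print(data.dtype)                   # float64
```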

So the question remains -- is there a way to convert the floats in 
"data" to ints in place?


This seems to work:
In [78]: data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])

In [79]: data[:,1:3] = data[:,1:3].astype('>i8').view(dtype='>f8')

In [80]: data.dtype = dt

It is making a copy of the integer data in the process -- but I think that 
is required, as you are changing the value, not just the interpretation 
of the bytes. I suppose we could have an "astype_inplace" method, but 
that would only work if the two types were the same size, and I'm not 
sure it's a common enough use to be worth it.
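
For anyone who wants to try this on a little-endian machine, here is the same trick as a sketch in native byte order, split into steps. I've used .view(dt) at the end rather than assigning to .dtype; that is my substitution and it should give the same interpretation of the same buffer:

```python
import numpy as np

# Native-endian dtype (an assumption; the session above uses '>' codes).
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])
data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])

# Step 1: convert the values of the last two columns (temporary copy).
ints = data[:, 1:3].astype('i8')

# Step 2: write the int bit patterns back into the float64 slots.
# Viewing as f8 makes the assignment copy bytes instead of converting.
data[:, 1:3] = ints.view('f8')

# Step 3: reinterpret each 24-byte row as one record.
structured = data.view(dt)

print(structured['var1'].ravel())  # [10.75 10.39 18.18]
print(structured['var2'].ravel())  # [1 0 0]
print(structured['var3'].ravel())  # [1 1 1]
```

After step 2, reading data[:, 1:3] as floats gives meaningless values; the buffer only makes sense through the structured dtype.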

What is your real use case? I suspect that what you really should do 
here is define your dtype first, then create the array of data:

data = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)

which does require that you use tuples, rather than lists to hold the 
"structs".

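For completeness, a sketch of that approach with a native-endian dtype (my assumption, so it runs anywhere), plus the optional recarray view for attribute-style access:

```python
import numpy as np

# Native-endian dtype (an assumption; see the byte-order note above).
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

# Each tuple is one record; values are converted on the way in.
data = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)

print(data.shape)    # (3,)
print(data['var2'])  # [1 0 0]

# Optional: view as a recarray to get fields as attributes.
rec = data.view(np.recarray)
print(rec.var1)      # [10.75 10.39 18.18]
```

Note that the result is a 1-d array of 3 records, not a (3, 1) array as you get from viewing the 3x3 float array.
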
HTH,
  - Chris







-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

