[Numpy-discussion] Behavior from a change in dtype?
Christopher Barker
Chris.Barker@noaa....
Tue Sep 8 11:53:16 CDT 2009
Skipper Seabold wrote:
> Hmm, okay, well I came across this in trying to create a recarray like
> data2 below, so I guess I should just combine the two questions.
The key to understanding this is to understand what is going on under the
hood in numpy. Travis O. gave a nice intro in an Enthought webcast a few
months ago -- I'm not sure if those are recorded and up on the web, but
it's worth a look. It was also discussed in the advanced numpy tutorial
at SciPy this year -- and that is up on the web:
http://www.archive.org/details/scipy09_advancedTutorialDay1_1
Anyway, here is my minimal attempt to clarify:
> import numpy as np
>
> data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])
here we are using a standard array constructor -- it will look at the
data you are passing in (a mixture of python floats and ints), and
decide that they can best be represented by a numpy array of float64s.
numpy arrays are essentially a pointer to a block of memory, and a bunch
of attributes that describe how the bytes pointed to are to be
interpreted. In this case, they are 9 C doubles, representing a 3x3
array of doubles.
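
To make that concrete, here is a small sketch that just prints the attributes numpy keeps alongside the memory block. The values in the comments assume the default C-ordered layout that np.array() produces here:

```python
import numpy as np

data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])

print(data.dtype)     # float64: the python ints were promoted to floats
print(data.shape)     # (3, 3)
print(data.itemsize)  # 8 bytes per element
print(data.nbytes)    # 72 bytes in total, i.e. 9 doubles
print(data.strides)   # (24, 8): bytes to step to the next row / next column
```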
> dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
(NOTE: I'm on a big-endian machine, so I've used:
dt = np.dtype([('var1', '>f8'), ('var2', '>i8'), ('var3', '>i8')])
)
This is a data type descriptor that is analogous to a C struct,
containing a float64 and two int64s.
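
If it helps to see the "struct" layout, the dtype can be inspected directly. This sketch uses native byte order ('f8', 'i8') instead of an explicit '<' or '>', which is my assumption so that it runs the same on any machine:

```python
import numpy as np

# Native byte order, so no '<' or '>' prefix is needed.
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

print(dt.names)     # ('var1', 'var2', 'var3')
print(dt.itemsize)  # 24 bytes per record: 8 + 8 + 8

# dt.fields maps each name to (field dtype, byte offset within the record).
offsets = []
for name in dt.names:
    field_dtype, offset = dt.fields[name][:2]
    offsets.append(offset)
    print(name, field_dtype, offset)
```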
> # Doesn't work, raises TypeError: expected a readable buffer object
> data2 = data2.view(np.recarray)
> data2.astype(dt)
I don't understand that error either. Recarrays are about adding
the ability to access parts of a structured array by name, but you still
need the dtype to specify the types and names. This does seem to work
(though it may not be giving the results you expect):
In [19]: data2 = data.copy()
In [20]: data2 = data2.view(np.recarray)
In [21]: data2 = data2.view(dtype=dt)
or, indeed in the opposite order:
In [24]: data2 = data.copy()
In [25]: data2 = data2.view(dtype=dt)
In [26]: data2 = data2.view(np.recarray)
So you've done two operations, one is to change the dtype -- the
interpretation of the bytes in the data buffer, and one is to make this
a recarray, which allows you to access the "fields" by name:
In [31]: data2['var1']
Out[31]:
array([[ 10.75],
       [ 10.39],
       [ 18.18]])
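
Here is the same thing as a self-contained sketch, again assuming native byte order. Note that each 24-byte row becomes a single record, so the shape goes from (3, 3) to (3, 1). As far as I can tell, indexing with ['var1'] already works once the dtype is structured; what the recarray view adds is attribute access (data2.var1):

```python
import numpy as np

data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

# Reinterpret each row of three 8-byte values as one 24-byte record,
# then view the result as a recarray.
data2 = data.copy().view(dtype=dt).view(np.recarray)

print(data2.shape)     # (3, 1)
print(data2['var1'])   # access by field name
print(data2.var1)      # access as an attribute (recarray only)
```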
> # Works without error (?) with unexpected result
> data3 = data3.view(np.recarray)
> data3.dtype = dt
That all depends on what you expect! I used "view" above, 'cause I think
there is less magic, though it's the same thing. I suppose changing the
dtype in place like that is a tiny bit more efficient -- if you use
.view(), you are creating a new array object pointing to the same data,
rather than changing the array in place.
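
A quick way to convince yourself that .view() gives a new array object over the same bytes is to write through the view and look at the original. This is only a sketch, with native byte order assumed:

```python
import numpy as np

a = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

v = a.view(dtype=dt)            # new array object, same memory block

print(v is a)                   # False: two distinct array objects
print(np.shares_memory(a, v))   # True: no data was copied
print(a.dtype, v.dtype)         # the original keeps its float64 dtype

# Writing through the view changes what the original array sees.
v['var1'][0, 0] = 99.5
print(a[0, 0])                  # 99.5
```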
But anyway, the dtype describes how the bytes in the memory block are to
be interpreted. Changing it by assigning the attribute or using .view()
changes the interpretation, but does not change the bytes themselves at
all, so in this case, you are taking the 8 bytes representing a float64
of value 1.0, and interpreting those bytes as an 8 byte int -- which is
going to give you garbage, essentially.
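
For the record, here is what that "garbage" looks like for the value 1.0, next to what an actual conversion with astype() gives. Native byte order is assumed for both types, so the result should not depend on the machine's endianness:

```python
import numpy as np

one = np.array([1.0], dtype='f8')

reinterpreted = one.view('i8')   # same 8 bytes, read as an integer
converted = one.astype('i8')     # new array holding the converted value

print(reinterpreted)             # [4607182418800017408]
print(hex(reinterpreted[0]))     # 0x3ff0000000000000, the IEEE 754 bits of 1.0
print(converted)                 # [1]
```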
> # One correct (though IMHO) unintuitive way
> data = np.rec.fromarrays(data.swapaxes(1,0), dtype=dt)
This is using the np.rec.fromarrays constructor to build a new record
array with the dtype you want. The data is being converted and copied,
so it won't change the original at all.
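
A small check of the "converted and copied" part, as a sketch with native byte order assumed. I've written data.T, which for a 2-d array is equivalent to data.swapaxes(1,0), and given the result a new name (rec, my choice) so the original stays around for comparison:

```python
import numpy as np

data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

# Each row of data.T is one column of data, i.e. one field of the result.
rec = np.rec.fromarrays(data.T, dtype=dt)

print(rec.shape)                    # (3,): one record per original row
print(rec.var2, rec.var2.dtype)     # [1 0 0] int64: real integer values
print(np.shares_memory(data, rec))  # False: the data was copied
print(data.dtype)                   # float64: the original is untouched
```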
So the question remains -- is there a way to convert the floats in
"data" to ints in place?
This seems to work:
In [78]: data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])
In [79]: data[:,1:3] = data[:,1:3].astype('>i8').view(dtype='>f8')
In [80]: data.dtype = dt
It is making a copy of the integer data in the process -- but I think that
is required, as you are changing the value, not just the interpretation
of the bytes. I suppose we could have an "astype_inplace" method, but
that would only work if the two types were the same size, and I'm not
sure it's a common enough use to be worth it.
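
Here is that trick again as a self-contained sketch with a check at the end. I've used native byte order ('i8', 'f8') rather than '>' so it runs on either kind of machine, and .view() at the end instead of assigning data.dtype (the name fixed is just my choice):

```python
import numpy as np

data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

# Convert the last two columns to real int64 values (this makes a copy),
# then store the *bit patterns* of those ints back into the float64 buffer.
data[:, 1:3] = data[:, 1:3].astype('i8').view('f8')

# Now the bytes in columns 1 and 2 are valid int64s, so reinterpreting
# the buffer with the structured dtype gives sensible values.
fixed = data.view(dtype=dt)

print(fixed['var1'].ravel())   # [10.75 10.39 18.18]
print(fixed['var2'].ravel())   # [1 0 0]
print(fixed['var3'].ravel())   # [1 1 1]
```

Keep in mind that after this, the float64 array "data" shows meaningless values in those two columns; only the structured view makes sense of them.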
What is your real use case? I suspect that what you really should do
here is define your dtype first, then create the array of data:
data = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)
which does require that you use tuples, rather than lists to hold the
"structs".
HTH,
- Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov