[Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist

Robert Kern robert.kern at gmail.com
Wed Apr 5 09:11:01 CDT 2006


Andrew Jaffe wrote:
> Hi All,
> 
> I've encountered a strange problem: I've been running some python code
> on both a linux box and OS X, both with python 2.4.1 and the latest
> numpy and matplotlib from svn.
> 
> I have found that when I transfer pickled numpy arrays from one machine
> to the other (in either direction), the resulting data *looks* all right
> (i.e., it is a numpy array of the correct type with the correct values
> at the correct indices), but it seems to produce the wrong result in (at
> least) one circumstance: matplotlib.hist() gives the completely wrong
> picture (and set of bins).
> 
> This can be ameliorated by running the array through
>    arr=numpy.asarray(arr, dtype=numpy.float64)
> but this seems like a complete kludge (and is only needed when you do
> the transfer between machines).

You have a byteorder issue. Your Linux box, which I presume has an Intel or AMD
CPU, is little-endian, whereas your OS X box, which I presume has a PPC CPU, is
big-endian. numpy arrays can store their data in either endianness on either
kind of platform; their dtype objects tell you which byteorder they are using.

In the dtype specifications below, '>' means big-endian (I am using a PPC
PowerBook), and '<' means little-endian.


In [31]: a = linspace(0, 10, 11)

In [32]: a
Out[32]: array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

In [33]: a.dtype
Out[33]: dtype('>f8')

In [34]: b = a.newbyteorder()

In [35]: b
Out[35]:
array([  0.00000000e+000,   3.03865194e-319,   3.16202013e-322,
         1.04346664e-320,   2.05531309e-320,   2.56123631e-320,
         3.06715953e-320,   3.57308275e-320,   4.07900597e-320,
         4.33196758e-320,   4.58492919e-320])

In [36]: b.dtype
Out[36]: dtype('<f8')

In [41]: a.tostring()[-8:]
Out[41]: '@$\x00\x00\x00\x00\x00\x00'

In [42]: b.tostring()[-8:]
Out[42]: '@$\x00\x00\x00\x00\x00\x00'
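The identical byte strings above are the point: newbyteorder() leaves the stored
bytes alone and only changes how they are interpreted. As a minimal sketch using
only the standard library struct module (not numpy itself), the same eight bytes
can be decoded both ways. The variable names here are made up for illustration.

```python
import struct

# The last eight bytes shown in Out[41] and Out[42]: 10.0 stored big-endian.
raw = b'@$\x00\x00\x00\x00\x00\x00'

# '>d' reads the bytes as a big-endian double, '<d' as a little-endian double.
as_big = struct.unpack('>d', raw)[0]
as_little = struct.unpack('<d', raw)[0]

print(as_big)      # the intended value
print(as_little)   # a tiny denormal, like the last entry of Out[35]
```

Read as big-endian the bytes give 10.0; read as little-endian they give a
denormal number of order 1e-320, which is the kind of garbage shown in Out[35].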


Apparently, the pickle stores the data in the creator machine's byteorder and
marks it accordingly. When the reading machine loads the pickle, it recognizes
from the dtype that the byteorder is opposite to its native byteorder.

Most operations work as you might expect:


In [44]: a.astype(dtype('<f8'))
Out[44]: array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

In [45]: c = _

In [46]: c.dtype
Out[46]: dtype('<f8')

In [47]: a + c
Out[47]: array([  0.,   2.,   4.,   6.,   8.,  10.,  12.,  14.,  16.,  18.,  20.])


Some don't:


In [54]: c.sort()

In [55]: c
Out[55]: array([  0.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,   1.])


This is a bug.

http://projects.scipy.org/scipy/numpy/ticket/47
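Until that is fixed, one way to sidestep it is to convert to the native
byteorder before calling anything that might mishandle swapped data. The helper
below is a hypothetical sketch (to_native is not a numpy function), assuming a
numpy where dtype.isnative and dtype.newbyteorder() are available.

```python
import numpy as np


def to_native(arr):
    """Return arr in native byteorder, copying only when a swap is needed."""
    if arr.dtype.isnative:
        # Already native: the same object is returned, not a copy.
        return arr
    # astype rewrites the stored bytes; the numerical values are unchanged.
    return arr.astype(arr.dtype.newbyteorder('='))


# 'S' asks for the opposite of the current byteorder, so this array is
# non-native on whichever machine runs the example.
swapped = np.array([3.0, 1.0, 2.0], dtype=np.dtype('f8').newbyteorder('S'))

native = to_native(swapped)
native.sort()

print(swapped.dtype.isnative, native.dtype.isnative)
print(native)
```

This is less of a kludge than hard-coding numpy.float64, since it keeps the
element type and only normalizes the byteorder.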

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco




