[Numpy-discussion] load of custom .npy file fails with numpy 2.0.0

Geoffrey Irving irving@naml...
Thu Aug 2 16:41:59 CDT 2012


On Thu, Aug 2, 2012 at 1:26 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Thu, Aug 2, 2012 at 8:46 PM, Geoffrey Irving <irving@naml.us> wrote:
>> Hello,
>>
>> The attached .npy file was written from custom C++ code.  It loads
>> fine in Numpy 1.6.2 with Python 2.6 installed through MacPorts, but
>> fails on a different machine with Numpy 2.0.0 installed via Superpack:
>>
>> box:array% which python
>> /usr/bin/python
>> box:array% python
>> Python 2.6.1 (r261:67515, Aug  2 2010, 20:10:18)
>> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import numpy
>>>>> numpy.load('blah.npy')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/Library/Python/2.6/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/lib/npyio.py",
>> line 351, in load
>>     return format.read_array(fid)
>>   File "/Library/Python/2.6/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/lib/format.py",
>> line 440, in read_array
>>     shape, fortran_order, dtype = read_array_header_1_0(fp)
>>   File "/Library/Python/2.6/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/lib/format.py",
>> line 361, in read_array_header_1_0
>>     raise ValueError(msg % (d['descr'],))
>> ValueError: descr is not a valid dtype descriptor: 'd8'
>>>>> numpy.__version__
>> '2.0.0.dev-b5cdaee'
>>>>> numpy.__file__
>> '/Library/Python/2.6/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc'
>>
>> It seems Numpy 2.0.0 no longer accepts dtype('d8'):
>>
>>>>> dtype('d8')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: data type "d8" not understood
>>
>> Was that intentional?  An API change isn't too much of a problem, but
>> it's unfortunate if old data files are no longer easily readable.
>
> As far as I can tell, numpy has never described an array using 'd8'.
> That would be a really old compatibility typecode from Numeric, if I
> remember correctly. The intention of the NPY format standard was that
> it would accept what numpy spits out for the descr, not that it would
> accept absolutely anything that numpy.dtype() can consume, even
> deprecated aliases (though I will admit that that is almost what the
> NEP says). In particular, endianness really should be included or else
> your files will be misread on big-endian machines.
>
> My suspicion is that only your code has ever made .npy files with this
> descr. I feel your pain, Geoff, and I apologize that my lax
> specification led you down this path, but I think you need to fix your
> code anyways.

Sounds good.  Both 1.6.2 and 2.0.0 write out '<f8' for the dtype.
I'll certainly add the '<' bit to signify endianness, but how should I
go about determining the letter?  My current code looks like

  // Get dtype info
  int bits;char letter;
  switch(type_num){
      #define CASE(T) case NPY_##T:bits=NPY_BITSOF_##T;letter=NPY_##T##LTR;break;
      #define NPY_BITSOF_BYTE 8
      #define NPY_BITSOF_UBYTE 8
      #define NPY_BITSOF_USHORT NPY_BITSOF_SHORT
      #define NPY_BITSOF_UINT NPY_BITSOF_INT
      #define NPY_BITSOF_ULONG NPY_BITSOF_LONG
      #define NPY_BITSOF_ULONGLONG NPY_BITSOF_LONGLONG
      CASE(BOOL)
      CASE(BYTE)
      CASE(UBYTE)
      CASE(SHORT)
      CASE(USHORT)
      CASE(INT)
      CASE(UINT)
      CASE(LONG)
      CASE(ULONG)
      CASE(LONGLONG)
      CASE(ULONGLONG)
      CASE(FLOAT)
      CASE(DOUBLE)
      CASE(LONGDOUBLE)
      #undef CASE
      default: throw ValueError("Unknown dtype");}
  int bytes = bits/8;
  ...
  len += sprintf(base+len,"{'descr': '%c%d', 'fortran_order': False, 'shape': (",letter,bytes);

The code incorrectly assumes that the ...LTR constants are safe ways
to describe dtypes.  Is there a clean, correct way to do this that
doesn't require special casing for each type?  I can use numpy headers
but can't call any numpy functions, since Python might not be
initialized (e.g., if I'm writing out files through MPI IO collectives
on a Cray).
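
One possible direction, sketched here rather than taken from numpy itself,
is to build the descr from the array-interface kind characters ('b' for
bool, 'i' for signed integers, 'u' for unsigned integers, 'f' for floats)
plus the byte count and a byte-order prefix, using only standard C++ type
traits. The names byte_order_char, kind_char, and npy_descr are
hypothetical helpers, and the sketch assumes the element type is known at
compile time; with only a runtime type_num, the switch would still be
needed to pick the template argument.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Byte-order prefix for multi-byte types on the machine writing the file:
// '<' for little-endian, '>' for big-endian (detected at run time).
inline char byte_order_char() {
  const std::uint16_t probe = 1;
  return *reinterpret_cast<const unsigned char*>(&probe) == 1 ? '<' : '>';
}

// Kind character for an arithmetic type T (hypothetical helper):
// 'b' bool, 'f' floating point, 'i' signed integer, 'u' unsigned integer.
template<class T> char kind_char() {
  static_assert(std::is_arithmetic<T>::value, "T must be an arithmetic type");
  if (std::is_same<T,bool>::value) return 'b';
  if (std::is_floating_point<T>::value) return 'f';
  return std::is_signed<T>::value ? 'i' : 'u';
}

// Full descr string such as "<f8" or "|u1" (hypothetical helper).
// Single-byte types get '|' because byte order does not apply to them.
template<class T> std::string npy_descr() {
  const char order = sizeof(T) == 1 ? '|' : byte_order_char();
  return std::string(1, order) + kind_char<T>() + std::to_string(sizeof(T));
}
```

With this, the sprintf format could take '%s' and npy_descr<T>().c_str()
in place of '%c%d'. The size of long double varies by platform, so its
descr would too, and complex or structured types are not covered.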

Thanks,
Geoffrey

