[NumPy-Tickets] [NumPy] #2114: tofile() truncation on arrays >= 2**32 on 64-bit OSX

NumPy Trac numpy-tickets@scipy....
Tue Apr 24 10:06:40 CDT 2012

#2114: tofile() truncation on arrays >= 2**32 on 64-bit OSX
 Reporter:  embray      |       Owner:  somebody   
     Type:  defect      |      Status:  new        
 Priority:  normal      |   Milestone:  Unscheduled
Component:  numpy.core  |     Version:  devel      
 Keywords:              |  
 After a fair bit of debugging we've tracked down a bug in OSX's fwrite()
 (actually in an internal function that affects fwrite(), fprintf(), and
 other functions that write to a file handle).  This bug was originally
 discovered by trying to write out some large arrays with NumPy.  As far as
 I can tell (from some Google searches), this bug isn't otherwise well known.

 The bug is that at some point the size passed to fwrite() is stuffed into
 a 32-bit register, which is then checked to see whether it is a multiple of
 0x1000 (4096).  If it is, the code branches off to a separate routine for
 writes that are a multiple of one block size.

 Thus, if the size is a multiple of 4096 and >= 2**32, the size gets
 silently truncated to `size & 0xffffffff`.

 The attached test program illustrates the problem.  It has been tested
 and shown to be buggy on Leopard and Lion (so presumably the bug also
 exists in Snow Leopard; I'm not sure about earlier OSX versions).

 This is what the output looks like:
 $ gcc -g -Wall -arch x86_64 -Wextra writetest.c -o writetest
 $ ./writetest 0x100000000 && ls -l test.array
 size_t bytes: 8
 array size: 4294967296
 array size cast as size_t: 4294967296
 wrote 4294967296 bytes
 -rw-r--r--  1 embray  31  0 Apr 24 11:03 test.array

 As you can see, fwrite() even reports that it wrote "4294967296 bytes",
 though in reality it wrote zero bytes.  Likewise:

 $ ./writetest 0x100001000 && ls -l test.array
 size_t bytes: 8
 array size: 4294971392
 array size cast as size_t: 4294971392
 wrote 4294971392 bytes
 -rw-r--r--  1 embray  31  4096 Apr 24 11:04 test.array

 Further testing has shown that this holds for any multiple of 4096.

 The fix that was implemented for #1660, where arrays are written in 2GB
 chunks, would also solve this problem.  So I think it would probably be
 sufficient to just enable the same chunked write code block in
 `PyArray_ToFile()` on OSX as well.

 Although the OSX bug only occurs on those 4K boundaries and only for sizes
 >= 2**32, for the sake of simplicity I think it's fine to just use more or
 less the same workaround.

Ticket URL: <http://projects.scipy.org/numpy/ticket/2114>
NumPy <http://projects.scipy.org/numpy>