[NumPy-Tickets] [NumPy] #2114: tofile() truncation on arrays >= 2**32 on 64-bit OSX
NumPy Trac
numpy-tickets@scipy....
Tue Apr 24 10:06:40 CDT 2012
#2114: tofile() truncation on arrays >= 2**32 on 64-bit OSX
------------------------+---------------------------------------------------
Reporter: embray | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone: Unscheduled
Component: numpy.core | Version: devel
Keywords: |
------------------------+---------------------------------------------------
After a fair bit of debugging we've tracked down a bug in OSX's fwrite()
(actually in an internal function that affects fwrite(), fprintf(), and
other functions that write to a file handle). This bug was originally
discovered by trying to write out some large arrays with Numpy. As far as
I can tell (from some Google searches) this bug isn't otherwise well known
yet.
The bug is that at some point the size passed to fwrite() is stuffed into
a 32-bit register, which is then checked for being a multiple of 0x1000
(4096). If it is, the code branches off to a separate routine that handles
writes whose size is a whole number of blocks.
Thus, if the size is a multiple of 4096 and >= 2**32, the size gets
silently truncated to `size & 0xffffffff`.
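To make the arithmetic concrete, here is a small model of the behavior
described above. It is only a sketch of what we observed, not the actual
OSX libc code; the names `effective_write_size`, `BLOCK_SIZE` and
`check_against_ticket` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 0x1000ULL /* 4096, the block size named in the ticket */

/* Model of the observed behavior (an assumption, not Apple's code):
 * a size that is >= 2**32 and a multiple of 4096 is reduced to its
 * low 32 bits; every other size is left alone. */
static uint64_t effective_write_size(uint64_t size)
{
    if (size >= (1ULL << 32) && size % BLOCK_SIZE == 0) {
        return size & 0xffffffffULL;
    }
    return size;
}

/* The two sizes used in the ticket's test runs, plus two unaffected ones. */
static void check_against_ticket(void)
{
    assert(effective_write_size(0x100000000ULL) == 0);    /* empty file */
    assert(effective_write_size(0x100001000ULL) == 4096); /* 4096-byte file */
    assert(effective_write_size(0x100000001ULL) == 0x100000001ULL);
    assert(effective_write_size(0x1000ULL) == 0x1000ULL);
}
```

Calling `check_against_ticket()` from a `main()` confirms that the model
gives the file sizes reported for 0x100000000 and 0x100001000.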
The attached test program illustrates the problem. The bug has been
reproduced on Leopard and Lion, so it presumably exists in Snow Leopard as
well; I'm not sure about earlier OSX versions.
This is what the output looks like:
{{{
$ gcc -g -Wall -arch x86_64 -Wextra writetest.c -o writetest
$ ./writetest 0x100000000 && ls -l test.array
size_t bytes: 8
array size: 4294967296
array size cast as size_t: 4294967296
wrote 4294967296 bytes
-rw-r--r-- 1 embray 31 0 Apr 24 11:03 test.array
}}}
As you can see, fwrite() even reports that it wrote "4294967296 bytes",
though in reality nothing was written to the file. Likewise:
{{{
$ ./writetest 0x100001000 && ls -l test.array
size_t bytes: 8
array size: 4294971392
array size cast as size_t: 4294971392
wrote 4294971392 bytes
-rw-r--r-- 1 embray 31 4096 Apr 24 11:04 test.array
}}}
Further testing has shown that this holds for any size >= 2**32 that is a
multiple of 4096.
The fix that was implemented for #1660, where arrays are written in 2GB
chunks, would also solve this problem. So I think it would probably be
sufficient to just enable the same chunked write code block in
`PyArray_ToFile()` on OSX as well.
Although the OSX bug only occurs on those 4K boundaries and only for sizes
>= 2**32, for the sake of simplicity I think it's fine to just use more or
less the same workaround.
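For reference, a chunked write along those lines could look roughly like
the sketch below. This is not the code from #1660 or from
`PyArray_ToFile()`; `chunked_fwrite`, `fwrite_2gb_chunks` and `CHUNK_2GB`
are hypothetical names, and the chunk size is a parameter only so that the
loop can be exercised with small buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* 2GB chunk: a multiple of 4096, but below 2**32, so a single fwrite()
 * of this size does not match the truncation condition described above. */
#define CHUNK_2GB ((size_t)1 << 31)

/* Hypothetical helper: write nbytes from buf in pieces of at most `chunk`
 * bytes. Returns the number of bytes fwrite() reported as written and
 * stops at the first short write. */
static size_t chunked_fwrite(const void *buf, size_t nbytes, size_t chunk,
                             FILE *fp)
{
    const unsigned char *p = buf;
    size_t total = 0;

    assert(fp != NULL);
    assert(chunk > 0);

    while (total < nbytes) {
        size_t want = nbytes - total;
        if (want > chunk) {
            want = chunk;
        }
        size_t got = fwrite(p + total, 1, want, fp);
        total += got;
        if (got != want) {
            break; /* short write: let the caller inspect ferror(fp) */
        }
    }
    return total;
}

/* Convenience wrapper using the 2GB chunk size mentioned in the ticket. */
static size_t fwrite_2gb_chunks(const void *buf, size_t nbytes, FILE *fp)
{
    return chunked_fwrite(buf, nbytes, CHUNK_2GB, fp);
}
```

A caller would replace a single `fwrite(buf, 1, nbytes, fp)` with
`fwrite_2gb_chunks(buf, nbytes, fp)` and compare the return value against
`nbytes` as before.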
--
Ticket URL: <http://projects.scipy.org/numpy/ticket/2114>
NumPy <http://projects.scipy.org/numpy>