[Scipy-tickets] [SciPy] #1846: glibc double free error for huge datasets

SciPy Trac scipy-tickets@scipy....
Sat Feb 16 15:44:06 CST 2013


#1846: glibc double free error for huge datasets
---------------------+------------------------------------------------------
 Reporter:  avneesh  |       Owner:  somebody   
     Type:  defect   |      Status:  new        
 Priority:  high     |   Milestone:  Unscheduled
Component:  Other    |     Version:  0.9.0      
 Keywords:           |  
---------------------+------------------------------------------------------
 Hello SciPy team,

 I seem to have encountered some unusual behavior when running NumPy/SciPy
 on massive datasets.  Please see below for the error.  Also, this
 ticket/bug report has been cross-posted on stackoverflow.com and the NumPy
 developers list on github as well.

 I have a very simple script in Python, but for some reason I get the
 following error when running a large amount of data:

 {{{
 *** glibc detected *** python: double free or corruption (out):
 0x00002af5a00cc010 ***
 }}}

 I am used to these errors coming up in C or C++, when one tries to free
 memory that has already been freed. However, by my understanding of Python
 (and especially the way I've written the code), I really don't understand
 why this should happen.

 Here is the code:


 {{{
 #!/usr/bin/python -tt

 import sys, commands, string
 import numpy as np
 import scipy.io as io
 from time import clock

 W = io.loadmat(sys.argv[1])['W']
 size = W.shape[0]
 numlabels = int(sys.argv[2])
 Q = np.zeros((size, numlabels), dtype=np.double)
 P = np.zeros((size, numlabels), dtype=np.double)
 Q += 1.0 / Q.shape[1]
 nu = 0.001
 mu = 0.01
 start = clock()
 mat = -nu + mu*(W*(np.log(Q)-1))
 end = clock()
 print >> sys.stderr, "Time taken to compute matrix: %.2f seconds"%(end-
 start)
 }}}

 One may ask, why declare a P and a Q numpy array? I simply do that to
 reflect the actual conditions (as this code is simply a segment of what I
 actually do, where I need a P matrix and declare it beforehand).

 I have access to a 192GB machine, and so I tested this out on a very large
 SciPy sparse matrix (2.2 million by 2.2 million, but very sparse, that's
 not the issue). The main memory is taken up by the Q, P, and mat matrices,
 as they are all 2.2 million by 2000 matrices (size = 2.2 million,
 numlabels = 2000). The peak memory goes up to 131GB, which comfortably
 fits in memory. While the mat matrix is being computed, I get the glibc
 error, and my process automatically goes into the sleep (S) state, without
 deallocating the 131GB it has taken up.

 Given the bizarre (for Python) error (I am not explicitly deallocating
 anything), and the fact that this works nicely for smaller matrix sizes
 (around 1.5 million by 2000), I am really not sure where to start to debug
 this.

 As a starting point, I have set "ulimit -s unlimited" before running, but
 to no avail.

 Any help or insight into numpy's behavior with really large amounts of
 data would be welcome.

 Note that this is NOT an out of memory error - I have 196GB, and my
 process reaches around 131GB and stays there for some time before giving
 the error above.

 As per suggestions, I ran Python with GDB. Interestingly, on one GDB run I
 forgot to set the stack size limit to "unlimited", and got the following
 output:


 {{{
 *** glibc detected *** /usr/bin/python: munmap_chunk(): invalid pointer:
 0x00007fe7508a9010 ***
 ======= Backtrace: =========
 /lib64/libc.so.6(+0x733b6)[0x7ffff6ec23b6]
 /usr/lib64/python2.7/site-
 packages/numpy/core/multiarray.so(+0x4a496)[0x7ffff69fc496]
 /usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4e67)[0x7ffff7af48c7]
 /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x309)[0x7ffff7af6c49]
 /usr/lib64/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7ffff7b25592]
 /usr/lib64/libpython2.7.so.1.0(+0xfcc61)[0x7ffff7b33c61]
 /usr/lib64/libpython2.7.so.1.0(PyRun_FileExFlags+0x84)[0x7ffff7b34074]
 /usr/lib64/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x189)[0x7ffff7b347c9]
 /usr/lib64/libpython2.7.so.1.0(Py_Main+0x36c)[0x7ffff7b3e1bc]
 /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ffff6e6dbfd]
 /usr/bin/python[0x4006e9]
 ======= Memory map: ========
 00400000-00401000 r-xp 00000000 09:01 50336181
 /usr/bin/python2.7
 00600000-00601000 r--p 00000000 09:01 50336181
 /usr/bin/python2.7
 00601000-00602000 rw-p 00001000 09:01 50336181
 /usr/bin/python2.7
 00602000-00e5f000 rw-p 00000000 00:00 0
 [heap]
 7fdf2584c000-7ffff0a66000 rw-p 00000000 00:00 0
 7ffff0a66000-7ffff0a6b000 r-xp 00000000 09:01 50333916
 /usr/lib64/python2.7/lib-dynload/mmap.so
 7ffff0a6b000-7ffff0c6a000 ---p 00005000 09:01 50333916
 /usr/lib64/python2.7/lib-dynload/mmap.so
 7ffff0c6a000-7ffff0c6b000 r--p 00004000 09:01 50333916
 /usr/lib64/python2.7/lib-dynload/mmap.so
 7ffff0c6b000-7ffff0c6c000 rw-p 00005000 09:01 50333916
 /usr/lib64/python2.7/lib-dynload/mmap.so
 7ffff0c6c000-7ffff0c77000 r-xp 00000000 00:12 54138483
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/streams.so
 7ffff0c77000-7ffff0e76000 ---p 0000b000 00:12 54138483
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/streams.so
 7ffff0e76000-7ffff0e77000 r--p 0000a000 00:12 54138483
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/streams.so
 7ffff0e77000-7ffff0e78000 rw-p 0000b000 00:12 54138483
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/streams.so
 7ffff0e78000-7ffff0e79000 rw-p 00000000 00:00 0
 7ffff0e79000-7ffff0e9b000 r-xp 00000000 00:12 54138481
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/mio5_utils.so
 7ffff0e9b000-7ffff109a000 ---p 00022000 00:12 54138481
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/mio5_utils.so
 7ffff109a000-7ffff109b000 r--p 00021000 00:12 54138481
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/mio5_utils.so
 7ffff109b000-7ffff109f000 rw-p 00022000 00:12 54138481
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/mio5_utils.so
 7ffff109f000-7ffff10a0000 rw-p 00000000 00:00 0
 7ffff10a0000-7ffff10a5000 r-xp 00000000 09:01 50333895
 /usr/lib64/python2.7/lib-dynload/zlib.so
 7ffff10a5000-7ffff12a4000 ---p 00005000 09:01 50333895
 /usr/lib64/python2.7/lib-dynload/zlib.so
 7ffff12a4000-7ffff12a5000 r--p 00004000 09:01 50333895
 /usr/lib64/python2.7/lib-dynload/zlib.so
 7ffff12a5000-7ffff12a7000 rw-p 00005000 09:01 50333895
 /usr/lib64/python2.7/lib-dynload/zlib.so
 7ffff12a7000-7ffff12ad000 r-xp 00000000 00:12 54138491
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/mio_utils.so
 7ffff12ad000-7ffff14ac000 ---p 00006000 00:12 54138491
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/mio_utils.so
 7ffff14ac000-7ffff14ad000 r--p 00005000 00:12 54138491
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/mio_utils.so
 7ffff14ad000-7ffff14ae000 rw-p 00006000 00:12 54138491
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/io/matlab/mio_utils.so
 7ffff14ae000-7ffff14b5000 r-xp 00000000 00:12 54138562
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_csgraph.so
 7ffff14b5000-7ffff16b4000 ---p 00007000 00:12 54138562
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_csgraph.so
 7ffff16b4000-7ffff16b5000 r--p 00006000 00:12 54138562
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_csgraph.so
 7ffff16b5000-7ffff16b6000 rw-p 00007000 00:12 54138562
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_csgraph.so
 7ffff16b6000-7ffff17c2000 r-xp 00000000 00:12 54138558
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_bsr.so
 7ffff17c2000-7ffff19c2000 ---p 0010c000 00:12 54138558
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_bsr.so
 7ffff19c2000-7ffff19c3000 r--p 0010c000 00:12 54138558
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_bsr.so
 7ffff19c3000-7ffff19c6000 rw-p 0010d000 00:12 54138558
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_bsr.so
 7ffff19c6000-7ffff19d5000 r-xp 00000000 00:12 54138561
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_dia.so
 7ffff19d5000-7ffff1bd4000 ---p 0000f000 00:12 54138561
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_dia.so
 7ffff1bd4000-7ffff1bd5000 r--p 0000e000 00:12 54138561
 /home/avneesh/.local/lib/python2.7/site-
 packages/scipy/sparse/sparsetools/_dia.so
 Program received signal SIGABRT, Aborted.
 0x00007ffff6e81ab5 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00007ffff6e81ab5 in raise () from /lib64/libc.so.6
 #1  0x00007ffff6e82fb6 in abort () from /lib64/libc.so.6
 #2  0x00007ffff6ebcdd3 in __libc_message () from /lib64/libc.so.6
 #3  0x00007ffff6ec23b6 in malloc_printerr () from /lib64/libc.so.6
 #4  0x00007ffff69fc496 in ?? () from /usr/lib64/python2.7/site-
 packages/numpy/core/multiarray.so
 #5  0x00007ffff7af48c7 in PyEval_EvalFrameEx () from
 /usr/lib64/libpython2.7.so.1.0
 #6  0x00007ffff7af6c49 in PyEval_EvalCodeEx () from
 /usr/lib64/libpython2.7.so.1.0
 #7  0x00007ffff7b25592 in PyEval_EvalCode () from
 /usr/lib64/libpython2.7.so.1.0
 #8  0x00007ffff7b33c61 in ?? () from /usr/lib64/libpython2.7.so.1.0
 #9  0x00007ffff7b34074 in PyRun_FileExFlags () from
 /usr/lib64/libpython2.7.so.1.0
 #10 0x00007ffff7b347c9 in PyRun_SimpleFileExFlags () from
 /usr/lib64/libpython2.7.so.1.0
 #11 0x00007ffff7b3e1bc in Py_Main () from /usr/lib64/libpython2.7.so.1.0
 #12 0x00007ffff6e6dbfd in __libc_start_main () from /lib64/libc.so.6
 #13 0x00000000004006e9 in _start ()
 }}}

 When I set the stack size limit to unlimited, I get the following:


 {{{
 *** glibc detected *** /usr/bin/python: double free or corruption (out):
 0x00002abb2732c010 ***
 ^X^C
 Program received signal SIGINT, Interrupt.
 0x00002aaaab9d08fe in __lll_lock_wait_private () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00002aaaab9d08fe in __lll_lock_wait_private () from /lib64/libc.so.6
 #1  0x00002aaaab969f2e in _L_lock_9927 () from /lib64/libc.so.6
 #2  0x00002aaaab9682d1 in free () from /lib64/libc.so.6
 #3  0x00002aaaaaabbfe2 in _dl_scope_free () from /lib64/ld-
 linux-x86-64.so.2
 #4  0x00002aaaaaab70a4 in _dl_map_object_deps () from /lib64/ld-
 linux-x86-64.so.2
 #5  0x00002aaaaaabcaa0 in dl_open_worker () from /lib64/ld-
 linux-x86-64.so.2
 #6  0x00002aaaaaab85f6 in _dl_catch_error () from /lib64/ld-
 linux-x86-64.so.2
 #7  0x00002aaaaaabc5da in _dl_open () from /lib64/ld-linux-x86-64.so.2
 #8  0x00002aaaab9fb530 in do_dlopen () from /lib64/libc.so.6
 #9  0x00002aaaaaab85f6 in _dl_catch_error () from /lib64/ld-
 linux-x86-64.so.2
 #10 0x00002aaaab9fb5cf in dlerror_run () from /lib64/libc.so.6
 #11 0x00002aaaab9fb637 in __libc_dlopen_mode () from /lib64/libc.so.6
 #12 0x00002aaaab9d60c5 in init () from /lib64/libc.so.6
 #13 0x00002aaaab080933 in pthread_once () from /lib64/libpthread.so.0
 #14 0x00002aaaab9d61bc in backtrace () from /lib64/libc.so.6
 #15 0x00002aaaab95dde7 in __libc_message () from /lib64/libc.so.6
 #16 0x00002aaaab9633b6 in malloc_printerr () from /lib64/libc.so.6
 #17 0x00002aaaab9682dc in free () from /lib64/libc.so.6
 #18 0x00002aaaabef1496 in ?? () from /usr/lib64/python2.7/site-
 packages/numpy/core/multiarray.so
 #19 0x00002aaaaad888c7 in PyEval_EvalFrameEx () from
 /usr/lib64/libpython2.7.so.1.0
 #20 0x00002aaaaad8ac49 in PyEval_EvalCodeEx () from
 /usr/lib64/libpython2.7.so.1.0
 #21 0x00002aaaaadb9592 in PyEval_EvalCode () from
 /usr/lib64/libpython2.7.so.1.0
 #22 0x00002aaaaadc7c61 in ?? () from /usr/lib64/libpython2.7.so.1.0
 #23 0x00002aaaaadc8074 in PyRun_FileExFlags () from
 /usr/lib64/libpython2.7.so.1.0
 #24 0x00002aaaaadc87c9 in PyRun_SimpleFileExFlags () from
 /usr/lib64/libpython2.7.so.1.0
 #25 0x00002aaaaadd21bc in Py_Main () from /usr/lib64/libpython2.7.so.1.0
 #26 0x00002aaaab90ebfd in __libc_start_main () from /lib64/libc.so.6
 #27 0x00000000004006e9 in _start ()
 }}}

 This makes me believe the basic issue is with the numpy multiarray core
 module (line #4 in the first output and line #18 in the second). I will
 bring it up as a bug report in both numpy and scipy just in case.

 Has anyone seen this before?

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1846>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.


More information about the Scipy-tickets mailing list