[NumPy-Tickets] [NumPy] #2091: Problem on OSX 10.7 with lapack_lite functions and multiprocessing

NumPy Trac numpy-tickets@scipy....
Mon Mar 26 16:34:44 CDT 2012


#2091: Problem on OSX 10.7 with lapack_lite functions and multiprocessing
--------------------------+-------------------------------------------------
 Reporter:  dougal        |       Owner:  pv         
     Type:  defect        |      Status:  new        
 Priority:  normal        |   Milestone:  Unscheduled
Component:  numpy.linalg  |     Version:  1.6.1      
 Keywords:                |  
--------------------------+-------------------------------------------------
 The following code segfaults for me on OSX 10.7.3.

 {{{
 from __future__ import print_function

 import numpy as np
 import multiprocessing as mp
 import scipy.linalg

 def f(a):
     print("about to call")

     ### these all cause crashes
     sign, x = np.linalg.slogdet(a)
     #x = np.linalg.det(a)
     #x = np.linalg.inv(a).sum()

     ### these are all fine
     #x = scipy.linalg.expm3(a).sum()
     #x = np.dot(a, a.T).sum()

     print("result:", x)
     return x

 def call_proc(a):
     print("\ncalling with multiprocessing")
     p = mp.Process(target=f, args=(a,))
     p.start()
     p.join()


 if __name__ == '__main__':
     import sys
     n = int(sys.argv[1]) if len(sys.argv) > 1 else 50

     a = np.random.normal(0, 2, (n, n))
     f(a)

     call_proc(a)
 }}}

 This code causes a segfault (trying to access e.g. `0x0000000000000108`);
 when I do a core dump, the backtrace is:

 {{{
 #0  0x00007fff8832c324 in dispatch_group_async_f ()
 #1  0x00007fff8b2fed3e in dgetrf_ ()
 #2  0x000000010ac0c26a in initlapack_lite ()
 #3  0x000000010a5a8d77 in PyEval_EvalFrameEx ()
 #4  0x000000010a5abdf7 in PyEval_EvalCode ()
 #5  0x000000010a5a8e0a in PyEval_EvalFrameEx ()
 #6  0x000000010a5abcd8 in PyEval_EvalCodeEx ()
 #7  0x000000010a549abf in PyClassMethod_New ()
 #8  0x000000010a528d32 in PyObject_Call ()
 #9  0x000000010a5a95ec in PyEval_EvalFrameEx ()
 #10 0x000000010a5abdf7 in PyEval_EvalCode ()
 #11 0x000000010a5a8e0a in PyEval_EvalFrameEx ()
 #12 0x000000010a5abdf7 in PyEval_EvalCode ()
 #13 0x000000010a5a8e0a in PyEval_EvalFrameEx ()
 #14 0x000000010a5abcd8 in PyEval_EvalCodeEx ()
 #15 0x000000010a549abf in PyClassMethod_New ()
 #16 0x000000010a528d32 in PyObject_Call ()
 #17 0x000000010a5376e9 in PyInstance_New ()
 #18 0x000000010a528d32 in PyObject_Call ()
 #19 0x000000010a573484 in _PyObject_SlotCompare ()
 #20 0x000000010a56db7a in PyType_Modified ()
 #21 0x000000010a528d32 in PyObject_Call ()
 #22 0x000000010a5a8f63 in PyEval_EvalFrameEx ()
 #23 0x000000010a5abdf7 in PyEval_EvalCode ()
 #24 0x000000010a5a8e0a in PyEval_EvalFrameEx ()
 #25 0x000000010a5abcd8 in PyEval_EvalCodeEx ()
 #26 0x000000010a5abe6c in PyEval_EvalCode ()
 #27 0x000000010a5a8e0a in PyEval_EvalFrameEx ()
 #28 0x000000010a5abcd8 in PyEval_EvalCodeEx ()
 #29 0x000000010a5abd4d in PyEval_EvalCode ()
 #30 0x000000010a5c308f in Py_CompileString ()
 #31 0x000000010a5c314f in PyRun_FileExFlags ()
 #32 0x000000010a5c42a2 in PyRun_SimpleFileExFlags ()
 #33 0x000000010a5d42af in Py_Main ()
 #34 0x000000010a519e88 in ?? ()
 }}}

 So it seems like the Apple Grand Central Dispatch stuff isn't playing nice
 with multiprocessing in this case.

 If it's helpful, the full OSX "problem report" is
 [https://gist.github.com/2209271 here].

 This happens when `f()` calls any of the `numpy.linalg` functions there,
 but not for the matrix multiplication or the `scipy.linalg.expm3` call
 (which doesn't call anything from lapack_lite; if I use one of the other
 `scipy.linalg` functions that do call e.g. `solve`, then it also
 segfaults).

 If I comment out the call to `f` from the main process, it runs fine.

 This happens for `n >= 33`; for `n <= 32`, it's fine. Note that
 32*32=1024, which seems like a reasonable cutoff point for when Accelerate
 would start parallelizing.

 Of course, it doesn't matter if the original `f` call is on the same
 matrix (it's being pickled, after all) -- as long as it's also big. Calls
 with one big, one small matrix or two small matrices (defining "small" as
 "n x n for n <= 32") run fine. It can also be a different function, e.g.
 first `np.linalg.inv` and then call out to another process for
 `np.linalg.slogdet`.


 This happens for me on my OSX 10.7.3 desktop with

  * python 2.6.7, numpy 1.5.1
  * python 2.7.1, numpy 1.5.1, scipy 0.10.0
  * python 3.2.2, numpy 1.6.1, scipy 0.10.1

 The 2.6 and 2.7 are I think the default system installs; I installed the
 3.2 versions manually from the source tarballs. All of those numpys are
 linked to the system Accelerate framework:

 {{{
 $ otool -L `python3.2 -c 'from numpy.core import _dotblas;
 print(_dotblas.__file__)'`
 /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-
 packages/numpy/core/_dotblas.so:
     /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
 (compatibility version 1.0.0, current version 4.0.0)
     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
 version 125.2.1)
 }}}
 I get the same behavior on another Mac with a similar setup.

 But all of the options for `f` work on other machines running

  * OSX 10.6.8 with Python 2.6.1 and numpy 1.2.1 linked to Accelerate 4 and
 vecLib 268 (except that it doesn't have scipy or slogdet)
  * Debian 6 with Python 3.2.2, numpy 1.6.1, and scipy 0.10.1 linked to the
 system ATLAS
  * Ubuntu 11.04 with Python 2.7.1, numpy 1.5.1 and scipy 0.8.0 linked to
 system ATLAS

 So, I'm not 100% sure this counts as a numpy bug, rather than one in
 multiprocessing or Accelerate; if you think I should just report it to one
 of them, let me know.

 Any workarounds would also be appreciated, as this is currently making my
 life harder. :)

-- 
Ticket URL: <http://projects.scipy.org/numpy/ticket/2091>
NumPy <http://projects.scipy.org/numpy>
My example project


More information about the NumPy-Tickets mailing list