[SciPy-dev] A patch for weave.catalog

Fernando Perez fperez.net at gmail.com
Thu Jan 18 13:41:30 CST 2007

Hi all,

Does anyone object to the patch at the end?

Rationale: when running scipy in parallel on a cluster whose members
share an NFS filesystem, the current code can blow up because the test file is
hardcoded to dir+'/dummy', and all the processes race to manipulate the same
file.  My patch tries to fix this using portable calls (as far as the docs
say, both socket.gethostname() and os.getpid() are fully portable).

If nobody objects in 24 hours or I get an explicit OK before that from a core
dev, I'll be happy to commit this.  I've tested it and it solves the recurrent
deadlocks on my system.

I suspect scipy hadn't seen too much parallel use before, so I'm sure we'll
begin finding other similar little gremlins as time goes by.  I'll be happy to
fix the ones I can.




Index: Lib/weave/catalog.py
--- Lib/weave/catalog.py	(revision 2579)
+++ Lib/weave/catalog.py	(working copy)
@@ -33,6 +33,7 @@

   import os,sys,string
   import pickle
+import socket
   import tempfile

@@ -127,13 +128,30 @@

   def is_writable(dir):
-    dummy = os.path.join(dir, "dummy")
+    """Determine whether a given directory is writable in a portable manner.
+    :Parameters:
+     - dir: string
+       A string representing a path to a directory on the filesystem.
+    :Returns:
+      True or False.
+    """
+    # Do NOT use a hardcoded name here due to the danger from race conditions
+    # on NFS when multiple processes are accessing the same base directory in
+    # parallel.  We use both hostname and process id for the prefix in an
+    # attempt to ensure that there can really be no name collisions (tempfile
+    # appends 6 random chars to this prefix).
+    prefix = 'dummy_%s_%s_' % (socket.gethostname(),os.getpid())
       try:
-        open(dummy, 'w')
-    except IOError:
-        return 0
-    os.unlink(dummy)
-    return 1
+        tmp = tempfile.TemporaryFile(prefix=prefix,dir=dir)
+    except OSError:
+        return False
+    # The underlying file is destroyed upon closing the file object (under
+    # *nix, it was unlinked at creation time)
+    tmp.close()
+    return True

   def whoami():
       """return a string identifying the user."""
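
For anyone who wants to try the idea outside of a scipy checkout, the patched
function can be sketched as a standalone snippet.  This is only a minimal
sketch of the approach in the patch above, assuming a reasonably recent
Python where tempfile.TemporaryFile raises OSError (or a subclass) on
failure; it is not the committed code, and the parameter name dir is kept
only to match the patch.

```python
import os
import socket
import tempfile


def is_writable(dir):
    """Return True if a temporary file can be created inside `dir`."""
    # Hostname plus process id keeps the name distinct across cluster nodes
    # that share one NFS directory; tempfile adds random characters as well.
    prefix = 'dummy_%s_%s_' % (socket.gethostname(), os.getpid())
    try:
        tmp = tempfile.TemporaryFile(prefix=prefix, dir=dir)
    except OSError:
        # Covers a missing directory as well as a permission failure.
        return False
    # Closing the file object removes the underlying file.
    tmp.close()
    return True
```

Calling is_writable() on a freshly created scratch directory should return
True and leave no file behind, while a path that does not exist should give
False.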
