[Numpy-discussion] parallel compilation of numpy
Thu Feb 19 00:05:45 CST 2009
David Cournapeau wrote:
> Michael Abshoff wrote:
>> David Cournapeau wrote:
>> With Sage we do the cythonization in parallel and for now build
>> extensions serially, but we have code to do that in parallel, too. Given
>> that we are building 180 extensions or so, the speedup is linear. I often
>> do this using 24 cores, so it seems robust: I work on Sage daily, often
>> testing builds from scratch, and I have never had any problems with
>> that code.
> Note that building from scratch is the easy case, specially in the case
> of parallel builds.
Sure, it also works for incremental builds, and I do that many, many
times a day, i.e. for each patch I merge into the Sage library. What
gets recompiled is decided by our own dependency tracking code, which we
want to push into Cython itself. Figuring out dependencies on the fly
without caching takes about 1s for the whole Sage library, which
includes parsing every Cython file.
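For readers wondering what such a scan looks like: a minimal sketch of on-the-fly cimport scanning might be the following. The regex and the `cython_deps` helper are my own assumptions for illustration, not Sage's actual dependency-tracking code.

```python
import re

# Match "cimport foo" and "from foo cimport bar" at the start of a line.
# This is a simplification; real Cython dependency tracking also has to
# handle includes, .pxd files, and compile-time conditionals.
CIMPORT_RE = re.compile(r"^\s*(?:from\s+([\w.]+)\s+)?cimport\s+([\w.]+)",
                        re.MULTILINE)

def cython_deps(source):
    """Return the set of module names a Cython source string cimports."""
    deps = set()
    for from_mod, name in CIMPORT_RE.findall(source):
        # For "from X cimport Y" the dependency is X; for "cimport X" it is X.
        deps.add(from_mod if from_mod else name)
    return deps
```

Running a scanner like this over every `.pyx` file on each build is cheap enough (the ~1s figure above) that caching is optional.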
Note that we build each extension in parallel, so if you depend on a lot
of Fortran or C code being linked into one extension this obviously
doesn't help much. The situation with Sage's extensions is that we build
external libraries ahead of time, and 99% of the extensions have no
additional C/C++ files; those that do usually have one to three extra
files, so for our purposes this scales linearly.
> Also, I would guess "cythonizing" is easy, at least
> if it is done entirely in python. Race conditions in subprocess are a
> real problem; they caused numerous issues in scons and waf, so I would
> be really surprised if they did not cause any trouble in distutils.
> In particular, on windows, subprocess up to python 2.4 was problematic,
> I believe (I should really check, because I was not involved in the
> related discussions nor with the fixes in scons).
We used to use threads for the "parallel stuff" and it is indeed racy,
but that was mostly observed when running doctests, since we only had
one current directory. All those problems went away once we started
using Pyprocessing, and while there is some overhead for the forks, it
is drowned out by the build time when using 2 cores.
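A minimal sketch of this kind of process-based parallelism, using the stdlib multiprocessing module (the descendant of the Pyprocessing package mentioned above): `build_one` and `build_all` are hypothetical stand-ins for the real compile step, not Sage's code.

```python
import multiprocessing

def build_one(name):
    # Placeholder: a real version would invoke the compiler here.
    return "built %s" % name

def build_all(extensions, jobs=2):
    # Each extension is built in its own worker process, so there is no
    # shared current directory (or other process-global state) to race on,
    # unlike the earlier thread-based approach.
    pool = multiprocessing.Pool(processes=jobs)
    try:
        return pool.map(build_one, extensions)
    finally:
        pool.close()
        pool.join()
```

The fork overhead per worker is fixed, so it is amortized as soon as compile times dominate.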
>> To taunt Ondrej: A one minute build isn't forever - numpy is tiny and I
>> understand why it might seem long compared to SymPy, but just wait until
>> you add Cython extensions by default and those build times will go up
> Building scipy installer on windows takes 1 hour, which is already
> relatively significant.
Ouch. Is that without the dependencies, i.e. ATLAS?
I was curious how you build the various versions of ATLAS, i.e. no SSE,
SSE, SSE2, etc. Do you just set the arch via -A and build them all on
the same box? [sorry for getting slightly OT here :)]
> But really, parallel builds are just a nice
> consequence of using a sane build tool. I simply cannot stand distutils
> anymore; it now feels even more painful than developing on windows.
> Every time you touch something, something else, totally unrelated breaks.
Yeah, distutils is a pain, but numpy extending/modifying it doesn't
make it any cleaner :(. I am looking forward to the day NumScons is part
of Numpy, though.