[Numpy-discussion] parallel compilation of numpy

Michael Abshoff michael.abshoff@googlemail....
Thu Feb 19 00:05:45 CST 2009


David Cournapeau wrote:
> Michael Abshoff wrote:
>> David Cournapeau wrote:

Hi David,

>> With Sage we do the cythonization in parallel and for now build 
>> extensions serially, but we have code to do that in parallel, too. Given 
>> that we are building 180 extensions or so, the speedup is linear. I often 
>> do this using 24 cores, and it seems robust: I work on Sage daily, often 
>> doing test builds from scratch, and I have never had any problems with 
>> that code.
>>   
> 
> Note that building from scratch is the easy case, specially in the case
> of parallel builds.

Sure, it also works for incremental builds, and I do that many, many 
times a day, i.e. for each patch I merge into the Sage library. What 
gets recompiled is decided by our own dependency tracking code, which we 
want to push into Cython itself. Figuring out dependencies on the fly, 
without caching, takes about one second for the whole Sage library, and 
that includes parsing every Cython file.

Note that we build each extension in parallel, so if you depend on a lot 
of Fortran or C code being linked into one extension this obviously 
doesn't help much. The situation with Sage's extensions is that we build 
external libraries ahead of time, 99% of the extensions have no 
additional C/C++ files, and those that do usually have one to three 
extra files, so for our purposes this scales linearly.
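Since each extension is an independent unit, the parallel build amounts 
to mapping a compile step over a process pool. A minimal sketch (the 
`build_one` body is a stand-in for whatever actually invokes cython and 
the C compiler):

```python
from multiprocessing import Pool

# Hypothetical sketch: compile independent extensions in parallel.
# With few or no extra C files per extension, wall time scales
# roughly as n_extensions / n_cores.
def build_one(name):
    # ... run cython + cc for this extension ...
    return name

def build_all(names, jobs=4):
    with Pool(processes=jobs) as pool:
        return pool.map(build_one, names)
```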

> Also, I would guess "cythonizing" is easy, at least
> if it is done entirely in python. Race conditions in subprocess are a
> real problem; they caused numerous issues in scons and waf, so I would
> be really surprised if they did not cause any trouble in distutils.
> In particular, on windows, subprocess up to python 2.4 was problematic,
> I believe (I should really check, because I was not involved in the
> related discussions or the fixes in scons).

We used to use threads for the "parallel stuff" and it was indeed racy, 
but that was mostly observed when running doctests, since we only had 
one current directory. All those problems went away once we started to 
use Pyprocessing, and while there is some overhead for the forks, it is 
drowned out by the build time even when using 2 cores.
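The single-current-directory race is inherent to threads: os.chdir() is 
process-global, so two threads changing directory stomp on each other, 
while forked workers each own a private cwd. A small sketch of the fix 
described above (task body and directory names are illustrative):

```python
import os
import tempfile
from multiprocessing import Pool

# Sketch: each forked worker chdirs into its own scratch directory,
# which only affects that process, so workers cannot race on the cwd.
def run_in_own_dir(task_id):
    workdir = tempfile.mkdtemp(prefix="task%d-" % task_id)
    os.chdir(workdir)  # per-process, unlike in a threaded runner
    # ... run the doctest / build step here ...
    return os.path.realpath(os.getcwd()) == os.path.realpath(workdir)

def run_all(n, jobs=2):
    with Pool(processes=jobs) as pool:
        return pool.map(run_in_own_dir, range(n))
```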

>> To taunt Ondrej: A one-minute build isn't forever - numpy is tiny and I 
>> understand why it might seem long compared to SymPy, but just wait until 
>> you add Cython extensions by default, and those build times will go up 
>> substantially.
>>   
> 
> Building the scipy installer on windows takes 1 hour, which is already
> relatively significant. 

Ouch. Is that without the dependencies, i.e. ATLAS?

I was curious how you build the various versions of ATLAS, i.e. no SSE, 
SSE, SSE2, etc. Do you just set the arch via -A and build them all on 
the same box? [sorry for getting slightly OT here :)]
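For context, what I had in mind is something like one build tree per 
target, with the architecture overridden via -A (a hypothetical sketch; 
the arch names here are illustrative, and the configure/make line is 
only echoed so the loop is safe to run as-is):

```shell
# Hypothetical sketch: one ATLAS build directory per target arch,
# selected with -A instead of ATLAS's auto-detection.
for arch in NoSSE SSE1 SSE2; do
    mkdir -p "build-$arch"
    echo "(cd build-$arch && ../configure -A $arch && make)"
done
```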

> But really, parallel builds is just a nice
> consequence of using a sane build tool. I simply cannot stand distutils
> anymore; it now feels even more painful than developing on windows.
> Every time you touch something, something else, totally unrelated breaks.

Yeah, distutils is a pain, but numpy extending/modifying it doesn't 
make it any cleaner :(. I am looking forward to the day NumScons is part 
of Numpy, though.

> cheers,
> 
> David

Cheers,

Michael

> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
> 


