[Numpy-discussion] Proposed Roadmap Overview

Mark Wiebe mwwiebe@gmail....
Sun Feb 19 13:13:20 CST 2012

On Sun, Feb 19, 2012 at 5:25 AM, Nathaniel Smith <njs@pobox.com> wrote:

> On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau <cournape@gmail.com> wrote:
> > On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
> >> Is there a specific
> >> target platform/compiler combination you're thinking of where we can do
> >> tests on this? I don't believe the compile times are as bad as many people
> >> suspect, can you give some simple examples of things we might do in NumPy
> >> you expect to compile slower in C++ vs C?
> >
> > Switching from gcc to g++ on the same codebase should not change much
> > compilation times. We should test, but that's not what worries me.
> > What worries me is when we start using C++ specific code, STL and co.
> > Today, scipy.sparse.sparsetools takes half of the build time  of the
> > whole scipy, and it does not even use fancy features. It also takes Gb
> > of ram when building in parallel.
> I like C++ but it definitely does have issues with compilation times.
> IIRC the main problem is very simple: STL and friends (e.g. Boost) are
> huge libraries, and because they use templates, the entire source code
> is in the header files. That means that as soon as you #include a few
> standard C++ headers, your innocent little source file has suddenly
> become hundreds of thousands of lines long, and it just takes the
> compiler a while to churn through megabytes of source code, no matter
> what it is. (Effectively you recompile some significant fraction of
> STL from scratch on every file, and then throw it away.)
> Precompiled headers can help some, but require complex and highly
> non-portable build-system support. (E.g., gcc's precompiled header
> constraints are here:
> http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one
> per source file, etc.)

This doesn't look too bad; I think it would be worth setting these up in
NumPy. The complexity you see is there because it's pretty close to the only
way that precompiled headers could be set up.

> To demonstrate: a trivial hello-world in C using <stdio.h>, versus a
> trivial version in C++ using <iostream>.
> On my laptop (gcc 4.5.2), compiling each program 100 times in a loop
> requires:
>  C: 2.28 CPU seconds
>  C compiled with C++ compiler: 4.61 CPU seconds
>  C++: 17.66 CPU seconds
> Slowdown for using g++ instead of gcc: 2.0x
> Slowdown for using C++ standard library: 3.8x
> Total C++ penalty: 7.8x
> Lines of code compiled in each case:
>  $ gcc -E hello.c | wc
>      855    2039   16934
>  $ g++ -E hello.cc | wc
>    18569   40994  437954
> (I.e., the C++ hello world is almost half a megabyte.)
> Of course we won't be using <iostream>, but <vector>, <unordered_map>
> etc. all have the same basic character.

Thanks for doing the benchmark. It is a bit artificial, however: when I
tried these trivial examples with -O0 and -O2 (in gcc 4.7), the C++ compile
time differed by only about 4%. In NumPy as it stands today, in C, the
difference between -O0 and -O2 is very significant, and any comparison
needs to take this kind of thing into account. When I said I thought the
compile-time differences would be smaller than many people expect, I was
thinking about how this optimization phase, which is shared between C and
C++, often dominates the compile times.
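
For anyone who wants to repeat this kind of comparison, here is a rough
sketch of how it could be scripted. It is only an illustration, not the
script either of us actually ran: the helper names (build_command,
time_compiles, slowdown) and the inline hello-world sources are made up
for the example, and it assumes gcc and g++ are on the PATH and a
POSIX-like system where os.times() reports child CPU time.

```python
import os
import shutil
import subprocess
import tempfile

# Hypothetical stand-ins for the hello.c / hello.cc test files.
C_SOURCE = '#include <stdio.h>\nint main(void) { printf("hello\\n"); return 0; }\n'
CXX_SOURCE = ('#include <iostream>\n'
              'int main() { std::cout << "hello" << std::endl; return 0; }\n')


def build_command(compiler, source, opt_level, output):
    """Return the argv list for one compile at the given -O level."""
    return [compiler, "-O%d" % opt_level, source, "-o", output]


def time_compiles(compiler, source, opt_level, repeats=10):
    """Compile `source` `repeats` times; return child CPU seconds (user + system)."""
    output = os.path.splitext(source)[0] + ".out"
    cmd = build_command(compiler, source, opt_level, output)
    before = os.times()
    for _ in range(repeats):
        subprocess.run(cmd, check=True)
    after = os.times()
    return ((after.children_user - before.children_user)
            + (after.children_system - before.children_system))


def slowdown(baseline_seconds, other_seconds):
    """Ratio of two timings, e.g. C++ time over C time."""
    return other_seconds / baseline_seconds


def main():
    if not (shutil.which("gcc") and shutil.which("g++")):
        print("gcc/g++ not found; nothing to measure")
        return
    with tempfile.TemporaryDirectory() as tmp:
        c_path = os.path.join(tmp, "hello.c")
        cxx_path = os.path.join(tmp, "hello.cc")
        with open(c_path, "w") as f:
            f.write(C_SOURCE)
        with open(cxx_path, "w") as f:
            f.write(CXX_SOURCE)
        for opt in (0, 2):
            c_time = time_compiles("gcc", c_path, opt)
            cxx_time = time_compiles("g++", cxx_path, opt)
            print("-O%d: C %.2fs, C++ %.2fs, slowdown %.1fx"
                  % (opt, c_time, cxx_time, slowdown(c_time, cxx_time)))


if __name__ == "__main__":
    main()
```

Swapping the hello-world sources for a few real NumPy source files would
show how much of the total time is the shared optimization phase rather
than header parsing.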


> -- Nathaniel
> (Test files attached, times were from:
>  time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done'
>  cp hello.c c-hello.cc
>  time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done'
>  time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done'
> and then summing the resulting user and system times.)