[Numpy-discussion] Linker script, smaller source files and symbol visibility

Charles R Harris charlesr.harris@gmail....
Mon Apr 20 10:48:37 CDT 2009


Hi David

On Mon, Apr 20, 2009 at 6:51 AM, David Cournapeau <
david@ar.media.kyoto-u.ac.jp> wrote:

> Hi,
>
>    For quite a long time I have been bothered by the very large files
> needed for python extensions. In particular for numpy.core, which
> consists of a few files of ~1 MB each, I find this a pretty high
> barrier to entry for newcomers, and it has quite a big impact on the
> code organization. I think I have found a way to split things on common
> platforms (this includes at least windows, mac os x, linux and solaris),
> without impacting other  potentially less capable platforms, or static
> linking of numpy.
>

There was a discussion of this a couple of years ago. I was in favor of many
small files maybe in subdirectories. Robert, IIRC, thought too many small
files could become confusing, so there is a fine line in there somewhere.  I
am generally in favor of breaking the files up into their functional
components and maybe rewriting some of the upper level interface files in
cython. But it does need some agreement and we should probably start by just
breaking up a few files. I don't have a problem with big files that are just
collections of small routines all of the same type, umath_loops.inc.src for
instance.


>
> Assuming my idea is technically sound and that I can demonstrate it
> works on say Linux without impacting other platforms (see example
> below), would that be considered useful ?
>

Definitely worth consideration.


>
> cheers,
>
> David
>
> Technical details
> ==================
>
>    The rationale for doing things as they are is a C limitation: the
> only way to restrict symbol visibility is file scope, i.e. if you want to
> share a function across several files without making it public in the
> binary, you have to tag the function static, and include all .c files
> which use this function into one giant .c file. That's how we do it in
> numpy. Many binary formats (ELF, COFF and Mach-O) have a mechanism to
> limit symbol visibility, so that we can explicitly set the functions
> we do want to export. With a couple of defines, we could either include
> every file and tag the implementation functions as static, or link
> every file together and limit symbol visibility with some linker magic.
>
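
To make the per-symbol idea concrete, here is a minimal sketch assuming GCC
or Clang on a non-Windows platform. Note that it uses the compiler's
visibility attribute rather than the linker script proposed in this thread;
it is a different mechanism aimed at the same result. SPAM_HIDDEN and both
function names are made up for illustration.

```c
/* Sketch only: SPAM_HIDDEN and the function names are hypothetical. */

#if defined(__GNUC__) && !defined(_WIN32)
/* GCC and Clang: keep the symbol out of the shared object's export table. */
#define SPAM_HIDDEN __attribute__((visibility("hidden")))
#else
/* Unknown toolchain: fall back to default linkage. */
#define SPAM_HIDDEN
#endif

/* Usable from every .c file linked into the extension, but not exported. */
SPAM_HIDDEN int
spam_helper_length(const char *command)
{
    int n = 0;
    while (command[n] != '\0') {
        n++;
    }
    return n;
}

/* A caller that could live in a different .c file of the same extension. */
int
spam_command_is_empty(const char *command)
{
    return spam_helper_length(command) == 0;
}
```

With this approach the helper does not need to be static, so it can sit in
its own source file and still stay out of the public symbol table.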

Maybe just not worry about symbol visibility on other platforms. It is one
of those warts that only becomes apparent when you go looking for it. For
instance, the current *.so has some extraneous symbols but I don't hear
folks complaining.


>
> Example
> -------
>
> I use the spam example from the official python doc, with one function
> PySpam_System which is exported in a C API, and the actual
> implementation is _pyspam_implementation.
>
> * spammodule.c: define the interface available from python interpreter:
>
> #include <Python.h>
> #include <stdio.h>
>
> #define SPAM_MODULE
> #include "spammodule.h"
> #include "spammodule_imp.h"
>
> /* if we don't know how to deal with symbol visibility on the platform,
> just include everything in one file */
> #ifdef SYMBOL_SCRIPT_UNSUPPORTED
> #include "spammodule_imp.c"
> #endif
>
>
> /* C API for spam module */
>
>
> static int
> PySpam_System(const char *command)
> {
>    _pyspam_implementation(command);
>    return 0;
> }
>
> * spammodule_imp.h: declares the implementation, should only be included
> by spammodule.c and spammodule_imp.c which implements the actual function
>
> #ifndef _IMP_H_
> #define _IMP_H_
>
> #ifndef SPAM_MODULE
> #error this should not be included unless you really know what you are doing
> #endif
>
> #ifdef SYMBOL_SCRIPT_UNSUPPORTED
> #define SPAM_PRIVATE static
> #else
> #define SPAM_PRIVATE
> #endif
>
> SPAM_PRIVATE int
> _pyspam_implementation(const char *command);
>
> #endif
>
> For supported platforms (where SYMBOL_SCRIPT_UNSUPPORTED is not
> defined), _pyspam_implementation would not be visible because we would
> have a list of functions to export (only initspam in this case).
>
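
For the fallback case, a rough sketch of how the pieces fit together when
SYMBOL_SCRIPT_UNSUPPORTED is defined and everything ends up in one
translation unit. The three sections stand in for the separate files and are
collapsed here only so the sketch compiles on its own. The body of
_pyspam_implementation is a placeholder, since spammodule_imp.c is not shown
in the post, and defining the macro in the source (rather than from the
build system) is an assumption.

```c
/* Sketch of the fallback build: one translation unit, helpers are static. */
#include <string.h>

#define SYMBOL_SCRIPT_UNSUPPORTED /* assumption: normally set by the build */

/* --- spammodule_imp.h --- */
#ifdef SYMBOL_SCRIPT_UNSUPPORTED
#define SPAM_PRIVATE static
#else
#define SPAM_PRIVATE
#endif

SPAM_PRIVATE int _pyspam_implementation(const char *command);

/* --- spammodule_imp.c (hypothetical body, not from the original post) --- */
SPAM_PRIVATE int
_pyspam_implementation(const char *command)
{
    /* Placeholder for the real work: report the command length. */
    return (int)strlen(command);
}

/* --- spammodule.c --- */
static int
PySpam_System(const char *command)
{
    _pyspam_implementation(command);
    return 0;
}
```

When the macro is left undefined, SPAM_PRIVATE expands to nothing, the
implementation gets external linkage, and it is then up to the export list
to keep it out of the public symbols.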
> Advantages
> ----------
>
> This has several advantages on platforms where this is supported:
>    - more approachable code: source files which are thousands of lines
> long are difficult to follow
>    - faster compilation times: in my experience, compilation time
> doesn't scale linearly with the amount of code.
>    - compilation can be better parallelized
>    - changing one file does not force a whole multiarray/ufunc module
> recompilation (which can be pretty long when you chase bugs in it)
>
> Another advantage is related to namespace pollution. Since library
> extensions are static libraries for now, any symbol from those
> libraries used by any extension is publicly available. For example, now
> that multiarray.so uses the npy_math library, every symbol in npy_math
> is in the public namespace. That's also true for every scipy extension
> (for example, _fftpack.so exports the whole dfftpack public API). If we
> want to go further down the road of making core computational code
> publicly available, I think we should improve this first.
>
> Disadvantage
> ------------
>
> We need to code it. There are two parts:
>    - numpy.distutils support: I already have something working for
> Linux. Once we have one platform working, adding others should not be a
> problem
>    - changing the C code: we could at first split things into .c
> files while still including everything, and then start the conversion.
>

There's the rub.

Chuck