[Numpy-discussion] "import numpy" performance
Benjamin Root
ben.root@ou....
Mon Jul 2 15:43:09 CDT 2012
On Mon, Jul 2, 2012 at 4:34 PM, Nathaniel Smith <njs@pobox.com> wrote:
> On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke <dalke@dalkescientific.com>
> wrote:
> > In this email I propose a few changes which I think are minor
> > and which don't really affect the external NumPy API but which
> > I think could improve the "import numpy" performance by at
> > least 40%. This affects me because I and my clients use a
> > chemistry toolkit which uses only NumPy arrays, and where
> > we run short programs often on the command-line.
> >
> >
> > In July of 2008 I started a thread about how "import numpy"
> > was noticeably slow for one of my customers. They had
> > chemical analysis software, often even run on a single
> > molecular structure using command-line tools, and the
> > several invocations with 0.1 seconds of overhead each were one of
> > the dominant costs even when numpy wasn't needed.
> >
> > I fixed most of their problems by deferring numpy imports
> > until needed. I remember well the Steve Jobs anecdote at
> > http://folklore.org/StoryView.py?project=Macintosh&story=Saving_Lives.txt
> > and spent another day of my time in 2008 to identify the
> > parts of the numpy import sequence which seemed excessive.
> > I managed to get the import time down from 0.21 seconds to
> > 0.08 seconds.
> >
> > Very little of that made it into NumPy.
> >
> >
> > The three biggest changes I would like are:
> >
> > 1) remove "add_newdocs" and put the docstrings in the C code
> > 'add_newdocs' still needs to be there,
> >
> > The code says:
> >
> > # This is only meant to add docs to objects defined in C-extension modules.
> > # The purpose is to allow easier editing of the docstrings without
> > # requiring a re-compile.
> >
> > However, the change log shows that there are relatively few commits
> > to this module
> >
> > Year  Number of commits
> > ====  =================
> > 2012    8
> > 2011   62
> > 2010    9
> > 2009   18
> > 2008   17
> >
> > so I propose moving the docstrings to the C code, and perhaps
> > leaving 'add_newdocs' there, but only used when testing new
> > docstrings.
>
> I don't have any opinion on how acceptable this would be, but I also
> don't see a benchmark showing how much this would help?
>
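
For what it's worth, a benchmark for this does not need much. Here is a
minimal sketch, assuming it is enough to time a fresh interpreter running
a single import statement. The names time_import and import_overhead are
ones I made up for illustration; they are not part of NumPy.

```python
import subprocess
import sys
import time


def time_import(module_name, repeats=5):
    """Best wall-clock seconds for a fresh interpreter to import module_name."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process each time, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", "import " + module_name],
                       check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_overhead(module_name, repeats=5):
    """Import cost with bare interpreter startup subtracted (can be noisy)."""
    # "import sys" is effectively free, so this measures startup alone.
    baseline = time_import("sys", repeats)
    return time_import(module_name, repeats) - baseline
```

Calling import_overhead("numpy") before and after a change such as
dropping add_newdocs would give a rough number to argue about. Taking the
best of several runs is only one way to cut down on noise.
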
> > 2) Don't optimistically assume that all submodules are
> > needed. For example, some current code uses
> >
> > >>> import numpy
> > >>> numpy.fft.ifft
> > <function ifft at 0x10199f578>
> >
> > (See a real-world example at
> > http://stackoverflow.com/questions/10222812/python-numpy-fft-and-inverse-fft
> > )
> >
> > IMO, this optimizes the needs of the interactive shell
> > NumPy author over the needs of the many-fold more people
> > who don't spend their time in the REPL and/or don't need
> > those extra features added to every NumPy startup. Please
> > bear in mind that NumPy users of the first category will
> > be active on the mailing list, go to SciPy conferences,
> > etc. while members of the second category are less visible.
> >
> > I recognize that this is backwards incompatible, and will
> > not change. However, I understand that "NumPy 2.0" is a
> > glimmer in the future, which might be a natural place for
> > a transition to the more standard Python style of
> >
> > from numpy import fft
> >
> > Personally, I think the documentation now (if it doesn't
> > already) should transition to use this form.
>
> I think this ship has sailed, but it'd be worth looking into lazy
> importing, where 'numpy.fft' isn't actually imported until someone
> starts using it. There are a bunch of libraries that do this, and one
> would have to fiddle to get compatibility with all the different
> python versions and make sure you're not killing performance (might
> have to be in C) but something along the lines of
>
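>     import importlib
>
>     class _FFTModule(object):
>         def __getattribute__(self, name):
>             # The first attribute access triggers the real import
>             mod = importlib.import_module("numpy.fft")
>             # Later lookups are handed to the module's own lookup
>             _FFTModule.__getattribute__ = mod.__getattribute__
>             return getattr(mod, name)
>
>     fft = _FFTModule()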
>
>
I'm not sure how this would affect projects like IPython that provide
tab completion, but I know it would drive me nuts in the basic
tab-completion setup I use in my regular Python terminal. Of course, in
the grand scheme of things, I don't think that is all that important.
Ben Root