[Numpy-discussion] "import numpy" is slow
Fri Aug 1 15:33:02 CDT 2008
On Thu, Jul 31, 2008 at 10:02 PM, Robert Kern <email@example.com> wrote:
> On Thu, Jul 31, 2008 at 05:43, Andrew Dalke <firstname.lastname@example.org> wrote:
>> On Jul 31, 2008, at 12:03 PM, Robert Kern wrote:
>>> But you still can't remove them since they are being used inside
>>> numerictypes. That's why I labeled them "internal utility functions"
>>> instead of leaving them with minimal docstrings such that you would
>>> have to guess.
>> My proposal is to replace that code with a table mapping
>> the type name to the uppercase/lowercase/capitalized forms,
>> thus eliminating the (small) amount of time needed to
>> import string.
>> It makes adding new types slightly more difficult.
>> I know it's a tradeoff.
> Probably not a bad one. Write up the patch, and then we'll see how
> much it affects the import time.
> I would much rather that we discuss concrete changes like this rather
> than rehash the justifications of old decisions. Regardless of the
> merits about the old decisions (and I agreed with your position at the
> time), it's a pointless and irrelevant conversation. The decisions
> were made, and now we have a user base to whom we have promised not to
> break their code so egregiously again. The relevant conversation is
> what changes we can make now.
> Some general guidelines:
> 1) Everything exposed by "from numpy import *" still needs to work.
> a) The layout of everything under numpy.core is an implementation detail.
> b) _underscored functions and explicitly labeled internal functions
> can probably be modified.
> c) Ask about specific functions when in doubt.
> 2) The improvement in import times should be substantial. Feel free to
> bundle up the optimizations for consideration.
> 3) Moving imports from module-level down into the functions where they
> are used is generally okay if we get a reasonable win from it. The
> local imports should be commented, explaining that they are made local
> in order to improve the import times.
> 4) __import__ hacks are off the table.
> 5) Proxy objects ... I would really like to avoid proxy objects. They
> have caused fragility in the past.
> 6) I'm not a fan of having environment variables control the way numpy
> gets imported, but I'm willing to consider it. For example, I might go
> for having proxy objects for linalg et al. *only* if a particular
> environment variable were set. But there had better be a very large
> improvement in import times.
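To make guideline 3 concrete, here is a minimal sketch of a function-local import with the kind of comment Robert asks for. The function name is made up for illustration and is not an actual numpy function; only the standard `string` module is assumed.

```python
def capitalized_type_name(name):
    """Return the capitalized form of a type name, e.g. for an alias table.

    Hypothetical helper, used only to illustrate a function-local import.
    """
    # Imported here rather than at module level so that importing the
    # enclosing module does not pay for "import string" up front.
    import string
    return string.capwords(name)


print(capitalized_type_name("float64"))
```

The cost of the import is paid on the first call instead of at module import time; later calls find `string` in `sys.modules`, so the repeated `import` statement is cheap.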
I just want to say that I agree with Andrew that slow imports just
suck. But it's not really that bad, for example on my system:
In : %time import numpy
CPU times: user 0.11 s, sys: 0.01 s, total: 0.12 s
Wall time: 0.12 s
so that's ok. For comparison:
In : %time import sympy
CPU times: user 0.12 s, sys: 0.02 s, total: 0.14 s
Wall time: 0.14 s
But I am still unhappy about it. I'd like the package to import
much faster, because it adds up: when you need to import 7 packages
like that, it's suddenly 1s, and that's just too much.
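For anyone who wants to repeat the measurement outside IPython, a rough sketch in plain Python follows. The helper name `time_import` is my own, and the number only means something the first time a module is imported in a fresh interpreter, since later imports are served from `sys.modules`.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the wall-clock seconds spent importing module_name.

    If the module is already in sys.modules the result is close to zero,
    so run this in a fresh interpreter for a meaningful number.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Module names are taken from the command line; "json" is just a default.
    for name in sys.argv[1:] or ["json"]:
        print("%s: %.3f s" % (name, time_import(name)))
```

Recent Python versions (3.7 and later) can also print a per-module breakdown with `python -X importtime -c "import numpy"`.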
But of course, everything has to stay within the constraints that Robert
has outlined. From a theoretical point of view, I don't understand why
Python cannot just import numpy (or any other package) immediately,
and only import it for real at the moment the user actually accesses
something. Mercurial uses a lazy import module that does exactly
this. Maybe that's an option?
Look into mercurial/demandimport.py.
Use it like this:
In : import demandimport
In : demandimport.enable()
In : %time import numpy
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
That's pretty good, huh? :)
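The idea behind such a lazy import can be sketched roughly like this. This is not Mercurial's actual code, just a simplified illustration; the class name `LazyModule` is made up, and the real demandimport hooks into the import statement itself rather than requiring an explicit wrapper.

```python
import importlib


class LazyModule:
    """Hypothetical stand-in for a module that is imported on first use."""

    def __init__(self, name):
        self._name = name      # name of the module to import later
        self._module = None    # filled in on first attribute access

    def _load(self):
        # Perform the real import only once, then reuse the module object.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. for names that live
        # on the real module, so this is where the import is triggered.
        return getattr(self._load(), attr)


math = LazyModule("math")   # nothing is imported yet
print(math._module)         # still None at this point
print(math.sqrt(9))         # first access triggers the real import
```

Creating the proxy is nearly free, which is why the timing above shows 0.00 s: the real cost is only deferred until the first attribute access.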
Unfortunately, numpy cannot work with lazy import (yet):
In : %time from numpy import array
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (17, 0))
AttributeError                            Traceback (most recent call last)

/usr/lib/python2.5/site-packages/numpy/lib/index_tricks.py in <module>()
     14 import function_base
     15 import numpy.core.defmatrix as matrix
---> 16 makemat = matrix.matrix
     18 # contributed by Stefan van der Walt

/home/ondra/ext/sympy/demandimport.pyc in __getattribute__(self, attr)
     73 return object.__getattribute__(self, attr)
---> 75 return getattr(self._module, attr)
     76 def __setattr__(self, attr, val):

AttributeError: 'module' object has no attribute 'matrix'
BTW, neither can SymPy. However, maybe it shows some possibilities and
maybe it's possible to fix numpy to work with such a lazy import.
On the other hand, I can imagine it could cause a lot more trouble, so
it should probably only be optional.