[Numpy-discussion] "import numpy" is slow

Ondrej Certik ondrej@certik...
Fri Aug 1 15:33:02 CDT 2008


On Thu, Jul 31, 2008 at 10:02 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Thu, Jul 31, 2008 at 05:43, Andrew Dalke <dalke@dalkescientific.com> wrote:
>> On Jul 31, 2008, at 12:03 PM, Robert Kern wrote:
>
>>> But you still can't remove them since they are being used inside
>>> numerictypes. That's why I labeled them "internal utility functions"
>>> instead of leaving them with minimal docstrings such that you would
>>> have to guess.
>>
>> My proposal is to replace that code with a table mapping
>> the type name to the uppercase/lowercase/capitalized forms,
>> thus eliminating the (small) amount of time needed to
>> import string.
>>
>> It makes adding new types slightly more difficult.
>>
>> I know it's a tradeoff.
>
> Probably not a bad one. Write up the patch, and then we'll see how
> much it affects the import time.
>
> I would much rather that we discuss concrete changes like this rather
> than rehash the justifications of old decisions. Regardless of the
> merits about the old decisions (and I agreed with your position at the
> time), it's a pointless and irrelevant conversation. The decisions
> were made, and now we have a user base to whom we have promised not to
> break their code so egregiously again. The relevant conversation is
> what changes we can make now.
>
> Some general guidelines:
>
> 1) Everything exposed by "from numpy import *" still needs to work.
>  a) The layout of everything under numpy.core is an implementation detail.
>  b) _underscored functions and explicitly labeled internal functions
> can probably be modified.
>  c) Ask about specific functions when in doubt.
>
> 2) The improvement in import times should be substantial. Feel free to
> bundle up the optimizations for consideration.
>
> 3) Moving imports from module-level down into the functions where they
> are used is generally okay if we get a reasonable win from it. The
> local imports should be commented, explaining that they are made local
> in order to improve the import times.
>
> 4) __import__ hacks are off the table.
>
> 5) Proxy objects ... I would really like to avoid proxy objects. They
> have caused fragility in the past.
>
> 6) I'm not a fan of having environment variables control the way numpy
> gets imported, but I'm willing to consider it. For example, I might go
> for having proxy objects for linalg et al. *only* if a particular
> environment variable were set. But there had better be a very large
> improvement in import times.


I just want to say that I agree with Andrew that slow imports just
suck. But it's not really that bad, for example on my system:

In [1]: %time import numpy
CPU times: user 0.11 s, sys: 0.01 s, total: 0.12 s
Wall time: 0.12 s

so that's ok. For comparison:

In [1]: %time import sympy
CPU times: user 0.12 s, sys: 0.02 s, total: 0.14 s
Wall time: 0.14 s

But I am still unhappy about it. I'd like the package to import much
faster, because the cost adds up: when you need to import 7 packages
like that, it's suddenly 1 s, and that's just too much.
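To see how the cost adds up across several packages, the imports can be timed together in a fresh interpreter, since a module already in `sys.modules` is "imported" again almost instantly. This is a minimal sketch assuming Python 3.7+; `fresh_import_time` is a hypothetical helper, and standard-library modules stand in for numpy, sympy and friends so that it runs anywhere.

```python
import subprocess
import sys


def fresh_import_time(modules):
    """Wall time, in seconds, to import `modules` in a brand-new interpreter."""
    code = (
        "import time\n"
        "t0 = time.perf_counter()\n"
        f"for m in {list(modules)!r}:\n"
        "    __import__(m)\n"
        "print(time.perf_counter() - t0)\n"
    )
    # A separate process guarantees nothing is cached in sys.modules yet.
    out = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())


if __name__ == "__main__":
    for mods in (["json"], ["json", "decimal", "email.parser"]):
        print(mods, f"{fresh_import_time(mods):.3f} s")
```

For a per-module breakdown, Python 3.7+ also offers `python -X importtime -c "import json"`, which prints the cumulative import time of every module pulled in.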

But of course, everything has to stay within the constraints that Robert
has outlined. From a theoretical point of view, I don't understand why
Python cannot just import numpy (or any other package) immediately,
and only do the real import at the moment the user actually accesses
something. Mercurial uses a lazy import module that does exactly
this. Maybe that's an option?

Look into mercurial/demandimport.py.

Use it like this:

In [1]: import demandimport

In [2]: demandimport.enable()

In [3]: %time import numpy
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s


That's pretty good, huh? :)
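The idea can be illustrated with a toy version: hand out a placeholder module object right away and perform the real import on first attribute access. The demandimport frames in the traceback below (`self._load()` followed by `getattr(self._module, attr)`) suggest it works along these lines. This is a hypothetical, simplified sketch (`LazyModule` is an illustrative name), not Mercurial's code, which additionally installs itself as a replacement `__import__` when `enable()` is called so that ordinary `import` statements return placeholders.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical placeholder that imports the real module on first use.

    Creating the placeholder costs almost nothing; the import cost is
    paid on the first attribute access instead.
    """

    def __init__(self, name):
        super().__init__(name)
        self._real = None  # the real module, once loaded

    def _load(self):
        if self._real is None:
            # Deliberately not registered in sys.modules, so this call
            # performs the genuine import rather than returning the proxy.
            self._real = importlib.import_module(self.__name__)
        return self._real

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. for names that
        # live in the real module, so forward them after loading it.
        return getattr(self._load(), attr)


if __name__ == "__main__":
    json = LazyModule("json")   # cheap: no import happens here
    print(json.dumps([1, 2]))   # first access triggers the real import
```

This is also the kind of proxy object Robert's guideline 5 warns about: any code that inspects the object before its first real use sees an almost empty module.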

Unfortunately, numpy cannot work with lazy import (yet):

In [5]: %time from numpy import array
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (17, 0))

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

[skip]


/usr/lib/python2.5/site-packages/numpy/lib/index_tricks.py in <module>()
     14 import function_base
     15 import numpy.core.defmatrix as matrix
---> 16 makemat = matrix.matrix
     17
     18 # contributed by Stefan van der Walt

/home/ondra/ext/sympy/demandimport.pyc in __getattribute__(self, attr)
     73             return object.__getattribute__(self, attr)
     74         self._load()
---> 75         return getattr(self._module, attr)
     76     def __setattr__(self, attr, val):
     77         self._load()

AttributeError: 'module' object has no attribute 'matrix'




BTW, neither can SymPy. Still, maybe it shows some possibilities, and
maybe numpy can be fixed to work with such a lazy import.

On the other hand, I can imagine it bringing a lot more trouble, so
it should probably only be optional.
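If lazy loading were offered only as an opt-in, in the spirit of Robert's guideline 6, it might look like this. The sketch assumes Python 3.6+ and relies on `importlib.util.LazyLoader` (added in Python 3.5, long after this thread); the `LAZY_IMPORTS` environment variable and `import_maybe_lazy` are hypothetical names.

```python
import importlib
import importlib.util
import os
import sys


def import_maybe_lazy(name, env_var="LAZY_IMPORTS"):
    """Import `name`, lazily only if the hypothetical env var is set to "1".

    With LazyLoader the module object is created immediately, but its
    code runs on the first attribute access.
    """
    if os.environ.get(env_var) != "1" or name in sys.modules:
        return importlib.import_module(name)  # normal, eager import
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        # Missing module (raises) or namespace package: fall back to eager.
        return importlib.import_module(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers executing the module body
    return module
```

The module spec is still looked up eagerly, so a missing module fails at once, but errors raised while running the module's own code only show up at the first attribute access, which is one of the extra troubles a lazy scheme brings.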


Ondrej

