[Numpy-discussion] "import numpy" is slow

Andrew Dalke dalke@dalkescientific....
Sat Aug 2 04:19:13 CDT 2008


I've got a proof of concept that takes the time on my machine to  
"import numpy" from 0.21 seconds down to 0.08 seconds.  Doing that  
required some somewhat awkward things, like deferring all 'import re'  
statements.  I don't think that's stable in the long run because  
people will blithely import re in the future and not care that it  
takes 0.02 seconds to import.  I don't blame them for complaining; I  
was curious about how fast I could get things.

Note that when I started complaining about this a month ago the  
import time on my machine was about 0.3 seconds.
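
For anyone who wants to reproduce this kind of number, here is a minimal
sketch of one way to time an import in a fresh interpreter. It is not the
script used for the figures in this post: the helper name time_import is
made up for illustration, and it assumes a modern Python 3 (3.7 or later)
rather than the Python 2 of the patches below.

```python
import subprocess
import sys

# Code run in the child process: time only the import statement itself,
# so interpreter startup cost is excluded from the measurement.
_CHILD = (
    "import sys, time\n"
    "t0 = time.perf_counter()\n"
    "__import__(sys.argv[1])\n"
    "print(time.perf_counter() - t0)\n"
)


def time_import(module_name, repeats=5):
    """Return the best-of-N time in seconds to import module_name.

    Each run uses a new interpreter so nothing is cached in sys.modules.
    """
    timings = []
    for _ in range(repeats):
        out = subprocess.run(
            [sys.executable, "-c", _CHILD, module_name],
            check=True, capture_output=True, text=True,
        ).stdout
        timings.append(float(out))
    return min(timings)
```

Calling time_import("numpy") before and after a patch gives comparable
figures. Modules the interpreter already loads at startup will show up as
nearly free, so the result is the marginal cost of the import.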

I'll work on patches within the next couple of days.  Here's an  
outline of what I did, along with some questions about what's feasible.

1) don't import 'numpy.testing'.  Savings = 0.012s.
Doing so required patches like

-from numpy.testing import Tester
-test = Tester().test
-bench = Tester().bench
+def test(label='fast', verbose=1, extra_argv=None, doctests=False,
+         coverage=False, **kwargs):
+    from testing import Tester
+    import numpy
+    Tester(numpy).test(label, verbose, extra_argv, doctests,
+                       coverage, **kwargs)
+def bench(label='fast', verbose=1, extra_argv=None):
+    from testing import Tester
+    import numpy
+    Tester(numpy).bench(label, verbose, extra_argv)

QUESTION: since numpy is moving to nose, and the documentation only  
describes doing 'import numpy; numpy.test()', can I remove all other  
definitions of "test" and "bench"?


2)  removing 'import ctypeslib' in top-level -> 0.023 seconds

QUESTION: is this considered part of the API that must be preserved?   
The primary use case is supposed to be to help interactive users.  I  
don't think interactive users spend much time using ctypes, and those  
that do are also those that aren't confused about needing an extra  
import statement.

3) removing 'import string' in numerictypes.py -> 0.008 seconds.  
This requires some ugly but simple changes to the code.

4) remove the 'import re' in _internal, numpy/lib/, function_base,  
and other places.  This reduced my overall startup cost by 0.013  
seconds.

5) defer the bz2 and gzip imports in _datasource: 0.009 s.  This will  
require non-trivial code changes.

6) defer 'format' from io.py: 0.007 s

7) _datasource imports shutil in order to use shutil.rmtree in a  
__del__.  I don't think this can be deferred, because I don't want to  
do an import during system shutdown, which is when the __del__ might  
be called.  Deferring it would save 0.004s.

8) If I can remove 'import doc' from the top-level numpy (is that  
part of the required API?) then I can save 0.004s.

9) defer urlparse in _datasource: about 0.003s

10) If I get rid of the top-level cPickle import in numeric.py then I  
can save 0.006 seconds.

11) not importing add_newdocs saves 0.005 s.  This might be possible  
by moving all of the docstrings to the actual functions.  I haven't  
looked into this much and it might not be possible.
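
To make the deferral idea in item 5 concrete, here is a rough sketch of
moving the compression imports from module level into the function that
needs them. It is not the _datasource code and not the patch itself: the
function name open_maybe_compressed is hypothetical, and it is written for
Python 3, where bz2.open exists.

```python
import os


def open_maybe_compressed(path, mode="rb"):
    """Open path, importing a compression module only when it is needed.

    Hypothetical helper: the import cost is paid on the first .gz or .bz2
    file rather than when the enclosing module is imported.
    """
    ext = os.path.splitext(path)[1]
    if ext == ".gz":
        import gzip  # deferred import
        return gzip.open(path, mode)
    if ext == ".bz2":
        import bz2  # deferred import
        return bz2.open(path, mode)
    return open(path, mode)
```

After the first call the module is in sys.modules, so repeating the import
statement costs only a dictionary lookup. The trade-off is that the first
caller pays the import time instead of "import numpy".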

Those millisecond improvements add up!  When I do an interactive  
'import numpy' on my system I don't notice the import time like I did  
before.
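
As a quick tally, the per-item figures quoted above can be summed. The
dictionary keys are just labels for the eleven items. The figures are from
an outline, so their sum need not match the overall 0.21 s to 0.08 s
difference.

```python
# Savings quoted in items 1-11 above, in milliseconds.
savings_ms = {
    "numpy.testing": 12,
    "ctypeslib": 23,
    "string": 8,
    "re": 13,
    "bz2/gzip": 9,
    "format": 7,
    "shutil": 4,       # item 7: probably cannot be deferred
    "doc": 4,
    "urlparse": 3,
    "cPickle": 6,
    "add_newdocs": 5,
}

total_ms = sum(savings_ms.values())
deferrable_ms = total_ms - savings_ms["shutil"]
print(total_ms, deferrable_ms)  # prints: 94 90
```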

				Andrew
				dalke@dalkescientific.com







