[Numpy-discussion] import overhead of numpy.testing

Ralf Gommers ralf.gommers@gmail....
Sat Aug 10 11:50:16 CDT 2013


On Sat, Aug 10, 2013 at 5:21 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:

> [Short version: It doesn't look like my proposal or any
> simple alternative is tenable.]
>
> On Aug 10, 2013, at 10:28 AM, Ralf Gommers wrote:
> > It does break backwards compatibility though, because now you can do:
> >
> >    import numpy as np
> >    np.testing.assert_equal(x, y)
>
> Yes, it does.
>
> I realize that a design goal in numpy was that (most?) submodules are
> available without any additional imports. This is the main reason for
> the "import numpy" overhead. The tension between ease-of-use for some
> and overhead for others is well known. For example, Sage tickets 3577,
> 6494, and 11714 relate to deferring numpy import during startup.
>
>
> The three relevant questions are:
>
> 1) Is numpy.testing part of that promise? This can be read
>    in two ways.
>
>     o The design goal could be that only the numerics that people use
>        for interactive/REPL computing are accessible without
>        additional explicit imports, which implies that the import of
>        numpy.testing is an implementation side-effect of providing
>        submodule-level "test()" and "bench()" APIs
>
>     o all NumPy modules with user-facing APIs should be accessible
>       from numpy without additional imports
>
> While I would like to believe that the import of numpy.testing
> is an implementation side-effect of providing test() and bench(),
> I believe that I'm unlikely to convince the majority.
>

It likely is a side-effect rather than intentional design, but at this
point that doesn't matter much anymore. There never was a clear distinction
between private and public modules and now, as your investigation shows,
the cost of removing the import is quite high.
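
To make the breakage concrete: a submodule only shows up as an attribute of
its parent package once something has imported it. Below is a minimal sketch
with a throwaway package. The names demo_pkg and demo_pkg.testing are made-up
stand-ins for numpy and numpy.testing, not code from numpy itself.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package "demo_pkg" with a "testing" submodule.
# Both names are hypothetical stand-ins for numpy and numpy.testing.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")  # deliberately no "from . import testing" here
with open(os.path.join(pkg_dir, "testing.py"), "w") as f:
    f.write("def assert_equal(a, b):\n    assert a == b\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
import demo_pkg

# Without the implicit import in __init__.py, attribute access fails.
try:
    demo_pkg.testing
    implicit_access_works = True
except AttributeError:
    implicit_access_works = False

# An explicit submodule import binds the attribute on the parent package.
import demo_pkg.testing
demo_pkg.testing.assert_equal(1, 1)
explicit_access_works = hasattr(demo_pkg, "testing")
```

Code that today relies on the first form would start raising AttributeError
if numpy stopped importing numpy.testing on its own.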

> For justifiable reasons, the numpy project is loath to break
> backwards compatibility, and I don't think there's an existing
> bright-line policy which would say that "import numpy; numpy.testing"
> should be avoided.
>
>
> 2) If it isn't a promise that "numpy.testing" is usable after an
>    "import numpy" then how many people will be affected by an
>     implementation change, and at what level of severity?
>
>
>
> I looked to see which packages might fail. A Debian code
> search of "numpy.testing" showed no problems, and no one
> uses "np.testing".
>
> I did a code search at http://code.ohloh.net . Of the first
> 200 or so hits for "numpy.testing", nearly all of them fell
> into uses like:
>
> from numpy.testing import Tester
> from numpy.testing import assert_equal, TestCase
> from numpy.testing.utils import *
> from numpy.testing import *
>
> There were, however, several packages which would fail:
>
>  test_io.py and test_image.py and test_array_bridge.py in MediPy
>     (Interestingly, test_circle.py has a "import numpy.testing",
>      so it's not universal practice in that package)
>  calculators_test.py in  OpenQuake Engine
>  ForcePlatformsExtractorTest.py in b-tk
>
> Note that these failures are in the test programs, and not
> in the main body code, so are unlikely to break end-user
> programs.
>
>
> HOWEVER!
>
> The real test is for people who do "import numpy as np" then
> refer to "np.testing". There are "about 454" such matches in
> Ohloh.
>
> One example is 'test_polygon.py' from scikit-image. Others are:
>  test_anova.py in statsmodels
>  test_graph.py in scikit-learn
>  test_rmagic.py in IPython
>  test_mlab.py in matplotlib
>
> Nearly all the cases I looked at were in files starting "test",
> or a handful which ended in "test.py" or "Test.py". Others use
> np.testing only as part of a unit test, such as:
>
>  affine_grid.py and others in pyMor (as part of in-file unit tests)
>  technical_indicators.py in QuantPy (as part of unittest.TestCase)
>  coord_tools.py in NiPy-OLD (as part of in-file unit tests)
>  predstd.py and others in statsmodels (as a main-line unit test)
>  galsim_test_helpers.py in GalSim
>
> These would likely not break end-user code.
>
> Sadly, not all are that safe. For example:
>  simple_contrast.py  example program for nippy
>  try_signal_lti.py in joePython
>  run.py in python-seminar
>  verify.py in bell_d_project (a final project for a CS class)
>  ex_shrink_pickle.py in statsmodels (as an example?)
>  parametric_design.py in nippy (uses assert_almost_equal to verify an
>     example)
>  model.py in pymc-devs's pymc
>  model.py in duffy
>  zipline in olmar
>  utils.py in MNE
> .... and I gave up at result 320 of 454.
>
> Based on this, about 1% of the programs which use numpy.testing
> would break. This tells me that there are enough user programs
> which would fail that I don't think numpy will decide to make
> this change.
>
>
>
>
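
A survey like this could in principle be automated. The following is a rough
sketch, not what was actually used for the search above: it parses a source
file and reports whether it reaches numpy.testing through an attribute on the
numpy name without ever importing numpy.testing explicitly. The function name
implicit_testing_use is made up for illustration, and it only handles the
simple "alias.testing" pattern.

```python
import ast


def _is_testing_module(name):
    # True for "numpy.testing" and anything below it, e.g. "numpy.testing.utils".
    return name == "numpy.testing" or name.startswith("numpy.testing.")


def implicit_testing_use(source):
    """Return True if source uses <numpy alias>.testing with no explicit import of it."""
    tree = ast.parse(source)
    numpy_names = set()       # local names bound to the top-level numpy package
    explicit_import = False   # any import that loads numpy.testing by itself

    # First pass: collect imports.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "numpy":
                    numpy_names.add(alias.asname or "numpy")
                elif _is_testing_module(alias.name):
                    explicit_import = True
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            module = node.module or ""
            if _is_testing_module(module):
                explicit_import = True
            elif module == "numpy" and any(a.name == "testing" for a in node.names):
                explicit_import = True

    if explicit_import:
        return False

    # Second pass: look for attribute access such as "np.testing".
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and node.attr == "testing"
                and isinstance(node.value, ast.Name)
                and node.value.id in numpy_names):
            return True
    return False
```

For example, implicit_testing_use("import numpy as np\nnp.testing.assert_equal(1, 1)\n")
reports the file as affected, while a file that also contains
"import numpy.testing" is treated as safe.
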
> And the third question is
>
>   3) Are there other alternatives?
>
> Or as Ralf Gommers wrote:
> > Do you have more detailed timings? I'm guessing the bottleneck is
> > importing nose.
>
>
> I do have more detailed timings. "nose" is not imported
> during an "import numpy". (For one, "import nose" takes
> a full 0.11 seconds on my laptop and adds 199 modules
> to sys.modules!)
>
>
> The hit is the "import unittest" in numpy.testing, which
> exists only to place "TestCase" in the numpy.testing namespace.
> "numpy.testing.TestCase" is only used by the unit tests,
> and not by any direct end-user code.
>
> Here's the full hierarchical timing breakdown showing
>  - module name
>  - cumulative time to load
>  - parent module
>
>     testing: 0.0065 (numpy.core.numeric)
>      unittest: 0.0055 (testing)
>       result: 0.0011 (unittest)
>        traceback: 0.0004 (result)
>         linecache: 0.0000 (traceback)
>        StringIO: 0.0004 (result)
>         errno: 0.0000 (StringIO)
>       case: 0.0021 (unittest)
>        difflib: 0.0011 (case)
>        pprint: 0.0004 (case)
>        util: 0.0000 (case)
>       suite: 0.0002 (unittest)
>       loader: 0.0006 (unittest)
>        fnmatch: 0.0002 (loader)
>       main: 0.0010 (unittest)
>        time: 0.0000 (main)
>        signals: 0.0006 (main)
>         signal: 0.0000 (signals)
>         weakref: 0.0005 (signals)
>          UserDict: 0.0000 (weakref)
>          _weakref: 0.0000 (weakref)
>          _weakrefset: 0.0000 (weakref)
>          exceptions: 0.0000 (weakref)
>       runner: 0.0000 (unittest)
>      utils: 0.0005 (testing)
>       nosetester: 0.0002 (utils)
>        numpy.testing.utils: 0.0000 (nosetester)
>      numpytest: 0.0001 (testing)
>
> As you can see, "unittest" imports a large number of modules.
>
> I see no good way to get rid of this unittest import.
>

Indeed. I had a quick look at the benefit of copying TestCase into
numpy.testing so the import of unittest can be removed, but more than half
the time is spent inside case.py and result.py, which would still be needed.
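
For anyone who wants to reproduce a breakdown of this kind, here is one
possible way to collect it, sketched under the assumption that wrapping
builtins.__import__ is acceptable for a one-off measurement. This is not
necessarily how the numbers quoted above were obtained, and profile_import is
a made-up helper name.

```python
import builtins
import importlib.util
import sys
import time


def profile_import(module_name):
    """Import module_name and return [(depth, name, seconds, parent), ...]."""
    timings = []   # filled in import order, parents before children
    stack = []     # absolute names of the imports currently in progress
    original_import = builtins.__import__

    def timed_import(name, globals=None, locals=None, fromlist=(), level=0):
        full_name = name
        if level > 0 and globals is not None:
            # Turn a relative import ("from .case import x") into an absolute name.
            try:
                full_name = importlib.util.resolve_name(
                    "." * level + name, globals.get("__package__"))
            except (ImportError, ValueError):
                full_name = name
        if full_name in sys.modules:
            # Already loaded (or being loaded): nothing worth timing.
            return original_import(name, globals, locals, fromlist, level)
        parent = stack[-1] if stack else None
        slot = len(timings)
        timings.append(None)  # reserve the slot so parents are listed first
        stack.append(full_name)
        start = time.perf_counter()
        try:
            return original_import(name, globals, locals, fromlist, level)
        finally:
            elapsed = time.perf_counter() - start
            stack.pop()
            # After the pop, len(stack) is the nesting depth of this import.
            timings[slot] = (len(stack), full_name, elapsed, parent)

    builtins.__import__ = timed_import
    try:
        __import__(module_name)  # resolves to timed_import while patched
    finally:
        builtins.__import__ = original_import
    return timings


if __name__ == "__main__":
    # Only meaningful in a fresh interpreter where unittest is not loaded yet.
    for depth, name, seconds, parent in profile_import("unittest"):
        print("%s%s: %.4f (%s)" % (" " * depth, name, seconds, parent))
```

The times are cumulative, so each entry includes its children. As far as I
can tell, submodules pulled in via "from . import x" are not listed
separately by this sketch; their cost is folded into the enclosing entry.
On Python 3.7 and later, "python -X importtime" gives a similar breakdown
without any extra code.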


> Even if all of the tests were rewritten to use unittest.TestCase,
> numpy.testing.TestCase would still need to be present so
> third-party packages could derive from it, and there's no (easy?)
> way to make that TestCase some sort of deferred object which
> gets the correct TestCase when needed.
>
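
For reference, one shape such a deferred object could take is a module-level
__getattr__ hook. That hook (PEP 562) only exists in Python 3.7 and later,
so it is not an option for the Python versions numpy supports at the time of
this thread. The sketch below uses a made-up module called lazy_testing as a
stand-in for numpy.testing and is only meant to show the idea.

```python
import importlib
import types

# "lazy_testing" is a hypothetical stand-in for numpy.testing.
lazy_testing = types.ModuleType("lazy_testing")


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    if name == "TestCase":
        unittest = importlib.import_module("unittest")  # deferred import
        lazy_testing.TestCase = unittest.TestCase        # cache for next time
        return unittest.TestCase
    raise AttributeError("module 'lazy_testing' has no attribute %r" % name)


lazy_testing.__getattr__ = _lazy_getattr

# Nothing has asked for TestCase yet, so it is not in the module namespace.
was_deferred = "TestCase" not in vars(lazy_testing)


class MyTest(lazy_testing.TestCase):  # first access triggers the import
    def test_trivial(self):
        self.assertEqual(1 + 1, 2)
```

Third-party classes deriving from lazy_testing.TestCase get the real
unittest.TestCase as their base, and the unittest import cost is only paid
by code that actually touches the attribute.
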
>
> In conclusion, it looks like my proposal is not tenable and
> there's no easy way to squeeze out that ~5% of startup overhead.
>

It does look that way.

Ralf