[Numpy-discussion] "import numpy" performance

Andrew Dalke dalke@dalkescientific....
Mon Jul 2 17:54:46 CDT 2012

On Jul 3, 2012, at 12:21 AM, Nathaniel Smith wrote:
> Yes, but for a proper benchmark we need to compare this to the number
> that we would get with some other implementation... I'm assuming you
> aren't proposing we just delete the docstrings :-).

I suspect that we have a different meaning of the term 'benchmark'.

A benchmark first establishes the baseline by which future
implementations are measured. That is what I did.

Once there are changes, the benchmark, rerun, helps judge
the usefulness of those changes. This I did not do.

I do not believe that a benchmark requires the changed code as
well before it can be considered a "proper benchmark".
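
As a minimal sketch of that two-step idea (this is not the
instrumentation used for the numbers in this thread), a baseline can
be taken by timing a fresh interpreter with and without the import.
The function names startup_time and import_cost are hypothetical, and
the code assumes Python 3.

```python
import subprocess
import sys
import time


def startup_time(code, repeats=5):
    """Best-of-N wall-clock time to run `code` in a fresh interpreter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process each time, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_cost(module, repeats=5):
    """Estimate the cost of `import module` by subtracting bare startup time."""
    bare = startup_time("pass", repeats)
    with_import = startup_time("import " + module, repeats)
    # May be slightly negative for very cheap modules because of timing noise.
    return with_import - bare
```

Calling import_cost("numpy") once before a change and once after gives
the two numbers to compare; the first call alone is the baseline.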

>> This says that 'add_newdocs', which is imported from
>> numpy.core.multiarray (though there may be other importers)
>> takes 0.038 seconds to go through __import__, including
>> all of its children module imports.
> There are no "children modules", all these modules refer to each
> other, and you're assuming that whichever module you happen to load
> first is responsible for all the other modules it happens to
> reference.

While I believe there is an "import tree" analogous to
a "call tree" and Python's import scheme helps ensure
that it's a DAG (so that 'children modules' has a real
meaning), you are correct in identifying that I was only
pointing out the first parent, and not all of the parents.

add_newdocs is the first module to import 'numpy.lib', but
after further testing (I stubbed out the import and made
a fake function), I see that other modules import numpy.lib
anyway, so there's no measurable performance improvement.
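
The "first parent" attribution discussed above can be illustrated with
a small sketch that wraps builtins.__import__ and records, for each
import, the import that was in progress when it was requested. This
assumes Python 3 and is not the exact instrumentation behind the
numbers quoted in this thread; trace_imports is a hypothetical name.

```python
import builtins
import time


def trace_imports(module_name):
    """Import `module_name`, timing every nested __import__ call.

    Returns a list of (name, parent, seconds) tuples in the order the
    imports finished. `parent` is the import that was in progress when
    `name` was requested, or None for the top-level import.
    """
    original_import = builtins.__import__
    stack = []    # names of the imports currently in progress
    records = []

    def timed_import(name, globals=None, locals=None, fromlist=(), level=0):
        # For relative imports such as "from . import x", name may be ''.
        parent = stack[-1] if stack else None
        stack.append(name)
        start = time.perf_counter()
        try:
            return original_import(name, globals, locals, fromlist, level)
        finally:
            elapsed = time.perf_counter() - start
            stack.pop()
            records.append((name, parent, elapsed))

    builtins.__import__ = timed_import
    try:
        timed_import(module_name)
    finally:
        # Always restore the real __import__, even if the import fails.
        builtins.__import__ = original_import
    return records
```

The elapsed time includes all nested imports, and a module that is
already in sys.modules returns almost immediately. So the cost of a
shared module lands on whichever importer reaches it first, and
removing that importer only moves the cost to the next one.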

I therefore retract my proposal to move the documentation,
which is currently in add_newdocs, into the C code.

>>>> With instrumentation I found that 0.083s of the 0.119s
>>>> is spent loading numpy.core.multiarray.
> The number 0.083 doesn't appear anywhere in that profile you pasted,
> so I don't know where this comes from...

I did not save the output run which I used for my original email.
It's easy to generate, so I just ran it again.


