[Numpy-discussion] profiling line by line
Robert Kern
robert.kern@gmail....
Thu Sep 18 14:56:18 CDT 2008
On Thu, Sep 18, 2008 at 05:46, Ondrej Certik <ondrej@certik.cz> wrote:
> On Thu, Sep 18, 2008 at 1:29 AM, Robert Kern <robert.kern@gmail.com> wrote:
>> On Wed, Sep 17, 2008 at 18:09, Ondrej Certik <ondrej@certik.cz> wrote:
>>> Anyway, so I used it on my code and here is what I got:
>>>
>>> File: hermes1d/mesh.py
>>> Function: integrate_function at line 119
>>> Total time: 0.647412 s
>>>
>>> Line #      Hits         Time  % Time  Line Contents
>>> =====================================================
>>>    119                                 @profile
>>>    120                                 def integrate_function(self, f):
>>>    121                                     """
>>>    122                                     Integrate the function "f" on the element.
>>>    123                                     """
>>>    124        96         1091     0.2      from numpy import array
>>>    125        96       461070    71.2      from scipy.integrate import quadrature
>>>    126        96          496     0.1      a, b = self.nodes[0].x, self.nodes[1].x
>>>    127        96          418     0.1      def func(x):
>>>    128                                         #print x
>>>    129                                         #print array([f.f(y) for y in x])
>>>    130                                         return array([f.f(y) for y in x])
>>>    131        96       183903    28.4      val, err = quadrature(func, a, b)
>>>    132                                     #print val, a, b
>>>    133                                     #stop
>>>    134        96          434     0.1      return val
>>>
>>>
>>>
>>> This is interesting, because 70% of the time is spent on "from
>>> scipy.integrate import quadrature". So this is not good.
>>>
>>> So I moved it out and voila! My code is 3.4x faster. I really didn't
>>> know that importing scipy.integrate is *so* slow.
>>
>> It should only be slow the first time. If it's not, then there's a
>> problem somewhere (and I wouldn't put it past being in the profiler;
>> it might screw up something in the import mechanism). Can you do the
>> import both outside the function *and* inside to see if that changes
>> the result?
>
> Sure. "b.py":
>
> ---------------
> from scipy.integrate import quadrature
>
>
> def a():
>     b = 1
>
>
> for x in range(1000000):
>     a()
> ---------------
>
> "a.py":
>
> ---------------
> def a():
>     from scipy.integrate import quadrature
>     b = 1
>
>
> for x in range(1000000):
>     a()
> ---------------
>
> and:
>
> ondra@pc232:~/repos/hermes1d$ time python a.py
>
> real 0m2.655s
> user 0m2.612s
> sys 0m0.044s
>
> ondra@pc232:~/repos/hermes1d$ time python b.py
>
> real 0m0.620s
> user 0m0.540s
> sys 0m0.080s
>
>
> So I guess that speaks for itself. Don't have time to dig more, but I
> think it's this line in scipy/integrate/__init__:
>
> __all__ = filter(lambda s:not s.startswith('_'),dir())
I don't see how. I can verify that it only gets executed once.
import sys
print 'scipy.integrate.__init__'
__all__ = filter(lambda s: sys.stdout.write('.') or not s.startswith('_'), dir())
$ time python a.py
scipy.integrate.__init__
.............................python a.py  2.26s user 0.72s system 98% cpu 3.009 total
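The single printed module name above is the import cache at work: after the first import, the module object lives in sys.modules and later import statements reuse it without re-running the module body. A minimal, self-contained sketch of the same effect (using a throwaway module named "mymod" written to a temp directory; the name is made up for illustration and just stands in for scipy.integrate):

```python
import os
import sys
import tempfile

# Create a throwaway module whose body announces when it executes.
# "mymod" is a hypothetical stand-in for scipy.integrate.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "mymod.py"), "w") as f:
    f.write("print('mymod body executed')\n")
sys.path.insert(0, tmpdir)

for _ in range(3):
    import mymod  # the module body runs on the first pass only

# Later imports are served straight from the cache:
print("mymod" in sys.modules)  # prints True
```

Running this prints "mymod body executed" exactly once, even though the import statement executes three times.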
Anyways, a million repetitions adds about 2s to your time (oh, you may
also want to run this a couple of times and wait for the times to
settle down, in order to get the same effect from the disk caches). That's
about 2 microseconds per import. The cost you are seeing in your
profile is much, much greater than 2 us * 96, so I think it really is
the first import that makes up the bulk of it.
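That ~2 microsecond estimate is easy to check directly with timeit: once the module is cached, an import statement costs roughly a sys.modules lookup plus a local name binding. A sketch (os stands in for scipy.integrate here, and the absolute number depends on the machine and Python version):

```python
import timeit

# The setup statement pays the one-time cost of the first import;
# the timed statement then hits the sys.modules cache on every pass.
n = 1000000
total = timeit.timeit("import os", setup="import os", number=n)
print("%.3f microseconds per cached import" % (total / n * 1e6))
```

On typical hardware this lands in the low single-digit microseconds, far below the hundreds of milliseconds a cold scipy.integrate import can take.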
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco