[SciPy-User] line_profiler and for-loops !?
Tue Mar 16 08:58:57 CDT 2010
On Tue, Mar 16, 2010 at 11:27 AM, David Baddeley wrote:
> I think there are a couple of factors here: Python for loops and if statements are bad news performance-wise, so if you can remove them using cython / c you'll gain a lot. On the other hand, they tend to suffer more of a performance penalty when you've got the line-profiling hooks in place than, for example, numpy calls, which are vectorised and do all the heavy lifting in places that aren't visible to the profiler.
> You might also be able to improve performance by doing something like:
> for tracki, track in enumerate(self.tracks[self.tracks_tlast == t-1]):
> if tracks and tracks_tlast are numpy arrays.
This did sound like an intriguing idea, but sadly they are all lists,
because new points are appended to the tracks one by one.
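That said, one could keep a parallel numpy array of "last seen" frame numbers next to the lists to get the suggested masking. A minimal sketch with made-up data (all names here are hypothetical); note that enumerate over the boolean-filtered array would yield indices into the subset, so np.flatnonzero is used to keep the original track indices:

```python
import numpy as np

# Hypothetical stand-ins for the tracker state: point indices per track
# (still plain Python lists), plus a numpy array of each track's last frame.
tracks = [[0, 3], [1], [2, 4]]
tracks_tlast = np.array([4, 2, 4])
t = 5

# Original indices of tracks that continued up to frame t-1:
active = np.flatnonzero(tracks_tlast == t - 1)
# Last point index of each still-active track:
latest_points = [tracks[i][-1] for i in active]
```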
> Just out of interest, how many points/observations are you trying to track? I've been working on something for stitching together tracks from lists of positions as well and would be interested in comparing strategies.
I just finished my first analysis of many of our files. In total I'm
talking about 300 GB of image data, where each "movie" might have 100
to a few thousand image frames. At most my detection algorithms found
a few hundred (but sometimes up to 1000) points per frame.
The longest run took about 3 hours for up to 1400 points per frame
over a total of 2000 frames; but here I might have also run into
memory / swapping(?) issues. (I have 6 GB of RAM on a quad-core
64-bit Linux machine.)
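As a back-of-envelope check (my own arithmetic, not from the thread), a single dense distance matrix at that size is actually small, which would suggest that any swapping comes from data accumulated across frames rather than from one matrix:

```python
# A float64 Euclidean distance matrix between 1400 points in frame t
# and 1400 points in frame t-1 (8 bytes per entry):
n = 1400
dist_matrix_mb = n * n * 8 / 1e6    # roughly 15.7 MB per frame pair
```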
I would really like to try cython on this innermost loop to get
everything re-analysed in a fraction of a day.
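Before reaching for cython it might also be worth seeing how far scipy vectorisation gets: the closest-point search itself can be done in one cdist call instead of a per-track Python loop. A toy sketch (data and names invented here, and it ignores assignment conflicts between tracks):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Last known point of each active track, and detections in the new frame:
prev_pts = np.array([[0.0, 0.0], [5.0, 5.0]])
new_pts = np.array([[0.1, 0.2], [5.2, 4.9], [9.0, 9.0]])

# One vectorised call replaces the inner Python loop:
d = cdist(prev_pts, new_pts)     # (n_tracks, n_new) Euclidean distances
nearest = d.argmin(axis=1)       # closest new point for each track
```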
> --- On Tue, 16/3/10, Sebastian Haase <firstname.lastname@example.org> wrote:
>> From: Sebastian Haase <email@example.com>
>> Subject: [SciPy-User] line_profiler and for-loops !?
>> To: "SciPy Users List" <SciPyfirstname.lastname@example.org>
>> Received: Tuesday, 16 March, 2010, 9:40 PM
>> I was starting to use Robert's line_profiler. It seems to
>> work great, and I already found one easy way to halve my execution
>> time.
>> But now it claims that 33% of the time is spent (directly) in the
>> "for"-line and another 36% in a very simple "if"-line.
>> See parts of the output here:
>> Function: doTracing at line 1135
>> Total time: 23.9171 s
>>
>> Line #      Hits         Time  Per Hit   % Time  Line Contents
>> ==============================================================
>> ...
>>                                                  # iterate over all tracks, and find close points
>>   1186   3853362      8024186      2.1     33.5      for tracki, track in enumerate(self.tracks):
>>   1187   3853063      8639273      2.2     36.1          if self.tracks_tLast[tracki] == t-1:
>>                                                              # track went on until t-1 (so far)
>>                                                              # -- otherwise, skip "old" tracks
>>   1190     62150       130277      2.1      0.5              pi_t_1 = track[-1]  # index in last time
>> (The purpose of the function is to connect the closest points found
>> in an image sequence into tracks, linking points by the shortest
>> steps.)
>> Anyhow, my question is: is this just an artifact of line_profiler,
>> or does the fact that those two lines are hit almost 4e6 times
>> really result in more than 50% of the time being spent there !?
>> (Calculating the actual Euclidean distance matrix over all point
>> pairs supposedly takes only 15% of the time, for comparison.)
>> I tried to separate out the "enumerate(self.tracks)" into a
>> line before the "for"-line, but the time spent on the "for" was
>> still unchanged.
>> Does this mean "python is slow" here, and should I try cython
>> (which I have never used so far ...)?
>> Sebastian Haase
>> SciPy-User mailing list