[SciPy-user] Pros and Cons of Python verses other array environments

David Cournapeau david at ar.media.kyoto-u.ac.jp
Sat Sep 30 06:54:06 CDT 2006


Gary Ruben wrote:
> Hi David,
> I've never found profiling a problem, especially using prun in 
> ipython. Have you tried this?
I didn't know about prun, but it looks like it does its profiling with 
hotshot. For example, on one of my packages, I get:


      300   48.510    0.162   48.510    0.162 
densities.py:109(_diag_gauss_den)
       10    7.500    0.750   59.930    5.993 
gmm_em.py:122(sufficient_statistics)
       10    6.470    0.647   13.820    1.382 gmm_em.py:143(update_em)
     1050    6.310    0.006    6.310    0.006 :0(dot)
       10    3.150    0.315   52.070    5.207 
gmm_em.py:233(multiple_gauss_den)
       51    1.740    0.034    1.740    0.034 :0(sum)
      600    1.510    0.003    1.510    0.003 :0(where)
        5    0.810    0.162    0.810    0.162 :0(double_vq)
        1    0.650    0.650    2.030    2.030 kmean.py:47(kmean)
        1    0.580    0.580    3.780    3.780 gmm_em.py:62(init_kmean)

etc...
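(For reference, prun wraps Python's standard profiler; in current Python that is cProfile rather than hotshot. A minimal sketch of getting the same kind of per-function table outside of ipython -- the function names here are invented for the example:)

```python
import cProfile
import io
import pstats

def slow_part(n):
    # deliberately heavy inner routine, stands in for _diag_gauss_den
    return sum(i * i for i in range(n))

def workload():
    for _ in range(50):
        slow_part(10000)

# profile the workload and render a per-function report, sorted by
# cumulative time, like the table above
prof = cProfile.Profile()
prof.runcall(workload)

buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```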

Basically, I know that sufficient_statistics is the culprit, and I know 
it is because of _diag_gauss_densities (this last point can only be 
known by reading the code, but I know this code quite well, as it is 
mine :) ). But now, how can I optimize this function? dot and sum are 
called everywhere throughout the code, so I don't know which calls are 
expensive where (not all calls are made with the same args, for 
example).
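One workaround for shared routines like dot and sum, for what it is worth: wrap each call site in a tiny named helper, so the profiler attributes the time to a distinct name per call site instead of lumping everything into one ":0(sum)" entry. A sketch (the helper names are made up for illustration):

```python
# One thin wrapper per call site; the profiler then reports time under
# these distinct names rather than a single anonymous ":0(sum)" row.
def sum_in_sufficient_statistics(xs):
    return sum(xs)

def sum_in_update_em(xs):
    return sum(xs)

# usage at the two call sites:
a = sum_in_sufficient_statistics([1, 2, 3])   # → 6
b = sum_in_update_em(range(10))               # → 45
print(a, b)
```

The cost is a little boilerplate and one extra function call per use, but it needs no profiler support at all.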

So in my experience, this is not enough. In matlab, once you have done 
the profiling, you can generate a really nice report as a single html 
file, which gives you the time taken by all your code, per line (for 
the lines that matter). For people not familiar with matlab, I put an 
example here:

http://www.ar.media.kyoto-u.ac.jp/members/david/profile_results/

(this is the new version of matlab we have at the lab; I am not familiar 
with it, and it is much fancier than what I need and what older matlab 
versions offered, but this should give you an idea)

You first get an index of all top-level functions, and you can dig 
through it as deep as you want. Notice how, for a given function, you 
can see which calls are made, when, and how often. I have no idea how 
difficult this would be to implement in python. I was told some months 
ago on the main python list that hotshot can give per-line profiling of 
python code, but this is not documented; it also looks like it is 
possible to get the source code at runtime without too much difficulty 
in python. I would be really surprised if nobody has tried to do 
something similar for python in general, because this is really useful. 
I have never found anything for python, but that may just be because I 
don't know the name for this kind of tool (I tried googling terms such 
as "source profiling", without much success).
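The two ingredients mentioned above do exist in the standard library: sys.settrace can fire a callback on every executed line, and inspect can recover source at runtime. A toy sketch combining them into a crude per-line profiler (counting executions rather than timing them, to keep it short):

```python
import inspect
import sys
from collections import Counter

hits = Counter()

def tracer(frame, event, arg):
    # record every "line" event by its line number in the source file
    if event == "line":
        hits[frame.f_lineno] += 1
    return tracer

def work():
    total = 0
    for i in range(5):
        total += i
    return total

# trace only what runs between these two calls
sys.settrace(tracer)
work()
sys.settrace(None)

# inspect recovers the source of work() at runtime, so the counts can
# be printed next to the lines they belong to
src, start = inspect.getsourcelines(work)
for offset, text in enumerate(src):
    print("%5d | %s" % (hits[start + offset], text.rstrip()))
```

A real tool would record timestamps per line event instead of hit counts, but the mechanism is the same.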

David


