[Numpy-discussion] is numerical python only for prototyping ?

Peter Verveer verveer at embl-heidelberg.de
Tue Jul 26 10:15:15 CDT 2005


On 26 Jul 2005, at 18:41, Sebastian Haase wrote:

> Hi,
> This is not sopposed to be an evil question; instead I'm hoping for the
> answer: "No, generally we get >=95% the speed of a pure C/fortran
> implementation" ;-)

You won't, generally. The question is: since you are certainly not going
to gain an order of magnitude by doing it in C, do you really care?

> But as I am the strongest Python/numarray advocate in our group, I
> often get the answer that Matlab is (of course) also very convenient,
> but that its memory handling and overall execution performance are
> generally so bad that for a final implementation one would have to
> reimplement in C.

Well, it's true that implementations in C will be faster. And memory
handling in Numeric/numarray can be a pain, since the tendency is to
create and destroy a lot of arrays if you are not careful.

> We are a bio-physics group at UCSF developing new algorithms for
> deconvolution (often in 3D). Our data sets are regularly bigger than
> several hundred MB. When deciding on numarray I assumed that the
> "Hubble crowd" had a similar situation and that all the operations
> are therefore very much optimized for this type of data.

Funny you mention that example. I did my PhD in exactly the same field
(considering you are from Sedat's lab, I guess you are in exactly the
same field as I was/am, i.e. fluorescence microscopy. What are you guys
up to these days?) and I developed all my algorithms in C at the time.
Now, about 7 years later, I have returned to the field to re-implement
and extend some of my old algorithms for use with microscopy data that
can consist of multiple sets, each several hundred MB at least. Now I
use Python with numarray, and I am actually quite happy with it. I am
pushing it by using up to 2GB of memory (per process, after splitting
the problem up and distributing it over a cluster...), but it works. I
am sure I could squeeze out maybe a factor of two or three in terms of
speed and memory usage by rewriting in C, but that is currently not
worth my time. So I guess that counts as using numarray as a
prototyping environment, but the result is also suitable for production
use.

> Is 95% a reasonable number to hope for? I did wrap my own version of
> FFTW (with "plan caching"), which should give 100% of the C speed.

That should help a lot, as the standard FFTs that come with
numarray/Numeric suck big time. I do use them, but I have to go through
all kinds of tricks to get decent memory usage in 32-bit floating
point. The FFT module is in fact very badly written for use with large
multi-dimensional data sets.
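
For what it is worth, plan caching itself can be as simple as a
dictionary keyed on the shape and type of the input. A rough sketch
(cached_fft and make_plan are made-up names, and I assume a plan object
that can be called on the data; only the caching pattern matters):

    _plan_cache = {}

    def cached_fft(data, make_plan):
        # Reuse a previously created FFTW plan for this shape and type.
        key = (data.shape, data.type())   # numarray element type via .type()
        plan = _plan_cache.get(key)
        if plan is None:
            plan = make_plan(data)        # expensive: FFTW plans/measures once
            _plan_cache[key] = plan
        return plan(data)                 # executing a prepared plan is cheap

That way, repeated transforms of same-sized stacks pay the planning
cost only once.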

> But concerns arise from expressions like "a = b + c*a" (think
> "convenience"!): if a, b, c are each 3D data stacks, the creation of
> temporary arrays for 'c*a' AND then also for 'b+...' would be very
> costly. (I think this is at least happening for Numeric - I don't
> know about Matlab and numarray.)

That is indeed a problem, although I think in your case you may be
limited by your FFTs anyway, at least in terms of speed. One thing you
should consider is replacing expressions such as 'c = a + b' with
add(a, b, c). If you do that cleverly you can avoid quite a few memory
allocations and you 'should' get closer to C (see the sketch after the
list below). That does not solve everything though:

1) Complex expressions still need to be broken up into sequences of
operations, which is likely slower than iterating once over your array
and evaluating the full expression at each point.
2) Unfortunately not all numarray functions support an output array
(maybe only the ufuncs?). This can be a big problem, as temporary
arrays must then be allocated. (It sure was a problem for me.)
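
To make that concrete, here is roughly what 'a = b + c*a' looks like
when written with output arguments and a single reusable scratch array.
This is just a sketch with toy one-dimensional arrays; the positional
output argument is the same one as in add(a, b, c) above:

    import numarray as na

    # Toy stand-ins for the big 3D stacks; same shape and type throughout.
    a = na.arange(8.0)
    b = na.arange(8.0)
    c = na.arange(8.0)

    tmp = a * 0.0            # one scratch buffer, allocated only once

    # a = b + c*a without creating any further temporaries:
    na.multiply(c, a, tmp)   # tmp = c*a, written into the scratch buffer
    na.add(b, tmp, a)        # a = b + tmp, written back into a

If this runs inside an iterative algorithm, tmp can be allocated
outside the loop and reused, so the per-iteration allocation cost
drops to zero.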

You can of course always re-implement the critical parts in C and wrap
them (as you did with FFTW). In fact, I think numarray now provides a
relatively easy way to write ufuncs in C, which would allow you to
implement a complex expression as a single function callable from
Python.

> Hoping for comments,

Hope this gives some insight. I guess I have had similar experiences:
there are definitely some limits to the use of numarray/Numeric that
could be relieved, for instance by a consistent implementation of
output arrays. That would allow writing algorithms in which you could
very strictly control the allocation and de-allocation of arrays, which
would be a big help when working with large arrays.

Cheers, Peter

PS.
I would not mind hearing a bit about your experiences doing the big
deconvolutions in numarray/Numeric, but that may not be a good topic
for this list.

> Thanks
> Sebastian Haase
> UCSF, Sedat Lab
>
--
Dr Peter J Verveer
European Molecular Biology Laboratory
Meyerhofstrasse 1
D-69117 Heidelberg
Germany
Tel. +49 6221 387 8245
Fax. +49 6221 397 8306




