[Numpy-discussion] numpy : your experiences?
Sat Nov 17 01:07:34 CST 2007
On 16/11/2007, Rahul Garg <email@example.com> wrote:
> It would be awesome if you guys could respond to some of the following
> questions :
> a) Can you guys tell me briefly about the kind of problems you are
> tackling with numpy and scipy?
> b) Have you ever felt that numpy/scipy was slow and had to switch to
> c) Do you use any form of parallel processing? Multicores? SMPs?
> Clusters? If yes, how did you utilize them?
> If you feel it's not relevant to the list, feel free to email me personally.
> I would be very interested in talking about these issues.
I think it would be interesting and on-topic to hear a few words from
people about what they do with numpy.
a) I use python/numpy/scipy to work with astronomical observations of
pulsars. The tasks range widely: simple scripting to
manage jobs on our computation cluster; minor calculations (like a
better scientific calculator, though Frink is sometimes better because
it keeps track of units); gathering and plotting results; prototyping
search algorithms and evaluating their statistical properties;
providing tools for manipulating photon data; and various other tasks. We
also use the software package PRESTO for much of the heavy lifting;
much of it is written in python.
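For illustration, here is a stripped-down sketch of the kind of statistical evaluation I mean. The threshold, trial count, and exponential noise model are just for the example, not taken from any of our actual search codes; under the noise-only hypothesis, normalized Fourier powers are exponentially distributed, so a Monte Carlo estimate of the false-alarm rate can be checked against exp(-threshold):

```python
import numpy as np

# Hypothetical example: estimate the false-alarm probability of a
# power threshold in a Fourier-domain search.  Normalized powers of
# pure Gaussian noise follow an exponential distribution with mean 1,
# so analytically P(power > t) = exp(-t).
rng = np.random.default_rng(0)

n_trials = 100_000
threshold = 5.0

# Draw normalized noise powers directly from their known distribution.
powers = rng.exponential(scale=1.0, size=n_trials)

# Monte Carlo estimate of the false-alarm fraction vs. the analytic value.
false_alarm = np.mean(powers > threshold)
analytic = np.exp(-threshold)
```

Scripts like this take minutes to write in numpy, which is exactly why I reach for it when prototyping.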
b) I have projects for which python is too slow, yes. Pulsar surveys
are extremely compute-intensive (I estimated one that I'm involved
with at two or three mega-core-hours), so any software that goes
into the pipeline should have its programming-time/runtime tradeoff
carefully examined. I wrote a search code in C, with bits in embedded
assembler. All the driver code to shuffle files around and supply
correct parameters is in python, though. PRESTO has a similar pattern,
with more C code because it does more of the hard work. In most cases
the communication between the heavy-lifting code and the python code
is through the UNIX environment (the heavy-lifting code gets run as a
separate executable) but PRESTO makes many functions available in
python modules. On the other hand, I often write quick Monte Carlo
simulations that run for a day or more, but since writing them takes
about as long as running them, it's not worth writing them in a
language that would run faster.
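The driver pattern is roughly like the sketch below: the heavy-lifting code is a separate compiled executable, and Python just assembles parameters and runs it. The executable name and its flags here are placeholders, not the real search code's interface:

```python
import subprocess

# Hypothetical sketch of the driver pattern: shuffle files around and
# supply correct parameters to an external, compiled search code that
# runs as a separate executable.
def run_search(executable, infile, outfile, threshold):
    """Run one external search job; raise if it exits nonzero."""
    cmd = [executable, "-o", outfile, "-t", str(threshold), infile]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{executable} failed: {result.stderr}")
    return result.stdout
```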
c) Our problems tend to be embarrassingly parallel, so we tend not to
use clever parallelism toolkits. For the survey I am working on, I
wrote one (python) script to process a single beam on a single node
(which takes about thirty hours), and another to keep the batch queue
filled with an appropriate number of such jobs. I have thought about
writing a more generic tool for managing this kind of job queue, but
haven't invested the time yet.
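The queue-filling script amounts to something like the following sketch. The qstat/qsub invocations assume a PBS-style batch system, and the job script name and target count are made up for the example; the real survey scripts are not shown here:

```python
import subprocess

# Hypothetical sketch of the queue-filling script: count our jobs in
# the batch queue and submit enough new beams to keep TARGET of them
# pending or running.
TARGET = 20

def jobs_needed(n_in_queue, target=TARGET):
    """How many new jobs to submit to keep the queue filled."""
    return max(0, target - n_in_queue)

def count_my_jobs(user):
    """Count queued/running jobs for `user` (assumes PBS qstat)."""
    out = subprocess.run(["qstat", "-u", user],
                         capture_output=True, text=True).stdout
    return sum(1 for line in out.splitlines() if user in line)

def submit_beam(beam):
    """Submit one beam-processing job (assumes PBS qsub)."""
    subprocess.run(["qsub", "-v", f"BEAM={beam}", "process_beam.sh"],
                   check=True)
```

Run from cron, a loop over `jobs_needed(count_my_jobs(user))` beams keeps the cluster saturated without any parallelism toolkit.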