[Numpy-discussion] parallel numpy (by Brian Granger) - any info?
Tue Jan 8 04:33:23 CST 2008
> I have AMD processor so I guess I should use ACML somehow instead.
> However, first, I would prefer my code to be platform-independent, and
> second, unfortunately I haven't found in the numpy documentation (on the
> websites scipy.org and numpy.scipy.org) any mention of how to use
> numpy multithreading at all (neither MKL nor ACML).
MKL does the multithreading on its own for level 3 BLAS routines
(through OpenMP). For ACML, the problem is that AMD does not provide a CBLAS
interface and is not interested in doing so. With ACML, compilation
fails with the current NumPy, but hopefully it will work with SCons, at
least for the LAPACK part. But I don't think that ACML is parallel.
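A minimal sketch of what this looks like from the NumPy side, assuming a NumPy build linked against an optimized BLAS: a plain matrix product is handed to the library, so any threading happens inside the BLAS and not in Python code. The helper name time_matmul and the matrix size are only illustrative.

```python
import time

import numpy as np


def time_matmul(n=500, repeats=3):
    """Return (best wall-clock seconds, result shape) for an n x n matrix product."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # For float64 matrices this is handed to the BLAS matrix-multiply
        # routine when NumPy is linked against an optimized BLAS.
        c = np.dot(a, b)
        best = min(best, time.perf_counter() - start)
    return best, c.shape


if __name__ == "__main__":
    # Prints which BLAS/LAPACK libraries this NumPy build was linked against.
    np.show_config()
    elapsed, shape = time_matmul()
    print(f"{shape[0]}x{shape[1]} matmul: {elapsed:.4f} s")
```

Running the script once with the environment variable OMP_NUM_THREADS=1 and once with a larger value is one way to check whether the linked BLAS is threaded. Whether the variable is honored depends on the BLAS build.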
I think that using multithreaded libraries is far more interesting and
easier than using distributed memory systems. This is because Python needs
some help to enable multi-processing (because of the GIL), for instance
something like Jackal for Java. After some reading, I think this means that
core Python would have to be updated.
> Also, I intended to try using numpy multithreading on our icyb cluster
> (IIRC made of Intel processors) from the world top 500 (however,
> connections to other subsets of processors in other cities have now been
> set up, and some of them are AMD). If 100-200 processors (I don't
> remember how many it has) yielded at least a 2x...3x speedup on some
> of my test cases, it would be a good deal and something to report in my
> graduation work.
If you have access to Intel Quad-Core processors with the latest MKL and if
you intensively use matrix multiplications, you will get those results. But
if you say at your graduation that using 100 or 200 processors only yields
a 2x or 3x speedup, I think the jury will not
> As my chief informed me, people here are fond of the cluster, mentioning
> the magical word (in my work) would fill them with respect :)
Then you should first start by looking at how to make your algorithms
parallel. Just throwing a number of processors at the problem will not yield
a good speedup per processor, and this is what people are looking for: good
scalability. Then you must use tools like the processing module, MPI, ...
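A rough sketch of the idea, assuming the work can be cut into independent chunks. It uses multiprocessing, the name under which the processing module was later added to the Python standard library. The prime-counting task and the helper names (count_primes, split_range, efficiency) are only placeholders for a real algorithm.

```python
import math
from multiprocessing import Pool


def count_primes(bounds):
    """Count primes in [lo, hi); a stand-in for an independent, CPU-bound task."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # Trial division up to the integer square root of n.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Split [lo, hi) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def efficiency(speedup, processors):
    """Speedup per processor; 1.0 would be ideal scaling."""
    return speedup / processors


if __name__ == "__main__":
    workers = 4  # illustrative; pick according to the machine
    chunks = split_range(0, 50_000, workers)
    with Pool(processes=workers) as pool:
        # Each chunk is processed in a separate process, so the GIL
        # does not serialize the work.
        total = sum(pool.map(count_primes, chunks))
    print("primes found:", total)
    # A 3x speedup on 100 processors is a very low speedup per processor.
    print("efficiency:", efficiency(3.0, 100))
```

The efficiency helper makes the point about scalability concrete: the interesting number is the speedup divided by the number of processors, not the raw speedup.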
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher