[IPython-User] parallel IPython analysis with large dataset

Darren Govoni darren@ontrenet....
Mon Jul 9 22:01:25 CDT 2012


Interesting question.

Can you be more specific, though? In my mind, any data that is analyzed
by an algorithm needs to be brought into memory for operations to be
done on it. You decide the smallest chunk of the data on which you can
perform your algorithm/analysis, and IPython will handle the
scattering/gathering, etc. In other words, you choose the model of
parallelization.
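The chunk-based model above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not IPython's API: threads from `concurrent.futures` stand in for IPython engines, and `scatter`, `analyze_chunk`, `gather_mean` and `parallel_mean` are hypothetical helpers. With real engines, each one is a separate process holding only its own chunk, so the dataset is split across workers rather than copied to every one of them.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(data, n_workers):
    """Split `data` into n_workers contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(data), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + q + (1 if i < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def analyze_chunk(chunk):
    """Hypothetical per-chunk analysis: a partial sum and count for a global mean."""
    return sum(chunk), len(chunk)


def gather_mean(partials):
    """Combine the per-chunk partial results into one answer."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def parallel_mean(data, n_workers=4):
    # Threads are only a stand-in for engines here; each task sees just its chunk.
    chunks = scatter(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(analyze_chunk, chunks))
    return gather_mean(partials)
```

For example, `scatter(list(range(10)), 3)` yields chunks of sizes 4, 3 and 3, and `parallel_mean(list(range(10)), 3)` returns `4.5`. The design point is that the analysis must be expressible as per-chunk work plus a cheap combine step; an algorithm that needs the whole dataset at once does not fit this model.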

Some large-data analysis is "embarrassingly parallel" in nature, which
allows it to be distributed across machines using middleware like
IPython. Other forms of analysis, not so much. Few algorithms work
efficiently on data "at rest" (on disk), since that incurs an I/O
bottleneck at each server, which can keep other cores in the cluster
from performing optimally.
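Leaving the data "at rest" and having each worker read only its own portion from disk can be sketched with the standard library. This is an illustrative assumption, not a method prescribed in this thread: the flat binary file of doubles and the helpers `write_dataset`, `read_slice`, `worker_sum` and `demo` are hypothetical. No worker ever holds the full dataset in memory, but every worker pays its own I/O cost, which is the bottleneck described above.

```python
import os
import tempfile
from array import array

# Hypothetical file layout: a flat array of native-format doubles.
ITEM_SIZE = array("d").itemsize


def write_dataset(path, values):
    """Write the (stand-in) large dataset to disk as raw doubles."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def read_slice(path, start, count):
    """Load only elements [start, start + count); the rest stays on disk."""
    out = array("d")
    with open(path, "rb") as f:
        f.seek(start * ITEM_SIZE)
        out.fromfile(f, count)
    return out


def worker_sum(path, worker_id, n_workers, n_items):
    """Hypothetical worker: compute its own index range, then read just that slice."""
    q, r = divmod(n_items, n_workers)
    start = worker_id * q + min(worker_id, r)
    count = q + (1 if worker_id < r else 0)
    return sum(read_slice(path, start, count))


def demo(n_items=10, n_workers=3):
    """Write a small dataset, then let each simulated worker sum its own slice."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        write_dataset(path, range(n_items))
        return [worker_sum(path, i, n_workers, n_items) for i in range(n_workers)]
```

Calling `demo()` returns one partial sum per simulated worker, `[6.0, 15.0, 24.0]`; the workers here run sequentially, so on a real cluster each call would be a separate task reading the shared file.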

I think the latest IPython (and I defer to the authors on this) tries to
keep the engines as busy as possible while backfilling network traffic
to them.

Darren

On Mon, 2012-07-09 at 19:33 -0700, Michael Kuhlen wrote:

> Hi
> 
> [Apologies if this is posted twice, I originally emailed from an
> unsubscribed email address.]
> 
> I have essentially the same question that was asked by Robert Ferrell
> back at the end of 2009,
> http://python.6.n6.nabble.com/Multi-processor-access-to-a-large-data-set-tt1657458.html
> 
> Is the answer still the same (use Python's multiprocessing module), or
> is this now possible with IPython parallel tools?
> 
> Specifically, is it now possible to analyze a large dataset using
> IPython parallel tools *without* replicating it in memory Ncore times?
> If yes, great! How would I do it?
> 
> Thanks for your help.
> 
> Mike
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user

