<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.28.3">
Can you be more specific, though? In my mind, any data that is analyzed (by an algorithm)<BR>
needs to be brought into memory for operations to be done on it. You decide what the<BR>
smallest chunk of data is that your algorithm/analysis can operate on, and IPython<BR>
handles the scattering/gathering, etc. That, in essence, is the model of parallelization.<BR>
Some large data analyses are "embarrassingly parallel" in nature, lending themselves<BR>
to being distributed across machines using middleware like IPython; other forms of<BR>
analysis, not so much. Few algorithms will work efficiently operating on the data "at rest",<BR>
as that would incur an I/O bottleneck at each server destination, which could keep<BR>
other cores in the cluster from performing optimally. <BR>
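The scatter/gather idea above can be sketched in plain Python. This is a minimal, hypothetical `scatter_chunks` helper just to show the bookkeeping; in practice IPython's DirectView scatter/gather does this for you, including shipping the chunks to remote engines:<BR>
<PRE>
# Hypothetical sketch of the scatter step: split a dataset into one
# near-equal contiguous chunk per engine, so each engine only ever
# holds its own chunk in memory.
def scatter_chunks(data, n_engines):
    """Split `data` into n_engines contiguous chunks of near-equal size."""
    base, extra = divmod(len(data), n_engines)
    chunks, start = [], 0
    for i in range(n_engines):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

chunks = scatter_chunks(list(range(10)), 4)

# Each chunk is processed independently (the embarrassingly parallel
# part), then the partial results are gathered and reduced:
partial_sums = [sum(c) for c in chunks]
total = sum(partial_sums)
</PRE>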
I think the latest IPython (and I defer to the authors on this) tries to keep the engines<BR>
as busy as possible while backfilling network traffic to them.<BR>
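As for avoiding Ncore in-memory copies: one common workaround is to keep the dataset in a flat binary file and have each worker open a read-only numpy.memmap on just its slice, so the OS pages data in on demand instead of every process holding a full copy. A minimal single-machine sketch (the file name and the `partial_sum` helper are hypothetical, and the worker loop here is serial just to show the bookkeeping):<BR>
<PRE>
import numpy as np

# Hypothetical setup: a large array stored in a flat binary file.
fname = "big_data.bin"
np.arange(1000, dtype=np.float64).tofile(fname)

def partial_sum(fname, start, stop):
    """Open a read-only memmap and reduce one slice of it."""
    mm = np.memmap(fname, dtype=np.float64, mode="r")
    return float(mm[start:stop].sum())

# Simulate 4 workers, each reducing only its own slice; with IPython
# or multiprocessing, each call would run in a separate process that
# maps the file itself rather than receiving a copy of the array.
n_workers = 4
bounds = np.linspace(0, 1000, n_workers + 1).astype(int)
partials = [partial_sum(fname, bounds[i], bounds[i + 1])
            for i in range(n_workers)]
total = sum(partials)   # sum of 0..999 == 499500.0
</PRE>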
On Mon, 2012-07-09 at 19:33 -0700, Michael Kuhlen wrote:
[Apologies if this is posted twice, I originally emailed from an
unsubscribed email address.]
I have essentially the same question that was asked by Robert Ferrell
back at the end of 2009,
Is the answer still the same (use python.multiprocessing), or is this
now possible with IPython parallel tools?
Specifically, is it now possible to analyze a large dataset using
IPython parallel tools *without* replicating it in memory Ncore times?
If yes, great! How would I do it?
Thanks for your help.
IPython-User mailing list