[IPython-User] parallel IPython analysis with large dataset
Mon Jul 9 22:01:25 CDT 2012
Can you be more specific though? In my mind, any data that is analyzed
(by an algorithm) needs to be brought into memory for operations to be
done on it. You can work out what the smallest chunk of the data you can
perform your algorithm/analysis on is, and IPython will handle the
scattering/gathering etc.

Which is to say, the model depends on the analysis: some large data
analysis can be "embarrassingly parallel" in nature, allowing itself to
be suitably distributed across machines using middleware like IPython;
other kinds of analysis, not so much. Few algorithms will work
efficiently operating on the data "at rest", as that would incur an I/O
bottleneck at each server destination, which would keep the other cores
in the cluster from performing optimally.

I think the latest IPython (and I defer to the authors on this too)
tries to keep the engines as busy as possible while backfilling network
traffic to them.
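To make the chunking idea concrete, here is a minimal sketch using only the standard library's multiprocessing (not IPython's own scatter/gather machinery, so no running cluster is needed). It assumes a hypothetical on-disk layout of one number per line; the names `chunk_boundaries`, `analyze_chunk`, and `parallel_sum` are illustrative, not from any library. Each worker seeks to its own byte range and reads just that slice, so the full dataset is never replicated once per core in memory:

```python
# Sketch: scatter byte ranges of a file to workers, gather partial results.
# Assumed (hypothetical) data layout: a text file with one number per line.
import os
from multiprocessing import Pool

def chunk_boundaries(path, n_chunks):
    """Split the file into byte ranges aligned to line starts."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(i * size // n_chunks)
            f.readline()              # advance to the start of the next line
            bounds.append(f.tell())
    bounds.append(size)
    return bounds

def analyze_chunk(args):
    """Worker: reduce one slice of the file (here, a partial sum)."""
    path, start, end = args
    if start >= end:                  # empty chunk (boundaries can coincide)
        return 0.0
    total = 0.0
    with open(path, "rb") as f:
        f.seek(start)
        for line in iter(f.readline, b""):
            total += float(line)
            if f.tell() >= end:       # this slice is done
                break
    return total

def parallel_sum(path, n_workers=4):
    """Scatter byte ranges to workers, then gather the partial results."""
    bounds = chunk_boundaries(path, n_workers)
    tasks = [(path, bounds[i], bounds[i + 1]) for i in range(n_workers)]
    with Pool(n_workers) as pool:
        return sum(pool.map(analyze_chunk, tasks))
```

The same shape carries over to IPython's engines: the master ships each engine only a description of its slice (path plus offsets), each engine pulls and reduces its own piece locally, and only the small partial results travel back over the network.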
On Mon, 2012-07-09 at 19:33 -0700, Michael Kuhlen wrote:
> [Apologies if this is posted twice, I originally emailed from an
> unsubscribed email address.]
> I have essentially the same question that was asked by Robert Ferrell
> back at the end of 2009.
> Is the answer still the same (use python.multiprocessing), or is this
> now possible with IPython parallel tools?
> Specifically, is it now possible to analyze a large dataset using
> IPython parallel tools *without* replicating it in memory Ncore times?
> If yes, great! How would I do it?
> Thanks for your help.