[Numpy-discussion] multiprocessing shared arrays and numpy
Thu Mar 11 07:55:40 CST 2010
On Thursday 11 March 2010 14:35:49, Gael Varoquaux wrote:
> > So, in my experience, numpy.memmap is really using that large chunk of
> > memory (unless my testbed is badly programmed, in which case I'd be
> > grateful if you can point out what's wrong).
> OK, so what you are saying is that my assertion #1 was wrong. Fair
> enough, as I was writing it I was thinking that I had no hard fact to
> back it. How about assertion #2? I can think only of this 'story' to
> explain why I can run parallel computation when I use memmap that blow up
> if I don't use memmap.
Well, I must admit that I have no experience running memmapped arrays in
parallel computations, but it sounds like they can indeed behave as
shared-memory arrays. So yes, you may well be right about #2: memmapped data
is not duplicated when accessed in parallel by different processes (in
read-only mode, of course), which is certainly a very interesting technique
for sharing data between parallel processes. Thanks for pointing this out!
> Also, could it be that the memmap mode changes things? I use only the 'r'
> mode, which is read-only.
I don't think so. In my test I also open the x values in read-only mode, and
the memory consumption is still there.
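As a side note on the modes: with mode='r' the resulting array is mapped read-only, and NumPy rejects any attempt to write through it. A minimal sketch (the file path is made up for the demo):

```python
import os
import tempfile
import numpy as np

# Hypothetical scratch file for the demo.
path = os.path.join(tempfile.mkdtemp(), "x.dat")
np.zeros(10, dtype=np.float64).tofile(path)

# mode='r' maps the existing file read-only.
x = np.memmap(path, dtype=np.float64, mode='r')
print(x.flags.writeable)  # False

try:
    x[0] = 1.0  # writing through a read-only mapping...
except ValueError:
    print("writes rejected")  # ...raises ValueError
```

The other modes ('r+' for read-write on an existing file, 'w+' to create or overwrite, 'c' for copy-on-write) all allow writes to the in-memory array, so only 'r' guarantees the pages stay clean and shareable.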
> This is all very interesting, and you have much more insights on these
> problems than me. Would you be interested in coming to Euroscipy in Paris
> to give a 1 or 2 hours long tutorial on memory and IO problems and how
> you address them with Pytables? It would be absolutely thrilling. I must
> warn that I am afraid that we won't be able to pay for your trip, though,
> as I want to keep the price of the conference low.
Yes, no problem. I was already thinking about presenting something at
EuroSciPy. A tutorial about PyTables/memory IO would be really great for me.
We can nail down the details off-list.