[Numpy-discussion] Multi thread loading data
Thu Jul 2 10:38:54 CDT 2009
Who are you quoting, Sebastian?
Multiprocessing is a Python package that spawns multiple Python
processes, effectively side-stepping the GIL, and provides easy
mechanisms for IPC (inter-process communication). Data passed between
processes has to be pickled, hence the need for serialization.
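
To make the chunk-and-join idea concrete, here is a minimal sketch using
only the standard library. The names iter_chunks, parse_chunk and
load_csv_parallel, and the chunk_size default, are made up for
illustration; it assumes a purely numeric CSV with no header row and
returns plain lists of floats rather than numpy arrays.

```python
import csv
from itertools import islice
from multiprocessing import Pool


def iter_chunks(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def parse_chunk(chunk):
    """Parse a list of CSV text lines into rows of floats (runs in a worker)."""
    return [[float(field) for field in row] for row in csv.reader(chunk) if row]


def load_csv_parallel(path, chunk_size=100_000, processes=None):
    """Parse a numeric CSV file in chunks and join the results in order.

    processes=1 parses in the calling process, which is handy for debugging.
    """
    with open(path, newline="") as f:
        if processes == 1:
            parts = map(parse_chunk, iter_chunks(f, chunk_size))
            return [row for part in parts for row in part]
        with Pool(processes) as pool:
            # imap preserves chunk order. Every chunk sent to a worker and
            # every result sent back is pickled, which is the
            # serialization cost mentioned above.
            parts = pool.imap(parse_chunk, iter_chunks(f, chunk_size))
            return [row for part in parts for row in part]
```

On platforms that start workers by spawning a fresh interpreter, the
call to load_csv_parallel should sit under an
if __name__ == "__main__": guard. With numpy, each worker could return
an array instead of a list and the parent could join the pieces with
numpy.concatenate; whether that beats a single-process loadtxt depends
on how much time goes into pickling the results.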
On Thu, Jul 2, 2009 at 11:30 AM, Sebastian Haase<firstname.lastname@example.org> wrote:
> On Thu, Jul 2, 2009 at 5:14 PM, Chris Colbert<email@example.com> wrote:
>> Can you hold the entire file in memory as a single array with room to spare?
>> If so, you could use multiprocessing and load a bunch of smaller
>> arrays, then join them all together.
>> It won't be super fast, because serializing a numpy array is somewhat
>> slow when using multiprocessing. That said, it's still faster than disk
>> access.
>> I'm sure some numpy expert will come on here though and give you a
>> much better idea.
>> On Wed, Jul 1, 2009 at 7:57 AM, Mag Gam<firstname.lastname@example.org> wrote:
>>> Is it possible to use loadtxt in a multi-threaded way? Basically, I want
>>> to process a very large CSV file (100+ million records) and instead of
>>> loading a thousand elements into a buffer, processing them, then loading
>>> another thousand elements, processing those, and so on...
>>> I was wondering if there is a technique where I can use multiple
>>> processors to do this faster.
> Do you know about the GIL (global interpreter lock) in Python?
> It means that Python isn't doing "real" multithreading...
> Only if one thread is, e.g., doing some slow or blocking I/O can the
> other thread keep working, e.g. doing CPU-heavy numpy stuff.
> But you would not get 2-CPU numpy code, except for some C-implemented
> "long running" operations; these should be programmed in a way that
> releases the GIL so that the other CPU can go on running its Python
> code.
> Sebastian Haase