[Numpy-discussion] Multi thread loading data

Chris Colbert sccolbert@gmail....
Thu Jul 2 10:38:54 CDT 2009


Who are you quoting, Sebastian?

multiprocessing is a Python package that spawns multiple Python
processes, effectively side-stepping the GIL, and provides easy
mechanisms for IPC. Data sent between those processes has to be
pickled, hence the need for serialization....
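The approach suggested in the quoted message (load smaller arrays in separate processes, then join them) might look like the minimal sketch below, using multiprocessing.Pool. It assumes a purely numeric, comma-separated file with no header row; the function names and the byte-range chunking scheme are hypothetical illustrations, not a NumPy API. Each worker receives only a small (path, start, end) tuple and parses the lines that start inside its byte range, so the raw text is never pickled. The parsed arrays are pickled back to the parent, which is the serialization cost described above.

```python
import os

import numpy as np
from multiprocessing import Pool

# Hypothetical helpers (not a NumPy API). Assumes a purely numeric,
# comma-separated file with no header row.


def chunk_ranges(path, n_chunks):
    """Split the file into roughly equal (path, start, end) byte ranges."""
    size = os.path.getsize(path)
    step = max(1, -(-size // n_chunks))  # ceiling division
    starts = list(range(0, size, step))
    ends = starts[1:] + [size]
    return [(path, s, e) for s, e in zip(starts, ends)]


def parse_chunk(task):
    """Parse every non-blank line that *starts* inside [start, end).

    Only the small (path, start, end) tuple is pickled into the worker;
    the parsed array is pickled back to the parent.
    """
    path, start, end = task
    lines = []
    with open(path, "rb") as f:
        if start > 0:
            # Land on the first line that begins at or after `start`;
            # a line straddling the boundary belongs to the previous chunk.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            if line.strip():
                lines.append(line.decode())
    if not lines:
        return None
    return np.loadtxt(lines, delimiter=",", ndmin=2)


def load_csv_parallel(path, n_workers=4):
    """Parse byte ranges in separate processes, then join the pieces."""
    with Pool(n_workers) as pool:
        parts = pool.map(parse_chunk, chunk_ranges(path, n_workers))
    parts = [p for p in parts if p is not None]
    return np.concatenate(parts) if parts else np.empty((0, 0))
```

On platforms where multiprocessing starts workers with spawn (Windows, and macOS by default), call load_csv_parallel from under an if __name__ == "__main__": guard. Whether this beats a single np.loadtxt call depends on the disk and on how costly it is to pickle the returned arrays, so it is worth timing on the real file.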


On Thu, Jul 2, 2009 at 11:30 AM, Sebastian Haase<seb.haase@gmail.com> wrote:
> On Thu, Jul 2, 2009 at 5:14 PM, Chris Colbert<sccolbert@gmail.com> wrote:
>> Can you hold the entire file in memory as a single array with room to spare?
>> If so, you could use multiprocessing and load a bunch of smaller
>> arrays, then join them all together.
>>
>> It won't be super fast, because serializing a numpy array is somewhat
>> slow when using multiprocessing. That said, it's still faster than disk
>> transfers.
>>
>> I'm sure some numpy expert will come on here, though, and give you a
>> much better idea.
>>
>>
>>
>> On Wed, Jul 1, 2009 at 7:57 AM, Mag Gam<magawake@gmail.com> wrote:
>>> Is it possible to use loadtxt in a multithreaded way? Basically, I want
>>> to process a very large CSV file (100+ million records), and instead of
>>> loading a thousand elements into a buffer, processing them, then loading
>>> another thousand elements and processing them, and so on...
>>>
>>> I was wondering if there is a technique where I can use multiple
>>> processors to do this faster.
>>>
>>> TIA
>
> Do you know about the GIL (global interpreter lock) in Python?
> It means that Python isn't doing "real" multithreading...
> Only if one thread is, e.g., doing some slow or blocking I/O can the
> other thread keep working, e.g. doing CPU-heavy numpy stuff.
> But you wouldn't get 2-CPU numpy code, except for some C-implemented
> "long-running" operations; these should be programmed in a way that
> releases the GIL so that the other CPU can go on running its Python
> code.
>
> HTH,
> Sebastian Haase
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
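Sebastian's point, that a second thread mainly helps while the first one is blocked on I/O, can be sketched as a producer/consumer pair: a reader thread pulls blocks of lines from the file while the main thread parses and reduces them with NumPy. The function names, the block size and the "processing" step (a running column sum) are hypothetical choices for illustration, and the file is assumed to be numeric, comma-separated and headerless. Since the parsing itself still competes for the GIL, expect this to overlap disk reads with computation rather than to parse on two CPUs.

```python
import queue
import threading
from itertools import islice

import numpy as np

# Hypothetical sketch: assumes a numeric, comma-separated file with no
# header; "processing" is just a running column sum.


def read_blocks(path, block_lines, q):
    """Producer thread: read blocks of lines (blocking file I/O) into q."""
    try:
        with open(path) as f:
            while True:
                block = list(islice(f, block_lines))
                if not block:
                    break
                rows = [ln for ln in block if ln.strip()]  # drop blank lines
                if rows:
                    q.put(rows)
    finally:
        q.put(None)  # sentinel: tells the consumer the file is exhausted


def column_sums(path, block_lines=10_000):
    """Consumer (main thread): parse and reduce each block with NumPy
    while the reader thread fetches the next one."""
    q = queue.Queue(maxsize=4)  # bounded, so the reader can't run far ahead
    reader = threading.Thread(
        target=read_blocks, args=(path, block_lines, q), daemon=True
    )
    reader.start()
    total = None
    while True:
        rows = q.get()
        if rows is None:
            break
        block_sum = np.loadtxt(rows, delimiter=",", ndmin=2).sum(axis=0)
        total = block_sum if total is None else total + block_sum
    reader.join()
    return total
```

The bounded queue keeps memory use flat regardless of file size: at most a few blocks are held in memory at once, which matches the buffer-by-buffer processing described in the original question.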

