[Numpy-discussion] Multi thread loading data

Sebastian Haase seb.haase@gmail....
Thu Jul 2 10:30:03 CDT 2009


On Thu, Jul 2, 2009 at 5:14 PM, Chris Colbert<sccolbert@gmail.com> wrote:
> Can you hold the entire file in memory as a single array with room to spare?
> If so, you could use multiprocessing and load a bunch of smaller
> arrays, then join them all together.
>
> It won't be super fast, because serializing a numpy array is somewhat
> slow when using multiprocessing. That said, it's still faster than disk
> transfers.
>
> I'm sure some numpy expert will come on here though and give you a
> much better idea.
>
>
>
> On Wed, Jul 1, 2009 at 7:57 AM, Mag Gam<magawake@gmail.com> wrote:
>> Is it possible to use loadtxt in a multithreaded way? Basically, I want
>> to process a very large CSV file (100+ million records) and instead of
>> loading a thousand elements into a buffer, processing them, then loading
>> another thousand elements and processing those, and so on...
>>
>> I was wondering if there is a technique where I can use multiple
>> processors to do this faster.
>>
>> TIA

Do you know about the GIL (global interpreter lock) in Python?
It means that Python isn't doing "real" multithreading: only one
thread can execute Python code at a time. Only when one thread is,
e.g., doing slow or blocking I/O can another thread keep working,
e.g., on CPU-heavy numpy stuff.
So you would not get 2-CPU numpy code, except for some C-implemented
"long running" operations. These should be programmed in a way that
releases the GIL so that the other CPU can go on running its Python
code.
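
To illustrate the I/O case, here is a minimal sketch (not a tuned
solution) that reads the next chunk of a CSV file in a background
thread while the main thread processes the current one. The names
read_chunk, process and column_sums are made up for this example, and
the column sum is only a stand-in for the real work. The gain is
probably limited to the time the reader spends blocked on the disk,
since the text parsing itself most likely still holds the GIL.

```python
import io
import itertools
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def read_chunk(fh, nrows):
    """Parse up to nrows CSV lines from an open file (hypothetical helper).

    Returns a 2-D array, or None once the file is exhausted.
    """
    lines = list(itertools.islice(fh, nrows))
    if not lines:
        return None
    return np.loadtxt(lines, delimiter=",", ndmin=2)


def process(chunk):
    # Placeholder for the CPU-heavy numpy work; here just column sums.
    return chunk.sum(axis=0)


def column_sums(fh, nrows=1000):
    """Sum the columns chunk by chunk, prefetching the next chunk in a thread."""
    total = None
    # A single worker keeps the reads on the shared file handle sequential.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_chunk, fh, nrows)
        while True:
            chunk = pending.result()
            if chunk is None:
                break
            # Start reading the next chunk before processing the current one.
            pending = pool.submit(read_chunk, fh, nrows)
            part = process(chunk)
            total = part if total is None else total + part
    return total
```

With a real file this would be called as column_sums(open(path)), where
path is whatever CSV you have. Only one chunk is prefetched at a time,
so memory use stays at roughly two chunks.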

HTH,
Sebastian Haase

