[Numpy-discussion] Multi thread loading data
Thu Jul 2 12:08:53 CDT 2009
I'm relatively certain it's possible, but then you have to deal with
locks, semaphores, synchronization, etc...
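For what it's worth, here is a minimal sketch of the shared-memory idea, assuming the standard library's multiprocessing.Array is an acceptable stand-in for "shared memory". The names add_to_shared and run_demo are made up for illustration. Each worker views the same buffer through np.frombuffer and takes the array's built-in lock before writing. In this toy case the slices don't overlap, so the lock is only there to show where the synchronization would go.

```python
import multiprocessing as mp

import numpy as np


def add_to_shared(shared, start, stop, value):
    """Add `value` to shared[start:stop] while holding the array's lock."""
    with shared.get_lock():
        # View the shared buffer as a NumPy array without copying it.
        view = np.frombuffer(shared.get_obj(), dtype=np.float64)
        view[start:stop] += value


def run_demo(n=8, n_workers=2):
    """Hypothetical driver: each worker updates its own slice of one buffer."""
    # "d" is a C double; Array is zero-initialized and carries its own lock.
    shared = mp.Array("d", n)
    step = n // n_workers
    workers = [
        mp.Process(
            target=add_to_shared,
            args=(shared, i * step, (i + 1) * step, i + 1.0),
        )
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Copy out so the result no longer aliases the shared buffer.
    return np.frombuffer(shared.get_obj(), dtype=np.float64).copy()


if __name__ == "__main__":
    print(run_demo())
```

No array data is pickled here; only the handle to the shared buffer is passed to the workers, which is the point of going this route.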
On Thu, Jul 2, 2009 at 12:04 PM, Sebastian Haase<email@example.com> wrote:
> On Thu, Jul 2, 2009 at 5:38 PM, Chris Colbert<firstname.lastname@example.org> wrote:
>> Who are you quoting, Sebastian?
>> Multiprocessing is a python package that spawns multiple python
>> processes, effectively side-stepping the GIL, and provides easy
>> mechanisms for IPC. Hence the need for serialization....
> I was replying to the OP's email
> Regarding your comment: can separate processes not access the same
> memory space !? via shared memory ...
> I think there was a discussion about this not too long ago on this list.
>> On Thu, Jul 2, 2009 at 11:30 AM, Sebastian Haase<email@example.com> wrote:
>>> On Thu, Jul 2, 2009 at 5:14 PM, Chris Colbert<firstname.lastname@example.org> wrote:
>>>> can you hold the entire file in memory as single array with room to spare?
>>>> If so, you could use multiprocessing and load a bunch of smaller
>>>> arrays, then join them all together.
>>>> It won't be super fast, because serializing a numpy array is somewhat
>>>> slow when using multiprocessing. That said, it's still faster than disk.
>>>> I'm sure some numpy expert will come on here though and give you a
>>>> much better idea.
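A rough sketch of the "load smaller arrays, then join them" suggestion quoted above, assuming the CSV is purely numeric, non-empty, and fits in memory. The helper names chunked, parse_chunk and load_parallel are invented for this example and are not part of numpy. np.loadtxt accepts a list of strings, so each worker can parse its own chunk of lines.

```python
import multiprocessing as mp

import numpy as np


def chunked(lines, chunk_size):
    """Split a list of text lines into consecutive chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def parse_chunk(lines):
    """Parse CSV lines into a 2-D float array (ndmin=2 keeps single rows 2-D)."""
    return np.loadtxt(lines, delimiter=",", ndmin=2)


def load_parallel(path, chunk_size=100_000, n_workers=2):
    """Hypothetical loader: parse chunks in worker processes, then stack them."""
    # Reads every line up front, so this assumes the file fits in memory.
    with open(path) as fh:
        lines = fh.readlines()
    with mp.Pool(n_workers) as pool:
        # Each parsed array is pickled and sent back to the parent process,
        # which is the serialization cost mentioned in the thread.
        parts = pool.map(parse_chunk, chunked(lines, chunk_size))
    return np.vstack(parts)
```

Something like load_parallel("data.csv") (a placeholder file name) would need to be called under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork. Whether it beats a single loadtxt call depends on how the parsing time compares with the cost of pickling the results back.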
>>>> On Wed, Jul 1, 2009 at 7:57 AM, Mag Gam<email@example.com> wrote:
>>>>> Is it possible to use loadtxt in a multi-threaded way? Basically, I want
>>>>> to process a very large CSV file (100+ million records) and instead of
>>>>> loading a thousand elements into a buffer, processing them, and then
>>>>> loading another thousand elements, processing them, and so on...
>>>>> I was wondering if there is a technique where I can use multiple
>>>>> processors to do this faster.
>>> Do you know about the GIL (global interpreter lock) in Python ?
>>> It means that Python isn't doing "real" multithreading...
>>> Only if one thread is e.g. doing some slow or blocking I/O stuff, the
>>> other thread could keep working, e.g. doing CPU-heavy numpy stuff.
>>> But you would not get 2-CPU numpy code - except for some C-implemented
>>> "long running" operations -- these should be programmed in a way that
>>> releases the GIL so that the other CPU could go on doing its Python
>>> Sebastian Haase
>>> Numpy-discussion mailing list