[Numpy-discussion] Possible roadmap addendum: building better text file readers

Lluís xscript@gmx....
Fri Mar 2 10:37:11 CST 2012


Frédéric Bastien writes:

> Hi,
> mmap can give a speed-up in some cases but a slowdown in others, so care
> must be taken when using it. For example, the speed difference between
> read and mmap is not the same when the file is local and when it is
> on NFS. On NFS, you need to read bigger chunks to make it worthwhile.

> Another example is an SMP computer. If, for example, you have an 8-core
> computer but only enough RAM for one or two copies of your
> dataset, using mmap is a bad idea. If you read the file in chunks,
> the OS will normally keep the file in its RAM cache. So if you
> launch 8 jobs, they will all use the system cache to share the data.
> If you use mmap, I think this bypasses the OS cache, so you will always
> read the file.

Not according to mmap(2):

       MAP_SHARED Share this mapping.  Updates to the mapping are visible to
                  other processes that map this file, and are carried through to
                  the underlying file.  The file may not actually be updated
                  until msync(2) or munmap() is called.

My understanding is that all processes will use exactly the same physical
memory, and swapping that memory will use the file itself.
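For anyone who wants to check this from Python, here is a minimal sketch (not part of any proposed reader API) that maps one file twice with `numpy.memmap`. It assumes a POSIX-like system, where modes "r" and "r+" create shared mappings (mode "c" would be the private, copy-on-write one), and it stays within a single process for simplicity; mmap(2) describes the same visibility across processes.

```python
import os
import tempfile

import numpy as np


def shared_mapping_demo(n: int = 8) -> float:
    """Map one file twice and return what the read-only view sees
    after a write through the read-write view."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        # Back the mapping with n float64 zeros on disk.
        np.zeros(n, dtype=np.float64).tofile(path)

        # mode="r+" maps the file read-write and shared; mode="r" maps it read-only and shared.
        writer = np.memmap(path, dtype=np.float64, mode="r+", shape=(n,))
        reader = np.memmap(path, dtype=np.float64, mode="r", shape=(n,))

        writer[3] = 42.0          # store through the first mapping
        seen = float(reader[3])   # load through the second mapping, without re-reading the file

        writer.flush()            # ask the OS to write dirty pages back to the file (msync)
        del writer, reader        # drop both mappings before deleting the file
        return seen
    finally:
        os.remove(path)
```

The write is visible through `reader` right away because both views are backed by the same page-cache pages; `flush()` only controls when the data reaches the file itself.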


> On NFS with a cluster of computers, this can put a
> high load on the file server. So having a way to specify whether or not
> to use mmap would be great, as you can't always guess the right thing
> to do. (Unless I'm wrong and this doesn't bypass the OS cache.)

> Anyway, it is great to see people working on this problem; these were just
> a few comments I had in mind when I read this thread.
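A rough sketch of what such a switch could look like, assuming a flat binary file holding a single dtype (text parsing is left out entirely). `load_binary`, `use_mmap` and `chunk_bytes` are hypothetical names, not an existing NumPy API, and the default chunk size is an arbitrary placeholder that would need tuning per filesystem, e.g. larger on NFS as suggested above.

```python
import os

import numpy as np


def load_binary(path, dtype=np.float64, use_mmap=False, chunk_bytes=16 * 2**20):
    """Hypothetical loader: either map the file or read it in fixed-size chunks.

    chunk_bytes is a tunable placeholder, not a recommended value.
    """
    dtype = np.dtype(dtype)
    nbytes = os.path.getsize(path)
    if nbytes % dtype.itemsize:
        raise ValueError("file size is not a multiple of the item size")
    count = nbytes // dtype.itemsize

    if use_mmap:
        # Read-only shared mapping; pages are faulted in lazily on first access.
        return np.memmap(path, dtype=dtype, mode="r", shape=(count,))

    # Plain reads into a preallocated array, chunk_bytes at a time.
    out = np.empty(count, dtype=dtype)
    buf = memoryview(out.view(np.uint8))  # writable byte-level view of the destination
    with open(path, "rb", buffering=0) as f:
        pos = 0
        while pos < nbytes:
            n = f.readinto(buf[pos:pos + chunk_bytes])
            if not n:
                raise EOFError("file shrank while reading")
            pos += n
    return out
```

Which branch wins depends on the filesystem and on memory pressure, as discussed in this thread, so the sketch leaves the choice to the caller instead of guessing.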


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

