[Numpy-discussion] How to limit the numpy.memmap's RAM usage?

braingateway braingateway@gmail....
Sat Oct 23 11:27:53 CDT 2010


Charles R Harris :
>
>
> On Sat, Oct 23, 2010 at 10:15 AM, Charles R Harris 
> <charlesr.harris@gmail.com <mailto:charlesr.harris@gmail.com>> wrote:
>
>
>
>     On Sat, Oct 23, 2010 at 9:44 AM, braingateway
>     <braingateway@gmail.com <mailto:braingateway@gmail.com>> wrote:
>
>         David Cournapeau :
>
>             2010/10/23 braingateway <braingateway@gmail.com
>             <mailto:braingateway@gmail.com>>:
>              
>
>                 Hi everyone,
>                 I noticed that numpy.memmap uses RAM to buffer data
>                 from memmap files.
>                 If I have a 100GB array in a memmap file and process it
>                 block by block,
>                 the RAM usage keeps increasing while the process
>                 runs, until
>                 there is no available space in RAM (4GB), even though
>                 the block size is
>                 only 1MB.
>                 for example:
>                 ####
>                 import numpy
>
>                 a = numpy.memmap('a.bin', dtype='float64', mode='r')
>                 blocklen = int(1e5)  # elements per block; slicing needs an int
>                 nblocks = len(a) // blocklen
>                 b = numpy.zeros((nblocks,))
>                 for i in range(nblocks):
>                     b[i] = numpy.mean(a[i*blocklen:(i+1)*blocklen])
>                 ####
>                 Is there any way to restrict the memory usage in
>                 numpy.memmap?
>                    
>
>
>             The whole point of using memmap is to let the OS do the
>             buffering for
>             you (which is likely to do a better job than you in many
>             cases). Which
>             OS are you using? And how do you measure how much memory
>             is taken by
>             numpy for your array ?
>
>             David
>              
>
>         Hi David,
>
>         I agree with you about the point of using memmap. That is why
>         the behavior is so strange to me.
>         I actually measured the resident set size (pink trace in
>         figure 2) of the Python process on Windows. I have attached the
>         result. You can see the RAM usage is definitely not file
>         system cache.
>
>
>     Umm, a good operating system will use *all* of RAM for buffering
>     because RAM is fast and it assumes you are likely to reuse data
>     you have already used once. If it needs some memory for something
>     else, it just writes a page to disk, if dirty, reads in the new
>     data from disk, and changes the address of the page. Where you get
>     into trouble is if pages can't be evicted for some reason. Most
>     modern OSes also have special options available for reading in
>     streaming data from disk that can lead to significantly faster
>     access for that sort of thing, but I don't think you can do that
>     with memmapped files.
>
>     I'm not sure how Windows labels its memory. IIRC, memmapping a
>     file leads to what is called file-backed memory; it is essentially
>     virtual memory. Now, I won't bet my life that there isn't a
>     problem, but I think a misunderstanding of the memory information
>     is more likely.
>
>
> It is also possible that something else in your program is hanging
> onto memory, but without knowing a lot more it is hard to tell. Are you
> seeing symptoms besides the memory graphs? It looks like you aren't
> running on Windows, actually, so what OS are you running on?
>
> Chuck
> ------------------------------------------------------------------------
>
>   
Hi Chuck,

Thanks a lot for the quick response. I do run the following super simple
script on Windows:

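####
import numpy

a = numpy.memmap('a.bin', dtype='float64', mode='r')
blocklen = int(1e5)  # elements per block; slicing needs an int
nblocks = len(a) // blocklen
b = numpy.zeros((nblocks,))
for i in range(nblocks):
    # mean of one block of the memory-mapped array
    b[i] = numpy.mean(a[i*blocklen:(i+1)*blocklen])
####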
Everything became super slow after Python ate all the RAM.
By the way, I also tried Qt's QFile::map(), and there is no problem at all...
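
For comparison, here is a minimal sketch of the same block-wise mean done
with plain buffered reads instead of a memory map, so that the process
itself holds only one block at a time. This is an assumption about a
possible workaround, not something measured in this thread; the function
name blockwise_means and its parameters are hypothetical.

```python
import numpy as np

def blockwise_means(path, blocklen=100_000, dtype=np.float64):
    """Mean of each full block of `blocklen` elements stored in `path`.

    Hypothetical helper: reads the file sequentially rather than mapping it.
    """
    nbytes = blocklen * np.dtype(dtype).itemsize
    means = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(nbytes)  # at most one block is in memory here
            if len(chunk) < nbytes:
                break  # drop a trailing partial block, like len(a) // blocklen
            # Interpret the raw bytes as an array without copying them.
            block = np.frombuffer(chunk, dtype=dtype)
            means.append(block.mean())
    return np.array(means)
```

Calling b = blockwise_means('a.bin') would then play the role of the loop
in the script above. The OS may still keep the file in its own cache, but
those pages are not part of the process's mapped address space.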

LittleBigBrain

