[Numpy-discussion] How to limit the numpy.memmap's RAM usage?

Charles R Harris charlesr.harris@gmail....
Sat Oct 23 11:24:10 CDT 2010

On Sat, Oct 23, 2010 at 10:15 AM, Charles R Harris <
charlesr.harris@gmail.com> wrote:

> On Sat, Oct 23, 2010 at 9:44 AM, braingateway <braingateway@gmail.com> wrote:
>> David Cournapeau wrote:
>>> 2010/10/23 braingateway <braingateway@gmail.com>:
>>>> Hi everyone,
>>>> I noticed that numpy.memmap uses RAM to buffer data from memmap files.
>>>> If I have a 100GB array in a memmap file and process it block by block,
>>>> the RAM usage keeps increasing as the process runs until
>>>> there is no available space in RAM (4GB), even though the block size is
>>>> only 1MB.
>>>> For example:
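>>>> ####
>>>> import numpy
>>>> a = numpy.memmap('a.bin', dtype='float64', mode='r')
>>>> blocklen = 100000  # elements per block; must be an int to be used in slices
>>>> nblocks = len(a) // blocklen  # integer division drops a trailing partial block
>>>> b = numpy.zeros((nblocks,))
>>>> for i in range(nblocks):
>>>>     b[i] = numpy.mean(a[i*blocklen:(i+1)*blocklen])
>>>> ####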
>>>> Is there any way to restrict the memory usage in numpy.memmap?
>>> The whole point of using memmap is to let the OS do the buffering for
>>> you (which is likely to do a better job than you in many cases). Which
>>> OS are you using ? And how do you measure how much memory is taken by
>>> numpy for your array ?
>>> David
>> Hi David,
>> I agree with you about the point of using memmap. That is why the behavior
>> is so strange to me.
>> I actually measured the size of the resident set (pink trace in figure 2) of the
>> python process on Windows. I have attached the result. You can see the RAM
>> usage is definitely not file system cache.
> Umm, a good operating system will use *all* of RAM for buffering because
> RAM is fast and it assumes you are likely to reuse data you have already
> used once. If it needs some memory for something else, it just writes a page
> to disk, if dirty, reads in the new data from disk, and changes the
> address of the page. Where you get into trouble is if pages can't be evicted
> for some reason. Most modern OSes also have special options available for
> reading in streaming data from disk that can lead to significantly faster
> access for that sort of thing, but I don't think you can do that with
> memmapped files.
> I'm not sure how Windows labels its memory. IIRC, memmapping a file leads
> to what is called file-backed memory; it is essentially virtual memory. Now,
> I won't bet my life that there isn't a problem, but I think a
> misunderstanding of the memory information is more likely.
It is also possible that something else in your program is hanging onto
memory, but without knowing a lot more it is hard to tell. Are you seeing
symptoms besides the memory graphs? It looks like you aren't running on
Windows, actually, so what OS are you running on?
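
If the goal is only to keep the process's own footprint bounded, one
alternative to mapping the whole file is to read it one block at a time
with numpy.fromfile, so that only a single block is ever held by the
process. The following is a rough sketch under assumptions, not a
recommendation over memmap: the helper name block_means is made up for
illustration, the file is assumed to be raw float64 data like a.bin in the
example above, and the OS may still keep the pages it has read in its own
file cache.

```python
import numpy as np

def block_means(path, blocklen=100000, dtype=np.float64):
    """Mean of each full block of `blocklen` items in a raw binary file.

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    means = []
    with open(path, 'rb') as f:
        while True:
            # fromfile reads at most `count` items and advances the file position
            block = np.fromfile(f, dtype=dtype, count=blocklen)
            if block.size < blocklen:
                break  # skip a trailing partial block, like the loop above
            means.append(block.mean())
    return np.array(means)
```

Calling b = block_means('a.bin') should give the same per-block means as
the memmap loop, with the array for each block released before the next
one is read.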
