[AstroPy] PyFITS and mmap

Erik Bray embray@stsci....
Fri Sep 23 10:36:46 CDT 2011


On 09/22/2011 10:39 PM, James Turner wrote:
> This probably depends on the details, but if data arrays are mapped
> fairly transparently and operations are just a "little bit slower",
> without the danger of exhausting memory and/or making the OS swap,
> that certainly sounds like a net gain to me.

Technically, when reading pieces of a mmap'd file into physical RAM, 
swapping is *exactly* what's going on, just not to/from your OS's main 
pagefile :)
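
As a rough illustration of what I mean, here is a minimal sketch using the 
plain Python mmap module rather than PyFITS itself.  The scratch file and 
its 4 MiB size are made up for the example; only the few bytes that get 
sliced need to be paged in from disk.

```python
import mmap
import os
import tempfile

# Write a 4 MiB scratch file to stand in for a large data file.
size = 4 * 1024 * 1024
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * (size - 4) + b"END!")
    path = tmp.name

with open(path, "rb") as f:
    # Creating the mapping reserves address space; typically no file
    # data has been read into RAM at this point.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mapped_len = len(mm)
    # Slicing touches only the pages backing these bytes.  The OS pages
    # them in from the file on demand and is free to drop them later.
    tail = mm[-4:]
    mm.close()

os.remove(path)
```

The backing store for those pages is the file itself, which is why the 
main pagefile is not involved for a read-only mapping.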

> I assume there will be cases where it's not quite so simple and
> things have to be kept in memory for specific performance reasons
> or the working directory isn't writeable or whatever, but it seems
> like a reasonable default. I don't have enough practical experience
> with memory mapping to answer your question about downsides you
> haven't thought of, but since you're testing the waters (and no-one
> has commented yet) I thought I'd throw out my initial user reaction.
> For what it's worth, we HAVE recently run into situations at Gemini
> where we have exhausted 4 GB of RAM, typical of an end user machine,
> and started discussing memory mapping. We're also not dealing with
> files larger than 200 MB or so.

Right--in large programs on 32-bit systems even smaller files can be 
problematic to mmap, since the mapping requires a contiguous range of 
virtual address space, which may not be available if the address space 
is fairly fragmented.  On 64-bit systems (just about everything these 
days, though my laptop is still 32-bit :) this is much less likely to 
be a problem.
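
To put rough numbers on that, here is a crude sketch.  The helper name 
could_map_in_one_piece is hypothetical, and the check is only a necessary 
condition: it compares a size against the full pointer range and knows 
nothing about fragmentation or about how much of that range the OS 
actually gives a process.

```python
import struct

# Pointer width of the running interpreter, normally 32 or 64 bits.
POINTER_BITS = struct.calcsize("P") * 8


def could_map_in_one_piece(nbytes, pointer_bits=POINTER_BITS):
    """Upper-bound check: can nbytes fit in the address range at all?

    A True result does not guarantee that mmap will succeed, because a
    fragmented address space may have no free contiguous range that big.
    """
    return nbytes < 2 ** pointer_bits


small_on_32 = could_map_in_one_piece(200 * 1024 ** 2, pointer_bits=32)
huge_on_32 = could_map_in_one_piece(50 * 1024 ** 3, pointer_bits=32)
huge_on_64 = could_map_in_one_piece(50 * 1024 ** 3, pointer_bits=64)
```

A 200 MB file passes even on 32-bit, which is exactly the case where 
fragmentation rather than raw size decides whether the mapping works.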

> AFAICT, PyFITS doesn't do this by default just because not that
> long ago it was running mainly on 32-bit systems (I remember
> discussing it at the time and was told it would be more useful in
> future, which is now).
>
> Seems like some limited user testing would be in order first?
>
> Cheers,
>
> James.

I could try turning it on here at STScI and see if any problems arise. 
Warren and I also discussed adding a global default--something like 
pyfits.USE_MEMMAP--that can be used to easily control the default for 
all pyfits.open() calls.
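
Here is a minimal sketch of how such a global default could behave.  This 
is not PyFITS code: open_data is a hypothetical stand-in for 
pyfits.open(), the module-level USE_MEMMAP only mimics the name we 
discussed, and the plain mmap module is used so the example runs on its 
own.

```python
import mmap
import os
import tempfile

# Hypothetical module-level switch, standing in for the proposed
# pyfits.USE_MEMMAP.  Nothing here is actual PyFITS code.
USE_MEMMAP = True


def open_data(path, memmap=None):
    """Return the file's bytes, either memory-mapped or fully read.

    memmap=None means "defer to the global default"; an explicit
    True or False always wins.
    """
    if memmap is None:
        memmap = USE_MEMMAP
    with open(path, "rb") as f:
        if memmap:
            # The mapping remains usable after f is closed.
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        return f.read()


# Small scratch file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"SIMPLE  =                    T")
    path = tmp.name

mapped = open_data(path)                # follows USE_MEMMAP
loaded = open_data(path, memmap=False)  # explicit override
header_start = mapped[:6]
mapped.close()
os.remove(path)
```

Using None as the keyword default keeps "not specified" distinct from an 
explicit False, so existing calls that pass memmap themselves would not 
change behavior.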

Thanks,
Erik

>
>> Hi all,
>>
>> Every now and then PyFITS gets support requests from people trying to
>> work with very large FITS files (>4GB; I've seen as high as 50 GB) and
>> having trouble when they run out of memory.
>>
>> Normally I point them to the memmap=True option to pyfits.open(), and
>> that works for them.  On 64-bit systems in particular there's more than
>> enough virtual address space to mmap very large files.
>>
>> And I got to thinking that while most FITS files I encounter are not
>> many gigabytes in size, they are still over 100 MB.  And there are only
>> so many operations that actually require having an entire array in
>> memory at once.  So maybe it would make sense to have PyFITS use mmap by
>> default.
>>
>> There could be some slight performance implications here: For example,
>> when reading the data a little bit at a time, mmap is a little bit
>> slower, unsurprisingly.  But in practice I don't think it's a very
>> noticeable difference, and the benefits--far less memory usage and more
>> transparent support for large files--outweigh any drawbacks I can think of.
>>
>> I'm just putting this out there because I wonder if there are any other
>> downsides to this that I'm not thinking of.
>>
>> Thanks,
>> Erik
>> _______________________________________________
>> AstroPy mailing list
>> AstroPy@scipy.org
>> http://mail.scipy.org/mailman/listinfo/astropy

