[AstroPy] PyFITS and mmap

Paul Barrett pebarrett@gmail....
Fri Sep 23 07:37:07 CDT 2011


Erik,

The performance impact can be greater than you might think.  As an
example, I have some Python code that uses subprocesses to divide the
processing among eight or more processors.  The data is shared between
the parent and child processes using memory-mapping.  The calculations
take about 5 minutes per subprocess, and it then takes another 7
minutes or so to write the data to disk before the subprocess ends.  I
would therefore prefer that memory-mapped files be an option rather
than the default, to avoid such a performance hit.  If it is the
default, there may be situations where performance is poor and the
novice user would not know why PyFITS is so slow.  This behavior may
discourage users from using FITS files and lead them to use HDF5 files
instead (i.e., the tables package), which, when I think about it,
would be a good thing.
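
For anyone who wants to see the general pattern, here is a minimal
sketch of sharing data between a parent and child processes through a
memory-mapped file.  It is not the code described above: it uses only
the standard-library mmap and subprocess modules, and the names
CHILD_CODE and fill_in_parallel, the chunk size, and the fill values
are all made up for illustration.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: map the shared file and fill one slice of it.
CHILD_CODE = """
import mmap, sys
path = sys.argv[1]
start, stop, value = int(sys.argv[2]), int(sys.argv[3]), int(sys.argv[4])
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:
        m[start:stop] = bytes([value]) * (stop - start)
        m.flush()  # push the modified pages back to the file
"""


def fill_in_parallel(path, n_workers, chunk):
    """Have each child process fill its own chunk of a memory-mapped file."""
    # Pre-size the file so every child can map its full length.
    with open(path, "wb") as f:
        f.truncate(n_workers * chunk)
    procs = [
        subprocess.Popen([sys.executable, "-c", CHILD_CODE, path,
                          str(i * chunk), str((i + 1) * chunk), str(i + 1)])
        for i in range(n_workers)
    ]
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError("child process failed")
    # The parent maps the same file and sees what the children wrote.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m)


fd, shared_path = tempfile.mkstemp()
os.close(fd)
try:
    result = fill_in_parallel(shared_path, n_workers=4, chunk=1024)
finally:
    os.remove(shared_path)
```

With a writable mapping, the modified pages have to be written back to
the file at some point (here at the explicit flush() call), which is
one place where a delay at the end of a subprocess could come from.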

 -- Paul

On Thu, Sep 22, 2011 at 12:21 PM, Erik Bray <embray@stsci.edu> wrote:
> Hi all,
>
> Every now and then PyFITS gets support requests from people trying to
> work with very large FITS files (> 4 GB; I've seen as high as 50 GB) and
> having trouble when they run out of memory.
>
> Normally I point them to the memmap=True option to pyfits.open(), and
> that works for them.  On 64-bit systems in particular there's more than
> enough virtual address space to mmap very large files.
>
> And I got to thinking that while most FITS files I encounter are not
> many gigabytes in size, they are still over 100 MB.  And there are only
> so many operations that actually require having an entire array in
> memory at once.  So maybe it would make sense to have PyFITS use mmap by
> default.
>
> There could be some slight performance implications here: for example,
> when reading the data a little bit at a time, mmap is a little bit slower,
> unsurprisingly.  But in practice I don't think it's a very noticeable
> difference, and the benefits--far less memory usage and more transparent
> support for large files--outweigh any drawbacks I can think of.
>
> I'm just putting this out there because I wonder if there are any other
> downsides to this that I'm not thinking of.
>
> Thanks,
> Erik
> _______________________________________________
> AstroPy mailing list
> AstroPy@scipy.org
> http://mail.scipy.org/mailman/listinfo/astropy
>
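
The point in the quoted message that few operations need an entire
array in memory at once can be sketched as follows.  With PyFITS itself
the relevant call is pyfits.open() with memmap=True, as mentioned
above; the sketch below instead uses the standard-library mmap module
on a raw file of big-endian 32-bit floats so that it runs without a
FITS file.  The function name read_window and the file contents are
assumptions made up for the example, and no FITS header is modelled.

```python
import mmap
import os
import struct
import tempfile


def read_window(path, first, count):
    """Return `count` big-endian float32 values starting at element `first`.

    Only the slice that is requested gets copied out of the mapping;
    the rest of the file is never read into a Python object.
    """
    itemsize = 4  # bytes per float32
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            raw = m[first * itemsize:(first + count) * itemsize]
    return struct.unpack(">%df" % count, raw)


# Build a small stand-in data file: 1000 floats, 0.0 through 999.0.
fd, demo_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(struct.pack(">1000f", *[float(i) for i in range(1000)]))
    window = read_window(demo_path, first=500, count=3)
finally:
    os.remove(demo_path)
```

Here window holds only the three requested values.  For a file of many
gigabytes the same call would still copy just those few bytes, at the
cost of a page fault the first time each page is touched.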

