[AstroPy] PyFITS and mmap

Erik Bray embray@stsci....
Fri Sep 23 10:49:22 CDT 2011

On 09/23/2011 08:37 AM, Paul Barrett wrote:
> Erik,
> The performance impact can be greater than you might think.  As an
> example, I have some Python code that uses subprocesses to divide the
> processing among eight or more processors.  The data is shared between
> the parent and child processes using memory-mapping.  The calculations
> take about 5 minutes per subprocess and then another 7 minutes or so
> to write the data to disk before the subprocess ends.  I would
> therefore prefer that memory-mapped files be an option instead of the
> default to avoid such a possible performance hit.  If it is the
> default, there may be situations where performance is poor and the
> novice user would not know why PyFITS is performing so poorly.  This
> adverse behavior may lead users to abandon FITS files in favor of
> HDF5 files (i.e., the tables package), which, when I think about it,
> would be a good thing.

As Tom wrote, this hardly seems like a novice use case.  I mentioned
in my previous e-mail the possibility of adding a pyfits.USE_MEMMAP
variable to control the default behavior from one place (rather than
having to change the arguments to every pyfits.open() call).  In your
case, you would want to set pyfits.USE_MEMMAP = False.
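
To make the idea concrete, here is a minimal sketch of a module-level
default that each open call consults unless the caller overrides it.
It uses only the standard library; the names USE_MEMMAP and open_data
below are hypothetical stand-ins for illustration, not PyFITS code.

```python
import mmap

# Hypothetical module-wide default, standing in for the proposed
# pyfits.USE_MEMMAP variable. This is an assumption, not PyFITS code.
USE_MEMMAP = True


def open_data(path, memmap=None):
    """Return the contents of a non-empty file as a bytes-like object.

    memmap=None means "fall back to the module-wide USE_MEMMAP default";
    True or False overrides the default for this call only.
    """
    if memmap is None:
        memmap = USE_MEMMAP
    with open(path, "rb") as f:
        if memmap:
            # Map the file read-only; the OS pages data in on demand.
            # The mapping stays valid after the file object is closed.
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # Otherwise read the whole file into an ordinary bytes object.
        return f.read()
```

With something along these lines, a script like yours would flip the
default once at the top and leave every individual call unchanged,
while a single call could still opt back in with memmap=True.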

Still, this is valuable input.  I don't have strong opinions either way
about what the default should be, which is why I asked.  A few cases
have also come up here of heavily I/O-bound PyFITS use where mmap
might not be appropriate.  I think it mostly comes down to what would
work best for the average user.

And yes, for datasets this large, users really should be using HDF5 and
PyTables.  I've done quite a bit to improve the performance of PyFITS,
but the FITS format was never designed for such large datasets, and
there's only so much that can be done.  It would be foolish to try to
recreate PyTables on top of FITS and without libhdf5 :)

