[AstroPy] PyFITS 0.6.2 available

Perry Greenfield perry at stsci.edu
Thu Feb 28 12:23:34 CST 2002


Clive Page writes:

> Would you (or someone) like to post a comparison between PyFITS and
> pcFITSIO (see http://www.stecf.org/~npirzkal/)?   It's nice to see that
> Python and FITS are so successful that there are two competing packages of
> bindings between them, but eventually the astronomical community may find
> that's one too many.
>
I'm not sure I can go into a detailed comparison but I will outline
many of the reasons we decided to go a different way (albeit at a higher
development cost). I haven't used pCFITSIO so I may inadvertently
misrepresent some things; I hope Nor will correct me in such cases.

Basically CFITSIO is a fairly low-level interface while we desired a
more object-oriented interface to FITS files. It would
probably be possible to construct a low-level interface to CFITSIO (much
like pCFITSIO) with a more object-oriented wrapper library. But to
the extent the low-level library doesn't map well into the expected
behavior, that can become difficult. It depends on lots of details.
We wanted an interface in which the components of a FITS file have
corresponding objects in Python, and in which accessing such objects
does not entail a huge drain on resources (in time, I/O, or memory
usage). For example, it would be very nice to give headers many
of the capabilities that Python dictionaries have. Yet would we want
every dictionary access to result in I/O to the file containing the
header?
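To illustrate the point, here is a minimal sketch of a dictionary-like
header that lives in memory and touches the file only when flushed. The
class name InMemoryHeader, its flush() method, and the writes counter are
hypothetical and are not the PyFITS interface; the card formatting is
deliberately simplified and is not a complete FITS card writer.

```python
import io

CARD_LEN = 80     # a FITS header card is 80 characters long
BLOCK_LEN = 2880  # FITS files are organized in 2880-byte blocks


class InMemoryHeader:
    """Hypothetical dict-like header; illustration only, not PyFITS."""

    def __init__(self):
        self._cards = {}  # keyword -> value, kept in insertion order
        self.writes = 0   # counts how many times the file is written

    def __getitem__(self, key):
        # Pure memory access: no file I/O happens here.
        return self._cards[key]

    def __setitem__(self, key, value):
        # Also memory only; the file is untouched until flush().
        self._cards[key] = value

    def __contains__(self, key):
        return key in self._cards

    def flush(self, fileobj):
        """Write all cards in one pass, padded to a whole block."""
        # Simplified formatting: keyword, "= ", right-justified value.
        cards = [f"{k:<8}= {v!r:>20}".ljust(CARD_LEN)
                 for k, v in self._cards.items()]
        cards.append("END".ljust(CARD_LEN))
        text = "".join(cards)
        text += " " * (-len(text) % BLOCK_LEN)
        fileobj.write(text.encode("ascii"))
        self.writes += 1


header = InMemoryHeader()
header["BITPIX"] = 16
header["NAXIS"] = 2      # many updates, still no I/O
header["NAXIS"] = 0
out = io.BytesIO()       # stands in for a real file
header.flush(out)        # the only write
```

However many keywords are set or changed, the file is written once.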

CFITSIO is intrinsically file-oriented. Changes to the elements of
a FITS file may require an immediate rewriting of the entire file,
something we wanted to avoid. We wanted the interface to FITS objects
to have more memory-like (and hence more dynamic) behavior. To give
some specific examples:

1) If I want to add keywords to the header and they cause it to need
more blocks, that will cause (as I understand it) CFITSIO to rewrite
the file to make space for them. If the header is in memory, that step
can be deferred until all such changes are made (vs. many rewrites
of the file). If we're forced to keep a copy of the header in memory
and provide means for updating it, we've already begun a significant
replacement of CFITSIO functionality. Besides, mapping all header
changes to file I/O can be quite inefficient.

2) If I want to insert HDU's into a FITS file, the wrapper library must
manage such HDU's as file objects rather than memory objects. This
requires that I/O be done for all such manipulations, even when
plenty of memory may be available for the inserted HDU's. It often
would be better to have in-memory HDU's and defer the I/O overhead
to the time when all the insertions and changes are completed.

3) It is not possible to memory map FITS files in CFITSIO. This is
a capability that we feel is important for minimizing memory usage
within Python. There really is no way of adding this to a wrapper
library without replacing CFITSIO.

4) CFITSIO does limited verification checking of the FITS files
written. We wish to see better mechanisms for preventing one from
writing inconsistent or illegal FITS files.

5) CFITSIO table access is relatively slow because of how table data
are buffered. We desired a faster means of creating arrays from
table columns.
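
As a rough sketch of points 1 and 3, the code below computes how many
2880-byte blocks a header of a given size needs, and reads a few pixels
from the middle of a data unit through a memory map using only the Python
standard library. The function names (blocks_needed, read_pixels,
write_demo_file) and the demo file layout are assumptions made for
illustration; none of this is PyFITS or CFITSIO code.

```python
import mmap
import os
import struct
import tempfile

BLOCK_LEN = 2880  # FITS block size in bytes
CARD_LEN = 80     # one header card


def blocks_needed(n_cards):
    """Blocks required to hold n_cards cards (END card included)."""
    cards_per_block = BLOCK_LEN // CARD_LEN  # 36 cards fit in one block
    return -(-n_cards // cards_per_block)    # ceiling division


def read_pixels(path, data_offset, start, count):
    """Return `count` big-endian 16-bit integers beginning at pixel
    `start`, read through a memory map rather than a full file read."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            fmt = f">{count}h"
            return struct.unpack_from(fmt, mm, data_offset + 2 * start)


def write_demo_file(n_pixels):
    """Create a throwaway file: one blank header block, then pixels
    holding the values 0 .. n_pixels-1 (hypothetical test data)."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    with os.fdopen(fd, "wb") as f:
        f.write(b" " * BLOCK_LEN)
        f.write(struct.pack(f">{n_pixels}h", *range(n_pixels)))
    return path


demo_path = write_demo_file(1000)
try:
    # Only the pages holding these three pixels need to be touched.
    sample = read_pixels(demo_path, BLOCK_LEN, 500, 3)
finally:
    os.remove(demo_path)
```

Adding a 37th card is what pushes a header from one block to two, which
is the case where a file-oriented library has to shift everything that
follows the header.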

It is the case that this version of PyFITS does not currently supply
all the functionality of CFITSIO (e.g., ascii tables, random groups,
compression options, and extended filename syntax). It is also true
that the current version provides no means of reading subsets of
large data sets (if you want the data for an HDU, it must all be
read into memory). This will change when memory mapping support is
added (and if there are platforms that do not support memory mapping
sufficiently well, or not at all, we will add means for reading
subsets of the data).

But all in all, I think people will find the PyFITS approach a much
easier-to-use and more flexible approach to dealing with FITS data
when working within Python.

We are using it at STScI for developing new applications and calibration
pipelines so we have a strong commitment to long-term support. We
have also developed a new array module (numarray) to provide sufficient
capabilities for efficient and transparent access to FITS data that
was not present in the existing array module (Numeric). In the short
run this is a drawback since numarray does not yet have many of the
associated libraries to make it as useful as Numeric currently is,
but that will begin to change within a couple of months. (In fact,
much of the delay in making PyFITS public has been driven by how best
to provide data access. We concluded that we needed to get a workable
version of numarray to build on rather than work with the restrictions
that Numeric would have required.)

To summarize in more user-oriented terms the difference between the
two approaches: with CFITSIO you must manage many of the details
yourself. Many things are cumbersome. Try inserting a keyword
just after another keyword in the header. Try inserting an HDU into the
middle of an existing FITS file. Generally you will have to make
many calls to do either of these. Both are relatively easy in PyFITS.
In short, CFITSIO (or pCFITSIO) will require you to do a lot more
bookkeeping.
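
To show why these two operations become simple once the objects live in
memory, here is a small sketch that uses plain Python lists. The helper
insert_after and the list-based stand-ins for a header and an HDU list
are hypothetical; they are not the PyFITS classes or method names.

```python
def insert_after(cards, after_key, new_card):
    """Insert new_card (a (keyword, value) tuple) directly after the
    first card whose keyword equals after_key."""
    for i, (key, _value) in enumerate(cards):
        if key == after_key:
            cards.insert(i + 1, new_card)
            return
    raise KeyError(after_key)


# A header modeled as an ordered list of (keyword, value) cards.
header = [
    ("SIMPLE", True),
    ("BITPIX", 16),
    ("NAXIS", 0),
    ("TELESCOP", "HST"),
    ("INSTRUME", "WFPC2"),
]
insert_after(header, "TELESCOP", ("OBSERVER", "Page"))

# An HDU list modeled as a list of names; a real one would hold objects.
hdus = ["primary", "sci", "dq"]
hdus.insert(1, "err")  # put a new HDU into the middle

# Nothing has been written yet; a single write at the end would lay
# out the whole file in its final order.
```

Each operation is one call on an in-memory sequence, and the cost of
rearranging the file is paid once when it is finally written.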

Mind you, I'm not trying to put down CFITSIO. It is very solid,
well supported, has many features, and is well documented.
If you are writing programs in C, it's clearly the library to use.
We just didn't think it mapped well to the Python/array-oriented
paradigm.


> One basic question to which I could not find an answer in your web pages:
> does PyFITS use Bill Pence's cFITSIO library, or have you implemented all
> the required functionality independently?
>
No, it is not layered on CFITSIO, and yes, all the functionality
present is implemented independently (not all "required" functionality
is there, as mentioned in the announcement; e.g., ascii tables and
random groups are missing).

By the way, if many feel it is useful to provide access to FITS headers
without having numarray available (as mentioned in the subsequent email
by Andrew Williams), I don't believe that would be very hard for us to
implement (but access to data would certainly require numarray). PyFITS
itself is pure Python.

Perry Greenfield


_____________________________________________________
AstroPy mailing list  -  astropy at stsci.edu
http://lheawww.gsfc.nasa.gov/~bridgman/AstroPy/


