[AstroPy] Co-ordinating Python astronomy libraries?

Joe Harrington jh@physics.ucf....
Tue Jul 6 17:01:20 CDT 2010

We've hit this topic (monolithic vs. Balkanized) several times over
the past 5 years or so in various forms.  Tom, you have hit many of
the main issues in your posting (licenses, maintenance, management
preferences, etc.).  However, what you are contemplating doing at the
end is unfortunately something that has become a disease: making Yet
Another Specialized Python Distro(TM).  There are already too many
Sages and EPDs, and take it from one trying to get SciPy documented:
managing a monolithic entity is a nightmare down the road when some of
the packages go unmaintained, yet everyone depends on them.

What I think we want from the users' perspective is an ability to say,
*in the terms of their system's native package manager*, any of:

give me this/these specific package(s)
give me this/these group(s) of packages
give me all the packages

Then, we can publish metapackages that depend on groups of add-ons by
topic (file format readers, coordinates/WCS/time, measurement
extraction, spectral modeling, orbits, planets, galaxies, stars, etc.)
and one that depends on all of the group packages, and voilà, you can
have the equivalent of Sage, EPD, or whatever fall out as a trivial
consequence.
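
As a rough sketch of how metapackages compose (the package names and
the expand() helper below are made up for illustration; a real system
would express this in the native package manager's own dependency
fields, not in Python):

```python
# Hypothetical metapackages: each one only lists what it depends on.
METAPACKAGES = {
    "astro-fileformats": ["astro-fits", "astro-votable"],
    "astro-coordinates": ["astro-wcs", "astro-time"],
    # The "everything" package depends only on the group packages.
    "astro-all": ["astro-fileformats", "astro-coordinates"],
}


def expand(name, metapackages=METAPACKAGES):
    """Return the sorted concrete packages that installing `name` pulls in."""
    leaves, seen, stack = set(), set(), [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already handled; also guards against cycles
        seen.add(current)
        deps = metapackages.get(current)
        if deps is None:
            leaves.add(current)  # not a metapackage, so a real package
        else:
            stack.extend(deps)
    return sorted(leaves)


print(expand("astro-all"))
```

The same table answers all three user requests: a single package
expands to itself, a group expands to its members, and the top-level
metapackage expands to everything.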

What is critical here is something you did not mention in your
posting, and that is solving the current difficulty of building a good
Python stack from OS packages (e.g., debs), or even from package
sources.  The problem is that some of the packages do not play well
with others.  For example, HDF5 libraries were particularly
problematic a year ago, and often compiled code complained because two
different packages wanted different, specific versions of the same C
or FORTRAN add-on library.  Some of the code plain didn't work as
advertised, or at all.  In the end, it takes my very skilled system
manager more than a week to build the stack each time, which we do
about once a year.
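
To make the version-conflict problem concrete, here is a minimal
sketch of the kind of check an integrated build could run up front.
The package names, library versions, and the find_conflicts() helper
are all assumptions for illustration, not taken from any real build
system:

```python
from collections import defaultdict

# Hypothetical data: each package pins an exact version of a shared
# C/FORTRAN library it was built against.
REQUIREMENTS = {
    "pkg-a": {"hdf5": "1.6"},
    "pkg-b": {"hdf5": "1.8"},
    "pkg-c": {"cfitsio": "3.0"},
}


def find_conflicts(requirements):
    """Map each library wanted at more than one version to who wants what."""
    wanted = defaultdict(dict)  # library -> {version: [packages]}
    for package, libs in requirements.items():
        for lib, version in libs.items():
            wanted[lib].setdefault(version, []).append(package)
    return {lib: versions for lib, versions in wanted.items()
            if len(versions) > 1}


for lib, versions in find_conflicts(REQUIREMENTS).items():
    print(lib, "is wanted at", sorted(versions))
```

Detecting the clash is the easy part; resolving it still means
working with the maintainers of the packages involved.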

The root of the problem is that there is no centralized build and
test suite, and nobody managing unified, integrated build testing and
resolving the problems with the code maintainers.

A year ago at SciPy'09, I pointed people to the build and test suite
NSF requires for all their software projects.  I think a few people
looked at it at the time.  Configured correctly (which takes work), it
will build packages on umpteen different Linux distributions and a few other
systems and package them in the native format.

I think that the extended SciPy stack as a whole should be organized
around such a system, but there seems to be little taste for it among
the developer-heavy SciPy leadership.  I think we can have a bit more
practical vision, and at least for our own stuff (loosely defined) we
should organize around principles that include these:

- build and test everything frequently
- manage the namespace so that all can play together
  - we really need to get everyone to agree to this or there will be
    conflicts; it's only a matter of time
- produce LPUs (least packageable units) in binary format for all OSes
- produce topical meta-packages
- produce one or more mega-packages that are equivalent to Sage or EPD
- do it all for the native installers of all OSes
- provide and enforce standards for docs and licenses
- review the code
- plan together and put out RFPs for needed codes
- make sure procedures don't stifle innovators
- document the whole, but lightly
- provide a web community site for discussions, examples, reviews of code
- locate and manage it so that it is owned by the community and
  survives long-term
  - ensure all jobs are doable by at least 3 people
  - document procedures well
  - have formalized community governance and leadership
  - have a solid funding model
- agree to hang together, even if you don't like something!

With those (and perhaps other) goals in mind, we should then look at
decisions like where/how to host it and what kind of wiki to use.

Also, I keep thinking that this is best solved by joining forces with
other scientific communities.  The build-and-test part is hard, but
once implemented, it scales fantastically.  Again, all of SciPy should
be doing this.  We should at least build so that if others want to
join, they have a place to fit in.  This will need to be considered
when setting package naming conventions.  Generic names should be
avoided so that two communities can play in the same sandbox.
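
A small sketch of why generic names are a problem, and how a
community prefix sidesteps it.  The module names, the community
labels, and both helper functions are hypothetical:

```python
from collections import Counter


def colliding_names(*communities):
    """Return names claimed by more than one community, sorted."""
    counts = Counter(name for names in communities for name in set(names))
    return sorted(name for name, n in counts.items() if n > 1)


def qualify(community, names):
    """Prefix each name with its community so it is no longer generic."""
    return [f"{community}.{name}" for name in names]


# Two hypothetical communities that both picked a generic name.
astro = ["coords", "spectra"]
geo = ["coords", "tides"]

print(colliding_names(astro, geo))
print(colliding_names(qualify("astro", astro), qualify("geo", geo)))
```

With bare names the two communities clash on "coords"; once each
name carries its community prefix, nothing collides.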

Even if we don't do the full thing from the start, we should plan it
out and build as though that's where we're eventually going.

AAS splinter meeting, anyone?

