[Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation

Nathaniel Smith njs@pobox....
Sun Jan 3 17:42:32 CST 2010

On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau <cournape@gmail.com> wrote:
> On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith <njs@pobox.com> wrote:
>> What I do -- and documented for people in my lab to do -- is set up
>> one virtualenv in my user account, and use it as my default python. (I
>> 'activate' it from my login scripts.) The advantage of this is that
>> easy_install (or pip) just works, without any hassle about permissions
>> etc.
> It just works if you happen to be able to build everything from
> sources. That alone means you ignore the majority of users I intend to
> target.
> No other community (except maybe Ruby) pushes those isolated install
> solutions as a general deployment solution. If it were such a great
> idea, other people would have picked up those solutions.

AFAICT, R works more-or-less identically (once I convinced it to use a
per-user library directory); install.packages() builds from source,
and doesn't automatically pull in and build random C library
dependencies.

I'm not advocating the 'every app in its own world' model that
virtualenv's designers had in mind, but virtualenv is very useful to
give each user their own world. Normally I only use a fraction of
virtualenv's power this way, but sometimes it's handy that they've
solved the more general problem -- I can easily move my environment
out of the way and rebuild if I've done something stupid, or
experiment with new python versions in isolation, or whatever. And
when you *do* have to reproduce some old environment -- if only to
test that the new improved environment gives the same results -- then
it's *really* handy.
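
To make the "move my environment out of the way and rebuild" workflow
concrete, here is a minimal sketch. It assumes the standard library's venv
module (a later relative of virtualenv, not the tool discussed above); the
function names and the "default-env" directory are hypothetical, and a
temporary directory stands in for a real home directory.

```python
import os
import tempfile
import venv


def create_env(env_dir):
    """Create an isolated per-user environment at env_dir."""
    # with_pip=False keeps this sketch fast and offline; a real setup
    # would normally want pip inside the environment.
    venv.EnvBuilder(with_pip=False).create(env_dir)


def rebuild_env(env_dir):
    """Move an existing environment aside, then create a fresh one."""
    backup = env_dir + ".old"
    if os.path.isdir(env_dir):
        # Assumes no earlier backup exists at this path.
        os.rename(env_dir, backup)
    create_env(env_dir)
    return backup


# Hypothetical layout: one default environment per user.
root = tempfile.mkdtemp()
env_dir = os.path.join(root, "default-env")
create_env(env_dir)
backup_dir = rebuild_env(env_dir)
```

The old environment is kept next to the new one, so it can still be
inspected, or deleted once the fresh one has been checked.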

>> This should be easier, but I think the basic approach is sound.
>> "Integration with the package system" is useless; the advantage of
>> distribution packages is that distributions can provide a single
>> coherent system with consistent version numbers across all packages,
>> etc., and the only way to "integrate" with that is to, well, get the
>> packages into the distribution.
> Another way is to provide our own repository for a few major
> distributions, with automatically built packages. This is how most
> open source providers work. Miguel de Icaza explains this well:
> http://tirania.org/blog/archive/2007/Jan-26.html
> I hope we will be able to reuse much of the opensuse build service
> infrastructure.

Sure, I'm aware of the opensuse build service, have built third-party
packages for my projects, etc. It's a good attempt, but also has a lot
of problems, and when talking about scientific software it's totally
useless to me :-). First, I don't have root on our compute cluster.
Second, even if I did I'd be very leery about installing third-party
packages because there is no guarantee that the version numbering will
be consistent between the third-party repo and the real distro repo --
suppose that the distro packages 0.1, then the third party packages
0.2, then the distro packages 0.3, will upgrades be seamless? What if
the third party screws up the version numbering at some point? Debian
has "epochs" to deal with this, but third-parties can't use them and
maintain compatibility. What if the person making the third party
packages is not an expert on these random distros that they don't even
use? Will bug reporting tools work properly? Distros are complicated.
Third, while we shouldn't advocate that people screw up backwards
compatibility, version skew is a real issue. If I need one version of
a package and my lab-mate needs another and we have submissions due
tomorrow, then filing bugs is a great idea but not a solution. Fourth,
even if we had expert maintainers taking care of all these third-party
packages and all my concerns were answered, I couldn't convince our
sysadmin of that; he's the one who'd have to clean up if something
went wrong, and we don't have a big budget for overtime.
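
The version-numbering worry can be illustrated with a small sketch. It is a
deliberately simplified model of "epoch:upstream" ordering, assuming purely
numeric dotted versions; it is not the full Debian comparison algorithm, and
the function names and sample repositories are made up for illustration.

```python
def parse_version(version):
    """Split 'epoch:upstream' into (epoch, numeric components).

    Simplified: upstream must be dotted integers; a missing epoch is 0.
    """
    if ":" in version:
        epoch_text, upstream = version.split(":", 1)
        epoch = int(epoch_text)
    else:
        epoch, upstream = 0, version
    return (epoch, tuple(int(part) for part in upstream.split(".")))


def newest(candidates):
    """Pick the (source, version) pair the package manager would prefer."""
    return max(candidates, key=lambda item: parse_version(item[1]))


# The happy path: distro 0.1, third party 0.2, then distro 0.3.
seamless = newest([("distro", "0.1"), ("third-party", "0.2"), ("distro", "0.3")])

# The third party mislabels a release; the distro's 0.3 now never wins.
stuck = newest([("third-party", "2010.1"), ("distro", "0.3")])

# Only a higher epoch outranks the bad number.
fixed = newest([("third-party", "2010.1"), ("distro", "1:0.3")])
```

In this model the mislabeled 2010.1 outranks every later plain version, and
only an epoch bump recovers from it.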

Let's be honest -- scientists, on the whole, suck at IT
infrastructure, and small individual packages are not going to be very
expertly put together. IMHO any real solution should take this into
account, keep them sandboxed from the rest of the system, and focus on
providing the most friendly and seamless sandbox possible.

>> On another note, I hope toydist will provide a "source prepare" step,
>> that allows arbitrary code to be run on the source tree. (For, e.g.,
>> cython->C conversion, ad-hoc template languages, etc.) IME this is a
>> very common pain point with distutils; there is just no good way to do
>> it, and it has to be supported in the distribution utility in order to
>> get everything right. In particular:
>>  -- Generated files should never be written to the source tree
>> itself, but only the build directory
>>  -- Building from a source checkout should run the "source prepare"
>> step automatically
>>  -- Building a source distribution should also run the "source
>> prepare" step, and stash the results in such a way that when later
>> building the source distribution, this step can be skipped. This is a
>> common requirement for user convenience, and necessary if you want to
>> avoid arbitrary code execution during builds.
> Build directories are hard to implement right. I don't think toydist
> will support this directly. IMO, those advanced builds warrant a real
> build tool - one main goal of toydist is to make integration with waf
> or scons much easier. Both waf and scons have the concept of a build
> directory, which should do everything you described.

Maybe I was unclear -- proper build directory handling is nice,
Cython/Pyrex's distutils integration gets it wrong (not their fault,
distutils is just impossible to do anything sensible with, as you've
said), and I've never found build directories hard to implement
(perhaps I'm missing something). But what I'm really talking about is
having a "pre-build" step that integrates properly with the source and
binary packaging stages, and that's not something waf or scons have
any particular support for, AFAIK.
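
A rough sketch of the kind of integration I mean follows. Everything in it is
hypothetical: the function names, the "prepared" stash directory, and the toy
".tmpl" substitution that stands in for arbitrary generation code such as
cython->C. It is not toydist's, waf's, or scons's API, just one way the three
requirements could fit together.

```python
import os
import shutil
import tempfile

PREPARED_DIR = "prepared"  # hypothetical stash directory inside an sdist


def source_prepare(source_dir, out_dir):
    """Stand-in for arbitrary generation code; writes only to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(source_dir)):
        if name.endswith(".tmpl"):
            with open(os.path.join(source_dir, name)) as src:
                text = src.read()
            target = os.path.join(out_dir, name[: -len(".tmpl")])
            with open(target, "w") as dst:
                dst.write(text.replace("@NAME@", "demo"))


def make_sdist(source_dir, sdist_dir):
    """Copy the sources and stash the prepared output alongside them."""
    shutil.copytree(source_dir, sdist_dir)
    source_prepare(source_dir, os.path.join(sdist_dir, PREPARED_DIR))


def build(tree_dir, build_dir):
    """Populate build_dir; return True if the prepare step had to run."""
    os.makedirs(build_dir, exist_ok=True)
    stash = os.path.join(tree_dir, PREPARED_DIR)
    if os.path.isdir(stash):
        # Source distribution: reuse stashed files, run no arbitrary code.
        for name in os.listdir(stash):
            shutil.copy(os.path.join(stash, name), build_dir)
        return False
    # Source checkout: run the prepare step automatically.
    source_prepare(tree_dir, build_dir)
    return True


root = tempfile.mkdtemp()
checkout = os.path.join(root, "checkout")
os.makedirs(checkout)
with open(os.path.join(checkout, "hello.c.tmpl"), "w") as f:
    f.write("/* generated for @NAME@ */\n")

ran_from_checkout = build(checkout, os.path.join(root, "build-checkout"))
make_sdist(checkout, os.path.join(root, "sdist"))
ran_from_sdist = build(os.path.join(root, "sdist"), os.path.join(root, "build-sdist"))
```

The checkout build runs the prepare step, the sdist build skips it, and the
checkout itself never gains a generated file.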

-- Nathaniel
