[SciPy-Dev] Cython as build dependency, file/dll size and current issues

Ralf Gommers ralf.gommers@googlemail....
Thu Jul 5 14:37:30 CDT 2012


On Thu, Jul 5, 2012 at 9:26 PM, Matthew Brett <matthew.brett@gmail.com> wrote:

> Hi,
>
> On Thu, Jul 5, 2012 at 12:18 PM, Ralf Gommers
> <ralf.gommers@googlemail.com> wrote:
> >
> >
> > On Thu, Jul 5, 2012 at 8:57 PM, Matthew Brett <matthew.brett@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Thu, Jul 5, 2012 at 11:35 AM, Ralf Gommers
> >> <ralf.gommers@googlemail.com> wrote:
> >> >
> >> >
> >> > On Thu, Jul 5, 2012 at 8:31 PM, Matthew Brett <matthew.brett@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Thu, Jul 5, 2012 at 11:25 AM, Ralf Gommers
> >> >> <ralf.gommers@googlemail.com> wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > On https://github.com/scipy/scipy/pull/261 the problem with the large
> >> >> > size of generated C files from Cython came up again, and Matthew
> >> >> > suggested adding Cython as a build-time dependency. He also pointed
> >> >> > out that this was discussed before, with most people being in favor:
> >> >> > http://mail.scipy.org/pipermail/scipy-dev/2009-November/013272.html
> >> >> > http://mail.scipy.org/pipermail/scipy-dev/2009-November/013308.html
> >> >> > We discussed the same issue on
> >> >> > https://github.com/scipy/scipy/pull/211
> >> >> > recently, and also the size of the binary.
> >> >> >
> >> >> > This is probably also the right moment to point out other recent
> >> >> > Cython issues we've had:
> >> >> > 1. A memoryview issue with Python 2.4, either a Cython or Numpy bug:
> >> >> > https://github.com/numpy/numpy/pull/307
> >> >> > 2. We had to manually patch the generated C files when using Cython
> >> >> > 0.16, to make them work with MinGW:
> >> >> > http://projects.scipy.org/scipy/ticket/1673
> >> >> > 3. According to Ray, there's also an indexing bug in Cython 0.16
> >> >> > which requires using 0.17-dev for
> >> >> > https://github.com/scipy/scipy/pull/261
> >> >> >
> >> >> > I think it's clear that PRs like #261 above (Ray's ndimage.label
> >> >> > rewrite) are in principle a good thing: faster and more general code
> >> >> > which is easier to maintain. Now the question is what to do, though.
> >> >> > Here are some options that I see:
> >> >> >
> >> >> > a) Keep things as is for now. Accept large file/binary sizes.
> >> >> > Manually patch the generated C if necessary.
> >> >> > b) Keep things as is for now. Either go back to Cython 0.15, or bump
> >> >> > the required numpy version to the latest dev version so that we don't
> >> >> > have to manually patch the generated C files.
> >> >> > c) Keep things as they are now, without accepting too large
> >> >> > file/binary sizes. What counts as too large is to be defined. This
> >> >> > means we can't get the full benefits of fused types, for example.
> >> >> > d) Move to Cython as a build dependency. Write down the required
> >> >> > versions and incompatibilities in the docs.
> >> >> > e) Include a Cython version in the scipy git repo, patch it to solve
> >> >> > issues 2 and 3 above (and any others that come along).
> >> >> > f) Some combination of the above.
> >> >> > g) Any other options?
> >> >> > g) Any other options?
> >> >>
> >> >> Am I right in thinking that Cython 0.17dev will generate usable C
> >> >> files without patching?
> >> >
> >> >
> >> > Yes.
> >>
> >> How about making Cython 0.17 a developer build-time dependency?
> >
> >
> > That's an option. Requiring a dev version will mean broken builds for
> > some of the users that don't read the docs well but simply do
> > "easy_install cython". I'm not sure how acceptable that is.
>
> We could surely raise an informative error for that case?  I hope that
> it won't be long before the 0.17 release - but we should check with
> the Cython folks.
>

True. Perhaps it's not a big issue.
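
For illustration, the informative error could look something like the sketch below. This is only a rough idea of a check that a setup.py might run before building from git; the names check_cython, parse_version and MIN_CYTHON are made up for this example, and the 0.17 minimum is simply the version mentioned in this thread.

```python
import re

# Assumed minimum version, taken from the discussion in this thread.
MIN_CYTHON = (0, 17)


def parse_version(version):
    """Return the leading major.minor numbers of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return tuple(int(part) for part in match.groups())


def check_cython(min_version=MIN_CYTHON):
    """Raise an informative error if Cython is missing or too old.

    Hypothetical helper: meant to be called early in a setup.py.
    """
    required = ".".join(str(n) for n in min_version)
    try:
        import Cython
    except ImportError:
        raise RuntimeError(
            "Building from a git checkout requires Cython >= %s, "
            "but Cython is not installed." % required)
    found = parse_version(Cython.__version__)
    if found < min_version:
        raise RuntimeError(
            "Building from a git checkout requires Cython >= %s, "
            "but version %s was found." % (required, Cython.__version__))
    return found
```

Only the major and minor numbers are compared, so a pre-release string that starts with "0.17" would pass the check; whether that is strict enough is a separate question.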

>
> On the plus side, lowering the barrier to rewriting in Cython seems
> like a really big win, especially with memoryviews and fused types
> available.
>

Agreed about lower barrier and fused types.

Memoryviews are still not OK, because of
https://github.com/numpy/numpy/pull/307.


> > That still leaves the (mostly orthogonal) question about binary size. I
> > just built Ray's PR, _nd_label.so is 1.4 MB. For one function.
>
> Hmm.   1.4 MB seems OK to me for the binary


Really? For one function? If we do that for each function, we end up with
4 GB.


> - but I can see that we'd
> have to watch that.  Maybe it would be worth asking on the Cython list
> whether there is any way of reducing this, maybe by sharing across
> extensions.   How is the load time for that extension?
>

Negligible compared to importing numpy itself (all hot cache):

$ time python -c ""
real    0m0.039s
user    0m0.017s
sys    0m0.017s

$ time python -c "import numpy"
real    0m0.187s
user    0m0.080s
sys    0m0.100s

$ time python -c "import _ni_label"
real    0m0.206s
user    0m0.081s
sys    0m0.109s
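
The same kind of measurement can be scripted instead of typed by hand. The following is a minimal sketch, not a proper benchmark: time_import is a name invented here, and it just runs the statement in a fresh interpreter a few times and keeps the best wall-clock time.

```python
import subprocess
import sys
import time


def time_import(statement, repeats=5):
    """Best wall-clock time in seconds for running `statement` in a new interpreter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A fresh process each time, so the module is really imported again.
        subprocess.check_call([sys.executable, "-c", statement])
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    baseline = time_import("")
    print("interpreter start-up: %.3fs" % baseline)
    # Replace "os" with the module of interest, e.g. numpy or the extension.
    print("import os:            %.3fs" % time_import("import os"))
```

Subtracting the numpy time from the extension time gives the extension's own cost, assuming the extension imports numpy anyway. With the "real" figures above that is 0.206 s - 0.187 s = 0.019 s.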


> >> Meanwhile, making 'python setup.py sdist' dump the c files into the
> >> source distribution?  Maybe with a nightly development snapshot pushed
> >> up to sourceforge or similar?
> >>
> >> That's what we are doing for dipy (the c files into the sdist):
> >>
> >> https://github.com/nipy/dipy/blob/master/setup.py#L95
> >> https://github.com/nipy/dipy/blob/master/cythexts.py
> >>
> > That sounds like a good idea, if we go for a Cython build dependency.
> > Even if we don't, nightly builds would be great.
>
> Would a built sdist be widely used do you think?   Compared to someone
> following .git?   Just wondering out loud...
>

No idea to be honest.
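
To make the "C files in the sdist" idea discussed above concrete, here is a rough sketch of the source selection a setup.py would need. It is not the dipy code linked above; extension_sources and cython_available are hypothetical helpers, and it assumes the generated .c file sits next to its .pyx file.

```python
import os


def cython_available():
    """Return True if Cython can be imported in this environment."""
    try:
        import Cython  # noqa: F401
    except ImportError:
        return False
    return True


def extension_sources(pyx_path, have_cython):
    """Pick the source file to compile for one extension module.

    With Cython installed (a git checkout), build from the .pyx file.
    Without it (a user installing from an sdist), fall back to the
    pre-generated .c file that the sdist is assumed to ship.
    """
    base, ext = os.path.splitext(pyx_path)
    if ext != ".pyx":
        raise ValueError("expected a .pyx file, got %r" % (pyx_path,))
    if have_cython:
        return [pyx_path]
    c_path = base + ".c"
    if not os.path.exists(c_path):
        raise RuntimeError(
            "Cython is not installed and %s is missing; install Cython "
            "or build from a source distribution." % c_path)
    return [c_path]
```

A setup.py could then pass extension_sources(path, cython_available()) as the sources of each extension, so only developers building from git need Cython.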

Ralf