[Numpy-discussion] Numpy/Cython Google Summer of Code project idea
Fernando Perez
fperez.net@gmail....
Fri Mar 7 03:36:40 CST 2008
On Fri, Mar 7, 2008 at 1:17 AM, Konrad Hinsen <konrad.hinsen@laposte.net> wrote:
> On 07.03.2008, at 09:59, Fernando Perez wrote:
>
> > I doubt it's much better, and that's part of the point of the project:
> > to identify the problems and fix them once and for all. Getting
> > anything fixed in pyrex was hard due to a very opaque development
> > process, but Cython is part of the Sage umbrella and thus enjoys a
> > very open and active development community. Furthermore, they are
> > explicitly interested in improving the Cython numpy support, and are
> > willing to help along if this project goes forward.
>
> This is very good news in my opinion. Pyrex and Cython are already
> very useful tools for scientific computing. They lower the barrier to
> writing extension modules significantly (compared to writing directly
> in C), and they permit a continuous transition from a working Python
> prototype to an efficient extension module. I have been writing all
> my recent extension modules using Pyrex, and I definitely won't go
> back to C. If Cython gets explicit array support, it would become an
> even more useful tool for the NumPy community.
Thanks for your feedback and support of the idea, Konrad.
I just realized that I forgot to include this message that W. Stein
(sage lead) sent me, which I think presents many of these points very
nicely and may be useful in this discussion.
cheers
f
---------- Forwarded message ----------
From: Dag Sverre Seljebotn <dagss@student.matnat.uio.no>
Date: Tue, Mar 4, 2008 at 2:54 PM
Subject: [Cython] Thoughts on numerical computing/NumPy support
To: cython-dev@codespeak.net
Since Robert mentioned NumPy in connection with adding operator support, I
thought I would share more of my thoughts about NumPy. I'm very new to
Cython, so take this for what it is worth. However, what I've seen
so far looks so promising to me that I might want to spend some time in
a few months working on implementing some of this, which perhaps may
make my thoughts more interesting :-)
Currently, Cython is mostly geared towards wrapping C code, but it is
also an excellent foundation for being a numerical tool - but the rough
edges are still prohibitive. A few relatively small steps (in terms of
man-hours needed) would improve the situation a lot I think - not
perfect, but perhaps in a few years we can have something that will
finally kill FORTRAN :-)
Three suggestions follow briefly here. If anyone's interested and they
have not already been discussed and decided, I might flesh them out
"PEP-style" in the coming month?
Note that a) is what is important for me; b) and c) are just something I
throw in along the way...
a) numpy.ndarray syntax candy. Really, what one should implement is
syntax support for PEP-3118:
http://www.python.org/dev/peps/pep-3118/
Because this protocol will be shared between NumPy, PIL etc. in Python 3
it could make sense to simply have "native"/hard-coded support for this
aspect without necessarily making it a generic operator feature, and one
can then use the same approach as will be needed for buffers in Python 3
for NumPy in Python 2?
Example (where "array" is considered a new, Cython-native type that will
have automatic conversion from any NumPy arrays and Python 3 buffers):
    def myfunc(array<2, unsigned char> arr):
        arr[4, 5] = 10
might be translated to the equivalent of the currently legal:
    def myfunc(numpy.ndarray arr):
        if arr.nd != 2 or arr.dtype != numpy.dtype(numpy.uint8):
            raise ValueError("Must pass 2-dimensional uint8 array.")
        cdef unsigned char* arr_buf = <unsigned char*>arr.data
        arr_buf[4 * arr.strides[0] + 5 * arr.strides[1]] = 10
(Probably caching the strides in local variables etc.). That should do
as a first implementation -- it is always possible to be more
sophisticated, but this little will allow NumPyers to simply dive in.
Specifically, the number of dimensions must be declared first and only
direct access in that many dimensions is allowed. Slices etc. should be
less important (they can be done on the Python object instead).
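The stride arithmetic in the translated version can be tried out in
plain Python. This is only a sketch, not part of the proposal: it uses
the standard-library memoryview (which exposes PEP 3118 buffers) in
place of a NumPy array, and the helper name write_byte_2d is made up
for illustration.

```python
def write_byte_2d(raw, shape, i, j, value):
    """Store one unsigned byte at [i, j] using explicit stride arithmetic.

    `raw` is a flat bytearray; `shape` is the (rows, cols) it should be
    viewed as. Returns the byte offset that was written.
    """
    # View the flat storage as a 2-dimensional array of unsigned bytes.
    view = memoryview(raw).cast("B", shape=list(shape))
    if not (0 <= i < view.shape[0] and 0 <= j < view.shape[1]):
        raise IndexError("index out of range")
    # Same formula as in the translated code: strides are given in bytes.
    offset = i * view.strides[0] + j * view.strides[1]
    raw[offset] = value
    return offset


raw = bytearray(6 * 8)
offset = write_byte_2d(raw, (6, 8), 4, 5, 10)
# Reading the element back through normal 2-D indexing gives the same value.
readback = memoryview(raw).cast("B", shape=[6, 8])[4, 5]
```

For a C-contiguous 6x8 byte array the strides are (8, 1), so element
[4, 5] lives at byte offset 37.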
Moving on from here, one should probably instead define bufferinfo from
PEP-3118 and make it say
def myfunc(bufferinfo arr):
        if arr.ndim != 2 or arr.format != "B" or arr.readonly:
            raise ValueError("Must pass writeable 2-dimensional buffer with format 'B'.")
        ...
with automatic conversion from NumPy arrays to bufferinfo.
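The same check can be written today against Python's built-in
memoryview, which carries the ndim, format and readonly fields from
PEP 3118. A rough sketch only; the function name
require_writable_2d_bytes is invented here and is not a Cython or
NumPy API.

```python
def require_writable_2d_bytes(obj):
    """Return a memoryview of obj, or raise if it is not a writable
    2-dimensional buffer of unsigned bytes (format 'B')."""
    view = memoryview(obj)
    if view.ndim != 2 or view.format != "B" or view.readonly:
        raise ValueError(
            "Must pass writable 2-dimensional buffer with format 'B'.")
    return view


# A writable 2x3 byte buffer passes the check.
ok = require_writable_2d_bytes(memoryview(bytearray(6)).cast("B", shape=[2, 3]))

# A read-only buffer (backed by bytes) is rejected.
try:
    require_writable_2d_bytes(memoryview(bytes(6)).cast("B", shape=[2, 3]))
    rejected_readonly = False
except ValueError:
    rejected_readonly = True
```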
b) Allow numpy types? Basically, make it possible to say "cdef uint8
myvar", at least for in-function variables that are not interfacing with
C code, so that for numerical use one doesn't need to learn C. This can
be in addition, so it should not break existing code, though I can
understand resentment against the idea as well.
c) Probably controversial: More Pythonic syntax. A syntax for decoration
of function arguments is decided upon (at least in Python 3), so to
align with that one could allow for stuff like
    @Compile
    def myfunc(a: uint8, b: array(2, uint8), c: int = 10):
        d: ptr(int) = &a
        print a, b, c, d
Which is "almost" Python - only the definition of d is different, but
consistency argues for a change there as well. This can also be in addition
to the existing syntax so it should not break anything (allowing, say,
only one type of syntax per function).
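To show that a decorator can actually get at such declarations, here is
a small pure-Python sketch using Python 3 function annotations. It is
an illustration under assumptions, not a proposed implementation: the
decorator record_declared_types is a made-up stand-in for @Compile, and
the type names are written as strings because uint8 and array are not
defined in plain Python.

```python
import inspect


def record_declared_types(func):
    """Hypothetical stand-in for @Compile: collect the declared argument
    types from the annotations and attach them to the function."""
    params = inspect.signature(func).parameters
    func.declared_types = {
        name: p.annotation
        for name, p in params.items()
        if p.annotation is not inspect.Parameter.empty
    }
    return func


@record_declared_types
def myfunc(a: "uint8", b: "array(2, uint8)", c: int = 10):
    return a, b, c


declared = myfunc.declared_types
```

A real compiler would use this information to generate typed C code;
here it is only stored so that it can be inspected.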
But a) is what is interesting here...
--
Dag Sverre