[Numpy-discussion] My GSoC Proposal to Implement a Subset of NumPy for PyPy
Stéfan van der Walt
stefan@sun.ac...
Sat Apr 17 02:24:40 CDT 2010
Hi Dan
On 17 April 2010 06:50, Dan Roberts <ademan555@gmail.com> wrote:
> Hi everybody, my name is Dan Roberts, and my Google Summer of Code
> proposal was categorized under NumPy rather than PyPy, so it will end up
> being reviewed by mentors for the NumPy project. I'd like to take this
> chance to introduce myself and my proposal.
Thanks for the introduction, and welcome to NumPy!
> I hadn't prepared for review by the NumPy mentors, but this can make my
> proposal stronger than before. With a bit of help from all of you, I can
> dedicate my summer to creating more useful code than I would have
> previously. I realize that from the perspective of NumPy, my proposal might
> seem lacking, so I'd like to also invite the scrutiny of all of the readers
> of this list.
This proposal builds a bridge between two projects, so even if it
technically falls under the NumPy banner, we'll lean heavily on Maciej
Fijalkowski from PyPy for guidance.
> Why should we bother reimplementing anything? PyPy, for those who are
> unfamiliar, has the ability to Just-in-Time compile itself and programs that
> it's running. One of the major advantages of this is that code operating on
> NumPy arrays could potentially be written in pure Python, with normal
> looping constructs, and be nearly as fast as a ufunc painstakingly crafted
> in C. I'd love to see as much Python and as little C as possible, and I'm
> sure I'm not alone in that wish.
Your code has a fairly specialised application and it's worth
discussing exactly where it would fit in. For example, from our
perspective rewriting things such as zeros(), ones(), etc. is not of
much interest. However, the ability to whip up fast ufuncs and
generalised ufuncs is in great demand. Also, it is sometimes clearer
to express an algorithm as
    for i in range(n):
        for j in range(m):
            x[i, j] = some_op(x[i, j])
instead of vectorising the code. Here, PyPy can provide a big speed
improvement. I'm not sure, but it sounds like the "interface" you
refer to would be things such as the [] operator on arrays, for
example?
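To make the comparison concrete, here is a minimal sketch of the same element-wise operation written both ways. The body of some_op is a hypothetical stand-in (the message does not define it), and the helper names apply_loop and apply_vectorised are illustrative only. Under CPython the loop version is typically much slower than the vectorised one; the hope expressed above is that a JIT could narrow that gap.

```python
import numpy as np


def some_op(value):
    # Hypothetical scalar operation; the original message leaves it unspecified.
    return value * value + 1.0


def apply_loop(x):
    """Apply some_op element by element with explicit nested loops."""
    out = x.copy()
    n, m = out.shape
    for i in range(n):
        for j in range(m):
            out[i, j] = some_op(out[i, j])
    return out


def apply_vectorised(x):
    """The same computation expressed as whole-array arithmetic."""
    return x * x + 1.0


x = np.arange(6, dtype=float).reshape(2, 3)
loop_result = apply_loop(x)
vec_result = apply_vectorised(x)
```

Both functions return the same array; only the way the algorithm is spelled out differs.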
Just as an aside, I think PyPy would be perfect for managing sparse
matrices (such as scipy.sparse), where there are so many loops
involved; in fact, an RPython implementation of scipy.sparse could be an
interesting proposal for a next SoC!
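As an illustration of the kind of loop-heavy code meant here, the following is a rough pure-Python sketch of a matrix-vector product in compressed sparse row (CSR) layout. It is not taken from scipy.sparse; the function name csr_matvec and the plain-list representation are assumptions made for the example.

```python
def csr_matvec(data, indices, indptr, vec):
    """Multiply a CSR-format sparse matrix by a dense vector.

    data    -- non-zero values, stored row by row
    indices -- column index of each entry in data
    indptr  -- indptr[r]:indptr[r + 1] is the slice of data for row r
    """
    n_rows = len(indptr) - 1
    result = [0.0] * n_rows
    for row in range(n_rows):
        total = 0.0
        # Visit only the stored (non-zero) entries of this row.
        for k in range(indptr[row], indptr[row + 1]):
            total += data[k] * vec[indices[k]]
        result[row] = total
    return result


# The 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR form.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
indices = [0, 2, 2, 0, 1]
indptr = [0, 2, 3, 5]
vec = [1.0, 2.0, 3.0]

product = csr_matvec(data, indices, indptr, vec)
```

The inner loop has a data-dependent length and uses indirect indexing, which is awkward to vectorise and is the sort of code a JIT compiler might speed up.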
I spoke briefly with Maciej the other day, and I realised that there
is a lot of detail on how PyPy interacts with C modules that we are
not aware of. It would be great if you could elaborate a bit on the
way PyPy is able to access current C functionality. For example, can
you use NumPy as is, and just replace functionality piece by piece, or
would you need to rewrite a large part of the interface at a time?
Regards
Stéfan