[SciPy-Dev] NumPy/SciPy participation in GSoC 2013
Charles R Harris
charlesr.harris@gmail....
Tue Apr 2 13:54:12 CDT 2013
On Tue, Apr 2, 2013 at 12:02 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
>
>
>
> On Tue, Apr 2, 2013 at 5:24 PM, Nathaniel Smith <njs@pobox.com> wrote:
>
>> On Mon, Apr 1, 2013 at 12:58 PM, Ralf Gommers <ralf.gommers@gmail.com>
>> wrote:
>> > On Tue, Mar 26, 2013 at 12:27 AM, Ralf Gommers <ralf.gommers@gmail.com>
>> > wrote:
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Mar 21, 2013 at 10:20 PM, Ralf Gommers <ralf.gommers@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> It is the time of the year for Google Summer of Code applications. If we
>> >>> want to participate with Numpy and/or Scipy, we need two things: enough
>> >>> mentors and ideas for projects. If we get those, we'll apply under the PSF
>> >>> umbrella. They've outlined the timeline they're working by and guidelines at
>> >>> http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html.
>> >>>
>> >>> We should be able to come up with some interesting project ideas I'd
>> >>> think, let's put those at
>> >>> http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. Preferably with
>> >>> enough detail to be understandable for people new to the projects and a
>> >>> proposed mentor.
>> >>>
>> >>> We need at least 3 people willing to mentor a student. Ideally we'd have
>> >>> enough mentors this week, so we can apply to the PSF on time. If you're
>> >>> willing to be a mentor, please send me the following: name, email address,
>> >>> phone nr, and what you're interested in mentoring. If you have time
>> >>> constraints and have doubts about being able to be a primary mentor, being a
>> >>> backup mentor would also be helpful.
>> >>
>> >>
>> >> So far we've only got one primary mentor (thanks Chuck!); most core devs
>> >> do not seem to have the bandwidth this year. If there are other people
>> >> interested in mentoring please let me know. If not, then it looks like we're
>> >> not participating this year.
>> >
>> >
>> > Hi all, an update on GSoC'13. We do have enough mentoring power after all;
>> > NumPy/SciPy is now registered as a participating project on the PSF page:
>> > http://wiki.python.org/moin/SummerOfCode/2013
>> >
>> > Prospective students: please have a look at
>> > http://wiki.python.org/moin/SummerOfCode/Expectations and at
>> > http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. In particular note
>> > that we require you to make one pull request to NumPy/SciPy which has to be
>> > merged *before* the application deadline (May 3). So please start thinking
>> > about that, and start a discussion on your project idea on this list.
>>
>> It doesn't look like I have the appropriate mojo to edit that page, but:
>>
>> - The NA thing at the bottom should just be deleted, dropping a
>> student into that would be cruel...
>>
>> - Some new entries, perhaps someone could add?
>>
>> ---------------
>>
>> == Performance parity between numpy arrays and Python scalars ==
>>
>> Small numpy arrays are very similar to Python scalars -- but numpy
>> incurs a fair amount of extra overhead for simple operations. For
>> large arrays this doesn't matter, but for code that manipulates a lot
>> of small pieces of data, it can be a serious bottleneck. For example:
>> {{{
>> In [1]: x = 1.0
>>
>> In [2]: numpy_x = np.asarray(x)
>>
>> In [3]: timeit x + x
>> 10000000 loops, best of 3: 61 ns per loop
>>
>> In [4]: timeit numpy_x + numpy_x
>> 1000000 loops, best of 3: 1.66 us per loop
>> }}}
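The comparison above can be roughly reproduced outside IPython with the standard-library timeit module. This is only a sketch: the helper name per_call_ns and the loop counts are made up for illustration, and the absolute numbers will differ by machine and numpy version.

```python
import timeit

import numpy as np

x = 1.0
numpy_x = np.asarray(x)  # zero-dimensional array wrapping the same value


def per_call_ns(stmt, namespace, number=100_000, repeat=3):
    """Best-of-`repeat` time for a single execution of `stmt`, in nanoseconds.

    Hypothetical helper, not part of numpy or timeit.
    """
    timer = timeit.Timer(stmt, globals=namespace)
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e9


namespace = {"x": x, "numpy_x": numpy_x}
scalar_ns = per_call_ns("x + x", namespace)
array_ns = per_call_ns("numpy_x + numpy_x", namespace)

print(f"python float: {scalar_ns:.0f} ns per loop")
print(f"0-d array:    {array_ns:.0f} ns per loop")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that IPython's timeit reports.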
>>
>> This project would involve profiling simple operations like the above,
>> determining where the bottlenecks are, and devising improved
>> algorithms to solve them, with the goal of getting the numpy time as
>> close as possible to the Python time. Not only would this make all
>> numpy-using code faster, but it would pave the way for future
>> simplifications in numpy's core, which currently has a lot of
>> duplicate code that attempts to work around these slow paths instead
>> of fixing them properly.
>>
>> Some possible concrete changes:
>> 1. numpy's "ufunc loop lookup code" (which is used to determine, e.g.,
>> whether to use the integer or floating-point versions of "+") is slow
>> and inefficient.
>> 2. Checking for floating point errors is very slow; we can and should
>> do it less often.
>> 3. When allocating the return value, the "+" for Python floats calls
>> malloc() only once; numpy calls it twice (once for the array object
>> itself, and a second time for the array data). Stashing both objects
>> within a single allocation would be more efficient.
>> 4. ...see what profiling says! We know 61 ns is possible.
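For item 1, the set of inner loops that the lookup has to choose between can be inspected from Python through the ufunc's types attribute. A small sketch follows; it shows what is being selected, not how the C lookup code works internally.

```python
import numpy as np

# Each entry is a signature "inputs->output" naming one compiled inner loop,
# written with numpy's single-character type codes (e.g. "d" is float64).
loops = np.add.types
print(len(loops), "registered loops for np.add")
print(loops)

# The same "+" dispatches to different loops depending on the input dtypes.
int_sum = np.add(np.int32(1), np.int32(2))
float_sum = np.add(np.float64(1.0), np.float64(2.0))
print(int_sum.dtype, float_sum.dtype)
```

Every call to a ufunc has to pick one of these loops, which is why the cost of the lookup matters so much for very small inputs.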
>>
>> == Pythonic dtypes ==
>>
>> A numpy "dtype" is an object that knows how to work with different
>> sorts of values, represented as fixed-length packed binary values. For
>> example, the int32 dtype knows how to convert the Python object '-1'
>> to the four-byte buffer 0xff 0xff 0xff 0xff.
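The int32 example can be checked directly. A minimal sketch using only ndarray.tobytes and np.frombuffer:

```python
import numpy as np

# Store the Python object -1 using the int32 dtype.
value = np.array(-1, dtype=np.int32)

# The packed representation is four bytes, all 0xff (two's complement),
# independent of byte order.
raw = value.tobytes()
print(raw.hex())  # ffffffff

# The dtype also knows how to go the other way, from bytes back to a value.
restored = np.frombuffer(raw, dtype=np.int32)[0]
print(restored)  # -1
```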
>>
>> Conceptually, dtype objects are arranged into a nice type hierarchy:
>> http://docs.scipy.org/doc/numpy/_images/dtype-hierarchy.png
>>
>> But implementation-wise, dtypes don't use the Python class system at
>> all. There's just a single Python class (numpy.dtype), and all dtypes
>> are instances of it. (This is because when numpy was first designed,
>> they only expected there to be maybe 20 dtype objects total.) This
>> turns out to cause a number of problems -- you can't define new dtypes
>> from Python, only from C; you can't use isinstance to compare dtypes
>> (you have to use a hacky numpy-specific API instead); different dtypes
>> can't easily contain state (instead, the single dtype class has
>> gradually sprouted new fields as new dtypes turned out to need them);
>> etc. Basically we've been reinventing the Python class system, poorly.
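To illustrate the isinstance point: a sketch assuming np.issubdtype is the kind of numpy-specific API meant above (the text does not name it).

```python
import numpy as np

int_dt = np.dtype(np.int32)
float_dt = np.dtype(np.float64)

# Both are instances of the same class, so isinstance against np.dtype
# cannot tell an integer dtype from a floating-point one.
print(isinstance(int_dt, np.dtype), isinstance(float_dt, np.dtype))

# Position in the conceptual hierarchy is queried through a separate function.
print(np.issubdtype(int_dt, np.integer))    # True
print(np.issubdtype(int_dt, np.floating))   # False
print(np.issubdtype(float_dt, np.floating)) # True
```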
>>
>> The goal for this project is to turn dtype classes into regular Python
>> classes with a proper type hierarchy and using the standard Python
>> mechanisms.
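As a toy picture of what "regular Python classes with a proper type hierarchy" could mean, here is a pure-Python sketch. All class names (ToyDType, ToyInteger, and so on) are hypothetical and have nothing to do with numpy's actual implementation; the standard-library struct module stands in for the packing logic.

```python
import struct


class ToyDType:
    """Hypothetical base class for all toy dtypes; not part of numpy."""

    fmt = None  # struct format string, set by concrete subclasses

    @classmethod
    def pack(cls, value):
        """Convert a Python object to its fixed-length binary form."""
        return struct.pack(cls.fmt, value)

    @classmethod
    def unpack(cls, raw):
        """Convert the binary form back to a Python object."""
        return struct.unpack(cls.fmt, raw)[0]


class ToyInteger(ToyDType):
    """Abstract node in the hierarchy: any integer dtype."""


class ToyFloating(ToyDType):
    """Abstract node in the hierarchy: any floating-point dtype."""


class ToyInt32(ToyInteger):
    fmt = "<i"  # little-endian 4-byte signed integer


class ToyFloat64(ToyFloating):
    fmt = "<d"  # little-endian 8-byte float


dt = ToyInt32()
print(isinstance(dt, ToyInteger))   # True
print(isinstance(dt, ToyFloating))  # False
print(ToyInt32.pack(-1).hex())      # ffffffff
```

With ordinary classes, isinstance and issubclass answer hierarchy questions directly, and a new dtype is just another subclass.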
>>
>> Longer term goals (at least the first of which is probably achievable
>> within the SoC timeline):
>> 1. Allow for defining new dtypes using pure Python.
>> 2. There are a bunch of special cases in the ufunc code for handling
>> strings and record arrays; we should make the appropriate extensions
>> to the dtype API so that they can become regular dtypes.
>> 3. A proper categorical data dtype. (This is trivial once the above is
>> done.)
>> 4. NA dtypes
>>
>> ------------------------------
>>
>
>
> Added those. Maybe one of the Trac admins can fix your edit rights.
>
I'm having problems logging in as well. I have an account in my name, but the
password may have been changed somewhere along the line, and apparently the
email address on the account too.
<snip>
Chuck