[Numpy-discussion] Proposed Roadmap Overview
Sat Feb 18 15:57:24 CST 2012
> * NumPy 1.8 to come out in July which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post to this list more details of what we plan to address with it later. Candidates for inclusion are:
> * resolving the NA/missing-data issues
> * finishing group-by
> * incorporating the start of label arrays
> * incorporating a meta-object
> * a few new dtypes (variable-length string, variable-length unicode and an enum type)
> * adding ufunc support for flexible dtypes and possibly structured arrays
> * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous
> * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities
> * adding "filters" to Input and Output
> * simple computed fields for dtypes
> * accepting a Data-Type specification as a class or JSON file
> * work towards improving the dtype-addition mechanism
> For some of these things it's not entirely (or at all, what's a meta-object?) clear to me what they mean or how they would work. How do you plan to go about working on these features? One NEP per feature?
I thought I responded to this already, but it might have been from a different mail server.... Yes, these will each be discussed in due course as they are developed. I just wanted to get an outline started. More detail will come out on each feature as development proceeds.
There is a larger list of features that we will be suggesting and discussing in the months ahead as NumPy 2.0 development is proposed and discussed. But, this list includes things that are fairly straightforward to implement in the current data-model and calculation infrastructure.
There is a lot of criticism of the C-code, which is welcome. I wrote *a lot* of that code --- inspired by and following patterns laid out by other people. I am always interested in specific improvement ideas and/or proposals, as are most people. I especially appreciate targeted, constructive comments and not just general FUD. There has been some criticism of the C-API documentation. After I gave away the content of my book, Guide to NumPy, 3 years ago, Joe Harrington and others adapted it to the web. The C-API portion was documented in my book (see starting on page 211 at http://www.tramy.us/numpybook.pdf). This material is now available online as well (where it has received updates and improvements): http://docs.scipy.org/doc/numpy/reference/c-api.array.html
There are under-documented sections of the code --- usually these are in areas where adoption has driven demand for an understanding of those features (adding new dtypes and array scalars, for example). In addition, there are always improvements to be made to the way something is said and described (and there are different ways people like to be taught).
The C/C++ discussion is just getting started. Everyone should keep in mind that this is not something that is going to happen quickly. This will be a point of discussion throughout the year. I'm not a huge supporter of C++, but C++11 does look like it's made some nice progress, and as I think about making a core-set of NumPy into a library that can be called by multiple languages (and even multiple implementations of Python), tempered C++ seems like it might be an appropriate way to go.
Cython could be useful for Python interfaces to that Core and for extension modules on top, but Cython is *not* a solution for the core of NumPy. We entertained it as we did the IronPython work, but realized it would have taken too long. I'm actually quite glad that we didn't go that direction, now. Cython is a nice project, and I think it will play a role in the stack that emerges, but I am more interested in an eventual NumPy core that does not rely on the Python C-API.
Another thing that I would like to see happen for NumPy 1.8 is the use of bento by default for the build --- and encouraging down-stream projects to use it as well. We should deprecate as much of numpy.distutils as possible, in my mind. What happens during build is pretty hard to understand partly because distutils never really supported building complex extension modules --- that community is still pretty hostile to the needs of extension writers with a real build problem on their hands. We have gotten by with numpy.distutils, but it has not been the easiest thing to adapt.