[Numpy-discussion] Proposed Roadmap Overview

Mark Wiebe mwwiebe@gmail....
Fri Feb 17 13:31:58 CST 2012


On Fri, Feb 17, 2012 at 11:00 AM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:

> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efiring@hawaii.edu> wrote:
> >>
> >> On 02/17/2012 05:39 AM, Charles R Harris wrote:
> >> >
> >> >
> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <cournape@gmail.com> wrote:
> >> >
> >> >     Hi Travis,
> >> >
> >> >     On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
> >> >     <travis@continuum.io> wrote:
> >> >      > Mark Wiebe and I have been discussing off and on (as well as
> >> >     talking with Charles) a good way forward to balance two competing
> >> >     desires:
> >> >      >
> >> >      >        * addition of new features that are needed in NumPy
> >> >      >        * improving the code-base generally and moving towards a
> >> >     more maintainable NumPy
> >> >      >
> >> >      > I know there are loud voices for just focusing on the second of
> >> >     these and avoiding the first until we have finished that.  I
> >> >     recognize the need to improve the code base, but I will also be
> >> >     pushing for improvements to the feature-set and user experience in
> >> >     the process.
> >> >      >
> >> >      > As a result, I am proposing a rough outline for releases over
> >> >     the next year:
> >> >      >
> >> >      >        * NumPy 1.7 to come out as soon as the serious bugs can
> >> >     be eliminated.  Bryan, Francesc, Mark, and I are able to help triage
> >> >     some of those.
> >> >      >
> >> >      >        * NumPy 1.8 to come out in July which will have as many
> >> >     ABI-compatible feature enhancements as we can add while improving
> >> >     test coverage and code cleanup.  I will post to this list more
> >> >     details of what we plan to address with it later.  Candidates for
> >> >     inclusion are:
> >> >      >        * resolving the NA/missing-data issues
> >> >      >        * finishing group-by
> >> >      >        * incorporating the start of label arrays
> >> >      >        * incorporating a meta-object
> >> >      >        * a few new dtypes (variable-length string,
> >> >     variable-length unicode and an enum type)
> >> >      >        * adding ufunc support for flexible dtypes and possibly
> >> >     structured arrays
> >> >      >        * allowing generalized ufuncs to work on more kinds of
> >> >     arrays besides just contiguous
> >> >      >        * improving the ability for NumPy to receive JIT-generated
> >> >     function pointers for ufuncs and other calculation opportunities
> >> >      >        * adding "filters" to Input and Output
> >> >      >        * simple computed fields for dtypes
> >> >      >        * accepting a Data-Type specification as a class or JSON
> >> >     file
> >> >      >        * work towards improving the dtype-addition mechanism
> >> >      >        * re-factoring of code so that it can compile with a C++
> >> >     compiler and be minimally dependent on Python data-structures.
> >> >
> >> >     This is a pretty exciting list of features. What is the rationale
> >> >     for code being compiled as C++? IMO, it will be difficult to do so
> >> >     without preventing useful C constructs, and without removing some of
> >> >     the existing features (like our use of C99 complex). The subset that
> >> >     is both C and C++ compatible is quite constraining.
> >> >
> >> >
> >> > I'm in favor of this myself. C++ would allow a lot of code cleanup and
> >> > make it easier to provide an extensible base, and I think it would be a
> >> > natural fit with numpy. Of course, some C++ projects become tangled
> >> > messes of inheritance, but I'd be very interested in seeing what a good
> >> > C++ designer like Mark, intimately familiar with the numpy code base,
> >> > could do. This opportunity might not come by again anytime soon and I
> >> > think we should grab onto it. The initial step would be a release whose
> >> > code would compile as both C and C++, which mostly comes down to
> >> > removing uses of C++ keywords like 'new' as identifiers.
> >> >
> >> > I did suggest running it by you for build issues, so please raise any
> >> > you can think of. Note that MatPlotLib is in C++, so I don't think the
> >> > problems are insurmountable. And choosing a set of compilers to support
> >> > is something that will need to be done.
> >>
> >> It's true that matplotlib relies heavily on C++, both via the Agg
> >> library and in its own extension code.  Personally, I don't like this; I
> >> think it raises the barrier to contributing.  C++ is an order of
> >> magnitude more complicated than C--harder to read, and much harder to
> >> write, unless one is a true expert. In mpl it brings reliance on the CXX
> >> library, which Mike D. has had to help maintain.  And if it does
> >> increase compiler specificity, that's bad.
> >
> >
> > This gets to the recruitment issue, which is one of the most important
> > problems I see numpy facing. I personally have contributed a lot of code to
> > NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was
> > the biggest negative point when I considered whether it was worth
> > contributing to the project. I suspect there are many programmers out there
> > who are skilled in low-level, high-performance C++, who would be willing to
> > contribute, but don't want to code in C.
> >
> > I believe NumPy should be trying to find people who want to make high
> > performance, close to the metal, libraries. This is a very different type of
> > programmer than one who wants to program in Python, but is willing to dabble
> > in a lower level language to make something run faster. High performance
> > library development is one of the things the C++ developer community does
> > very well, and that community is where we have a good chance of finding the
> > programmers NumPy needs.
> >
> >> I would much rather see development in the direction of sticking with C
> >> where direct low-level control and speed are needed, and using cython to
> >> gain higher level language benefits where appropriate.  Of course, that
> >> brings in the danger of reliance on another complex tool, cython.  If
> >> that danger is considered excessive, then just stick with C.
> >
> >
> > There are many small benefits C++ can offer, even if numpy chooses only to
> > use a tiny subset of the C++ language. For example, RAII can be used to
> > reliably eliminate PyObject reference leaks.
> >
> > Consider a regression like this:
> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
> >
> > Fixing this in C would require switching all the relevant usages of
> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the
> > potential of easily introducing a memory leak, and is a lot of work to do.
> > In C++, this functionality could be placed inside a class, where the
> > deterministic construction/destruction semantics eliminate the risk of
> > memory leaks and make the code easier to read at the same time. There are
> > other examples like this where the C language has forced a suboptimal design
> > choice because of how hard it would be to do it better.
> >
> > Cheers,
> > Mark
> >
>
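
The class-based approach described in the quoted message above might look
roughly like the sketch below. Everything here is an assumption made for
illustration: ArgBuffer, kInlineArgs and sum_args are hypothetical names
(kInlineArgs only stands in for a fixed limit such as NPY_MAXARGS), and none
of it is NumPy code. The buffer uses inline storage for small argument counts
and falls back to the heap for larger ones; the destructor releases the heap
storage on every exit path.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a fixed compile-time limit such as NPY_MAXARGS.
constexpr std::size_t kInlineArgs = 4;

// Holds n values of type T: inline when n is small, on the heap otherwise.
template <typename T>
class ArgBuffer {
public:
    explicit ArgBuffer(std::size_t n)
        : size_(n),
          heap_(n > kInlineArgs ? new T[n]() : nullptr),
          data_(heap_ ? heap_.get() : inline_) {}

    // Copying is disabled so data_ can never point into another object.
    ArgBuffer(const ArgBuffer&) = delete;
    ArgBuffer& operator=(const ArgBuffer&) = delete;

    T& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

private:
    std::size_t size_;
    T inline_[kInlineArgs] = {};
    std::unique_ptr<T[]> heap_;  // frees itself when ArgBuffer is destroyed
    T* data_;
};

// Illustrative caller with an early return and no explicit cleanup code.
long sum_args(const long* args, std::size_t n, bool fail_early) {
    ArgBuffer<long> buf(n);
    for (std::size_t i = 0; i < n; ++i) buf[i] = args[i];
    if (fail_early) return -1;  // heap storage, if any, is released here
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += buf[i];
    return total;
}
```

The early return in sum_args is the case that tends to leak in hand-written C,
since each exit path needs its own matching free().
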
> In a similar vein, could incorporating C++ lead to a simpler low-level
> API for numpy?


This could definitely happen. One way to do it is to have a stable C API
which remains fixed over many releases, and a C++ library which is allowed
to change significantly at each release. This is what the LLVM project
does, for example. OpenCV is an example of another project which was
previously just C, but now has an extensive C++ API.
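
As a minimal sketch of that layering, assuming nothing about how LLVM or
OpenCV actually implement it: the C++ class below is free to change between
releases, while the extern "C" functions and the opaque handle form the
surface that would stay fixed. All names (demo::Array1D, demo_array,
demo_array_new and so on) are hypothetical and invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- C++ side: internal, allowed to change at each release ----
namespace demo {
class Array1D {
public:
    explicit Array1D(std::size_t n) : data_(n, 0.0) {}
    bool set(std::size_t i, double v) {
        if (i >= data_.size()) return false;
        data_[i] = v;
        return true;
    }
    double sum() const {
        double total = 0.0;
        for (double v : data_) total += v;
        return total;
    }
private:
    std::vector<double> data_;
};
}  // namespace demo

// ---- C side: the stable surface, expressed only in C types ----
extern "C" {
typedef struct demo_array demo_array;  // opaque handle, never defined for C

demo_array* demo_array_new(std::size_t n) {
    try {
        return reinterpret_cast<demo_array*>(new demo::Array1D(n));
    } catch (...) {
        return nullptr;  // exceptions must not cross the C boundary
    }
}
int demo_array_set(demo_array* a, std::size_t i, double v) {
    if (!a) return -1;
    return reinterpret_cast<demo::Array1D*>(a)->set(i, v) ? 0 : -1;
}
double demo_array_sum(const demo_array* a) {
    return a ? reinterpret_cast<const demo::Array1D*>(a)->sum() : 0.0;
}
void demo_array_free(demo_array* a) {
    delete reinterpret_cast<demo::Array1D*>(a);
}
}  // extern "C"

// Usage through the C surface only, as a C caller would see it.
double demo_roundtrip() {
    demo_array* a = demo_array_new(3);
    assert(a != nullptr);
    demo_array_set(a, 0, 1.5);
    demo_array_set(a, 2, 2.5);
    double total = demo_array_sum(a);
    demo_array_free(a);
    return total;
}
```

Because callers only ever hold the opaque pointer, the layout of the C++ class
can change without breaking code compiled against the C functions.
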


> I know Mark has talked before about--in the long-term,
> as a dream project to scratch his own itch, and something the BDF12
> doesn't necessarily agree with--implementing the great ideas in numpy
> as a layered C++ library. (Which would have the added benefit of
> making numpy more of a general array library that could be exposed to
> any language which can call C++ libraries.)
>
> I don't imagine that's on the table for anything near-term, but I
> wonder if making more of the low-level stuff C++ would make it easier
> for performance nuts to write their own code in C/C++ interfacing with
> numpy, and then expose it to python. After playing around with ufuncs
> at the C level for a little while last summer, I quickly realized any
> simplifications would be greatly appreciated.
>

This is all possible, yes. The way this typically works is that library
authors use advanced C++ techniques to get generality, performance, and
usability. The library user can then write very simple code, in a style
that makes simple errors much harder to make than they are with a C-like
API.
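
To make that concrete, here is a small sketch of an RAII handle over a
reference-counted C-style object. It is an illustration under stated
assumptions, not NumPy or CPython code: Obj, obj_new, obj_incref and
obj_decref are hypothetical stand-ins for PyObject and its reference-counting
calls, so the example compiles without the Python headers, and g_live exists
only to make leaks observable.

```cpp
#include <cassert>
#include <utility>

// Hypothetical C-style reference-counted object and its API.
struct Obj {
    int refcount;
    int value;
};
static int g_live = 0;  // number of objects currently alive

Obj* obj_new(int value) {
    ++g_live;
    return new Obj{1, value};
}
void obj_incref(Obj* o) { ++o->refcount; }
void obj_decref(Obj* o) {
    if (--o->refcount == 0) {
        --g_live;
        delete o;
    }
}

// Library side: owns one reference and releases it in the destructor.
class Ref {
public:
    explicit Ref(Obj* owned = nullptr) : p_(owned) {}  // takes ownership
    Ref(const Ref& other) : p_(other.p_) {
        if (p_) obj_incref(p_);
    }
    Ref(Ref&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
    Ref& operator=(Ref other) noexcept {
        std::swap(p_, other.p_);  // old reference is released with 'other'
        return *this;
    }
    ~Ref() {
        if (p_) obj_decref(p_);
    }
    Obj* get() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    Obj* p_;
};

// User side: no explicit decref calls, even on the early return.
int add_values(int a, int b, bool fail_midway) {
    Ref x(obj_new(a));
    Ref y(obj_new(b));
    assert(x && y);
    if (fail_midway) return -1;  // both references are released here
    return x.get()->value + y.get()->value;
}
```

The user-side function contains no cleanup code at all, so there is no decref
to forget when a new error path is added later.
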

-Mark


> -Chris
>
>
> >>
> >> Eric
> >>
> >> >
> >> > Chuck
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion