[Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch
Charles R Harris
Mon Dec 17 12:50:44 CST 2012
On Sun, Dec 16, 2012 at 11:07 PM, Travis Oliphant <email@example.com> wrote:
> Hello all,
> There is a lot happening in my life right now and I am spread quite thin
> among the various projects that I take an interest in. In particular, I
> am thrilled to publicly announce on this list that Continuum Analytics has
> received DARPA funding (to the tune of at least $3 million) for Blaze,
> Numba, and Bokeh, which we are writing to take NumPy, SciPy, and
> visualization into the domain of very large data sets. This is part of
> the XDATA program, and I will be taking an active role in it. You can
> read more about Blaze here: http://blaze.pydata.org. You can read more
> about XDATA here: http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx
> I personally think Blaze is the future of array-oriented computing in
> Python. I will be putting efforts and resources next year behind making
> that case. How it interacts with future incarnations of NumPy, Pandas, or
> other projects is an interesting and open question. I have no doubt the
> future will be a rich ecosystem of interoperating array-oriented
> data-structures. I invite anyone interested in Blaze to participate in
> the discussions and development at
> https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev or watch
> the project on our public GitHub repo:
> https://github.com/ContinuumIO/blaze. Blaze is being incubated under the
> ContinuumIO GitHub project for now, but eventually I hope it will receive
> its own GitHub project page later next year. Development of Blaze is
> early but we are moving rapidly with it (and have deliverable deadlines ---
> thus while we will welcome input and pull requests we won't have a ton of
> time to respond to simple queries until
> at least May or June). There is more that we are working on behind
> the scenes with respect to Blaze that will be coming out next year as well
> but isn't quite ready to show yet.
> As I look at the coming months and years, my time for direct involvement
> in NumPy development is therefore only going to get smaller. As a result
> it is not appropriate that I remain as "head steward" of the NumPy project
> (a term I prefer to BFD12 or anything else). I'm sure that it is apparent
> that while I've tried to help personally where I can this year on the NumPy
> project, my role has been more one of coordination, seeking funding, and
> providing expert advice on certain sections of code. I fundamentally
> agree with Fernando Perez that the responsibility of care-taking open
> source projects is one of stewardship --- something akin to public service.
> I have tried to emulate that belief this year --- even while not always
> succeeding.
> It is time for me to make official what is already becoming apparent to
> observers of this community, namely, that I am stepping down as someone who
> might be considered "head steward" for the NumPy project and officially
> leaving the development of the project in the hands of others in the
> community. I don't think the project actually needs a new "head steward"
> --- especially from a development perspective. Instead I see a lot of
> strong developers offering key opinions for the project as well as a great
> set of new developers offering pull requests.
> My strong suggestion is that development discussions of the project
> continue on this list with consensus among the active participants being
> the goal for development. I don't think 100% consensus is a rigid
> requirement --- but certainly a super-majority should be the goal, and
> serious changes should not be made without a clear consensus. I would
> pay special attention to under-represented people (users with intense usage
> of NumPy but small voices on this list). There are many of them. If
> you push me for specifics then at this point in NumPy's history, I would
> say that if Chuck, Nathaniel, and Ralf agree on a course of action, it will
> likely be a good thing for the project. I suspect that even if only 2 of
> the 3 agree at one time it might still be a good thing (but I would expect
> more detail and discussion). There are others whose opinion should be
> sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, David
> Cournapeau, Francesc Alted, and Mark Wiebe to
> name a few. For some questions, I might even seek input from people
> like Konrad Hinsen and Paul Dubois --- if they have time to give it. I
> will still be willing to offer my view from time to time and if I am asked.
> Greg Wilson (of Software Carpentry fame) asked me recently what letter I
> would have written to myself 5 years ago. What would I tell myself to do
> given the knowledge I have now? I've thought about that for a bit, and
> I have some answers. I don't know if these will help anyone, but I offer
> them as hopefully instructive:
> 1) Do not promise to not break the ABI of NumPy --- and in fact
> emphasize that it will be broken at least once in the 1.X series. NumPy
> was designed to add new data-types --- but not without breaking the ABI.
> NumPy has needed more data-types and still needs even more. While it's
> not beautifully simple to add new data-types, it can be done. But, it is
> impossible to add them without breaking the ABI in some fashion. The
> desire to add new data-types *and* keep ABI compatibility has led to
> significant pain. I think the ABI non-breakage goal has been amplified by
> the poor state of package management in Python. The fact that it's
> painful for someone to update their downstream packages when an upstream
> ABI breaks (on Windows and Mac in particular) has put a lot of unfortunate
> pressure on this community. Pressure that was not envisioned or
> understood when I was writing NumPy.
> (As an aside: This is one reason Continuum has invested resources in
> building the conda tool and a completely free set of binary packages called
> Anaconda CE which is becoming more and more usable thanks to the efforts of
> Bryan Van de Ven and Ilan Schnell and our testing team at Continuum. The
> conda tool: http://docs.continuum.io/conda/index.html is open source and
> BSD licensed and the next release will provide the ability to build
> packages, build indexes on package repositories and interface with pip.
> Expect a blog-post in the near future about how cool conda is!).
> 2) Don't create array-scalars. Instead, make the data-type object
> a meta-type object whose instances are the items returned from NumPy
> arrays. There is no need for a separate array-scalar object and in fact
> it's confusing to the type-system. I understand that now. I did not
> understand that 5 years ago.
> 3) Special-case small arrays to avoid the memory indirection and
> look at PDL so that generalized ufuncs are supported from the beginning.
> 4) Define missing-value data-types and labels on the dimensions
> and arrays
> 5) Define a standard "dictionary of NumPy arrays" interface as the
> basic "structure of arrays" concept to go with the "array of structures"
> that structured arrays provide.
> 6) Start work on SQL interface to NumPy arrays *now*
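
To illustrate item 5 in the quoted list above, here is a minimal sketch of
the two layouts it contrasts. It is not a proposed interface: the helper
name to_dict_of_arrays and the field names "id" and "value" are hypothetical
and chosen only for this example. It assumes a plain Python dict stands in
for the "dictionary of NumPy arrays".

```python
import numpy as np

# "Array of structures": one structured array whose records interleave
# the fields in memory (field names here are made up for illustration).
aos = np.array(
    [(1, 2.5), (2, 3.5), (3, 4.5)],
    dtype=[("id", np.int32), ("value", np.float64)],
)


def to_dict_of_arrays(structured):
    """Hypothetical helper: split a structured array into one array per field.

    Each field is copied into its own contiguous array, which is the
    "structure of arrays" layout.
    """
    return {
        name: np.ascontiguousarray(structured[name])
        for name in structured.dtype.names
    }


# "Structure of arrays": a dict mapping each field name to its own array.
soa = to_dict_of_arrays(aos)

print(aos["value"].strides)  # stride spans the whole record
print(soa["value"].strides)  # stride spans a single float64
```

Indexing a field of the structured array gives a strided view into the
records, while each array in the dict is contiguous on its own. Which layout
is preferable depends on whether access is mostly by record or by column.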
> Additional comments I would make to someone today:
> 1) Most of NumPy should be written in Python with Numba used as
> the compiler (particularly as soon as Numba gets the ability to create
> Python extension modules which is in the next release).
> 2) There are still many, many optimizations that can be made in
> NumPy run-time (especially in the face of modern hardware).
> I will continue to be available to answer questions and I may chime in
> here and there on pull requests. However, most of my time for NumPy will
> be on administrative aspects of the project where I will continue to take
> an active interest. To help make sure that this happens in a transparent
> way, I would like to propose that "administrative" support of the project
> be left to the NumFOCUS board of which I am currently 1 of 9 members. The
> other board members are currently: Ralf Gommers, Anthony Scopatz, Andy
> Terrel, Prabhu Ramachandran, Fernando Perez, Emmanuelle Gouillart, Jarrod
> Millman, and Perry Greenfield. While NumFOCUS basically seeks to
> promote and fund the entire scientific Python stack, I think it can also
> play a role in helping to administer some of the core projects which the
> board members themselves have a personal interest in.
> By administrative support, I mean decisions like "what should be done with
> any NumPy IP or web-domains" or "what kind of commercially-related ads or
> otherwise should go on the NumPy home page", or "what should be done with
> the NumPy github account", etc. --- basically anything that requires an
> executive decision that is not directly development related. I don't
> expect there to be many of these decisions. But, when they show up, I
> would like them to be made in as transparent and public a way as
> possible. In practice, the way I see this working is that there are
> members of the NumPy community who are (like me) particularly interested in
> admin-related questions and serve on a NumPy team in the NumFOCUS
> organization. I just know I'll be attending NumFOCUS board meetings,
> and I would like to help move administrative decisions forward with NumPy
> as part of the time I spend thinking about NumFOCUS.
> If people on this list would like to play an active role in those admin
> discussions, then I would heartily welcome them into NumFOCUS membership
> where they would work with interested members of the NumFOCUS board (like
> me and Ralf) to help direct that organization. I would really love to
> have someone from this list volunteer to serve on the NumPy team as part of
> the NumFOCUS project. I am certainly going to be interested in the
> opinions of people who are active participants on this list and on GitHub
> pages for NumPy on anything admin related to NumPy, and I expect Ralf would
> also be very interested in those views.
> One admin discussion that I will bring up in another email (as this one is
> already too long) is about making 2 or 3 lists for NumPy such as
> firstname.lastname@example.org, email@example.com, and numpy-users@numpy-org.
> Just because I'll be spending more time on Blaze, Numba, Bokeh, and the
> PyData ecosystem does not mean that I won't be around for NumPy. I will
> continue to promote NumPy. My involvement with Continuum connects me to
> NumPy as Continuum continues to offer commercial support contracts for
> NumPy (and SciPy and other open source projects). Continuum will also
> continue to maintain its GitHub NumPy project which will contain pull
> requests from our company that we are working to get into the mainline
> branch. Continuum will also continue to provide resources for
> release-management of NumPy (we have been funding Ondrej in this role for
> the past 6 months --- though I would like to see this happen through
> NumFOCUS in the future even if Continuum provides much of the money). We
> also offer optimized versions of NumPy in our commercial Anaconda
> distribution (Anaconda CE is free and open source).
> Also, I will still be available for questions and help (I'm not
> disappearing --- just making it clear that I'm stepping back into an
> occasional NumPy developer role). It has been extremely gratifying to see
> the number of pull-requests, GitHub-conversations, and code contributions
> increase this year. Even though the 1.7 release has taken a long time to
> stabilize, there have been a lot of people participating in the discussion
> and in helping to track down the problems, figure out what to do, and fix
> them. It even makes it possible for people to think about 1.7 as a
> long-term release.
> I will continue to hope that the spirit of openness, tolerance, respect,
> and gratitude continues to permeate this mailing list, and that we continue
> to seek to resolve any differences with trust and mutual respect. I know
> I have offended people in the past with quick remarks and actions made
> sometimes in haste without fully realizing how they might be taken. But,
> I also know that like many of you I have always done the very best I could
> for moving Python for scientific computing forward in the best way I know
> how.
> Thank you for the great memories. If you will forgive a little
> sentiment: My daughter who is in college now was 3 years old when I began
> working with this community and went down a road that would lead to my
> involvement with SciPy and NumPy. I have marked the building of my family
> and the passage of time with where the Python for Scientific Computing
> Community was at. Like many of you, I have given a great deal of
> attention and time to building this community. That sacrifice and time
> has led me to love what we have created. I know that I leave this
> segment of the community with the tools in better hands than mine. I am
> hopeful that NumPy will continue to be a useful array library for the
> Python community for many years to come even as we all continue to build
> new tools for the future.
Congratulations on the DARPA grant and best wishes for the success of your
enterprises. We will all do our best to keep Numpy moving forward and hope
that Blaze will contribute to that.
One administrative detail you might want to deal with at this point is
ownership of the Numpy github repositories. I note that the Scipy
repositories have a number of owners, but you are currently the sole owner
of the Numpy site. May I suggest adding a few more owners? I'd recommend
Ralf, Pauli, Nathaniel, and myself as additions.