[Numpy-discussion] Missing data again
Charles R Harris
Sat Mar 3 15:55:04 CST 2012
On Sat, Mar 3, 2012 at 1:30 PM, Travis Oliphant <firstname.lastname@example.org> wrote:
> Hi all,
> I've been thinking a lot about the masked array implementation lately.
> I finally had the time to look hard at what has been done, and I am now of the
> opinion that 1.7 cannot be released with the current state
> of the masked array implementation *unless* it is clearly marked as
> experimental and subject to change in 1.8.
That was the intention.
> I wish I had been able to be a bigger part of this conversation last year.
> But, that is why I took the steps I took to try and figure out another
> way to feed my family *and* stay involved in the NumPy community. I would
> love to stay involved in what is happening in the SciPy community, but I am
> more satisfied with what Ralf, Warren, Robert, Pauli, Josef, Charles,
> Stefan, and others are doing there right now, and don't have time to keep
> up with everything. Even though SciPy was the heart and soul of why I
> even got involved with Python for open source in the first place and took
> many years of my volunteer labor, I won't be able to spend significant time
> on SciPy code over the coming months. At some point, I really hope to be
> able to make contributions again to that code-base. Time will tell
> whether or not my aspirations will be realized. It depends quite a bit on
> whether or not my kids have what they need from me (which right now is
> money and time).
> NumPy, on the other hand, is not in a position where I can feel
> comfortable leaving my "baby" to others. I recognize and value the
> contributions from many people to make NumPy what it is today (e.g. code
> contributions, code rearrangement and standardization, build and install
> improvement, and most recently some architectural changes). But, I feel
> a personal responsibility for the code base as I spent a great many months
> writing NumPy in the first place, and I've spent a great deal of time
> interacting with NumPy users and feel like I have at least some sense of
> their stories. Of course, I built on the shoulders of giants, and much
> of what is there is *because of* where the code was adapted from (it was
> not created de novo). Much still needs to be
> communicated, improved, and worked on, and I have specific opinions about
> what some changes and improvements should be, how they should be written,
> and how users should benefit from them.
> It will take time to discuss all of this, and that's where I will spend
> my open-source time in the coming months.
> In that vein:
> Because it is slated to go into release 1.7, we need to revisit the
> masked array discussion. The NEP process is the appropriate one
> and I'm glad we are taking that route for these discussions. My goal is
> to get consensus in order for code to get into NumPy (regardless of who
> writes the code). It may be that we don't come to a consensus
> (reasonable and intelligent people can disagree on things --- look at the
> coming election...). We can represent different parts of what is
> fortunately a very large user-base of NumPy users.
> First of all, I want to be clear that I think there is much great work
> that has been done in the current missing data code. There are some nice
> features in the where clause of the ufunc and the machinery for the
> iterator that allows re-using ufunc loops that are not re-written to check
> for missing data. I'm sure there are other things as well that I'm not
> quite aware of yet. However, I don't think the API presented to the
> numpy user presently is the correct one for NumPy 1.X.
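The `where` clause praised above lets a ufunc skip elements without its inner loop having to check for missing data; NumPy ufuncs accept it as a `where=` keyword, e.g. `np.add(a, b, out=out, where=mask)`. The pure-Python sketch below only models that idea. `add_loop` and `apply_where` are hypothetical names for illustration, not NumPy internals.

```python
from typing import Callable, List, Sequence


def add_loop(x: float, y: float) -> float:
    # Hypothetical inner kernel: it knows nothing about missing data.
    return x + y


def apply_where(kernel: Callable[[float, float], float],
                a: Sequence[float], b: Sequence[float],
                out: List[float], where: Sequence[bool]) -> List[float]:
    # Hypothetical driver: call the unmodified kernel only where the mask
    # is True; positions where it is False keep their existing `out` value.
    for i, keep in enumerate(where):
        if keep:
            out[i] = kernel(a[i], b[i])
    return out


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * 4
apply_where(add_loop, a, b, out, where=[True, False, True, False])
print(out)  # [11.0, 0.0, 33.0, 0.0]
```

The point of this design is that the kernel stays simple and reusable: all knowledge of which elements to touch lives in the driver, not in each loop.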
> A few particulars:
> * the reduction operations need to default to "skipna" --- this is
> the most common use case, which was reinforced again to me today by a
> new Python user who is currently using masked arrays
> * the mask needs to be visible to the user if they use that
> approach to missing data (people should be able to get hold of the mask
> and work with it in Python)
> * bit-pattern approaches to missing data (at least for float64 and
> int32) need to be implemented
> * when masks are used for missing data (even if they are hidden
> from most users), there should be some way to separate the low-level
> ufunc operation from the operation on the masks
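The requirements in the list above can be made concrete with a small model. The pure-Python sketch below is hypothetical: `MaskedVector`, `nan_sum`, `int32_na_sum`, and `INT32_NA` are illustrative names, not NumPy API. It keeps the mask as an ordinary, user-visible list (True meaning "missing", the same convention `numpy.ma` uses), makes reductions skip missing values unless `skipna=False` is passed, computes on the data separately from combining the masks, and shows the two bit-pattern alternatives: NaN for float64 and a reserved sentinel for int32. Using INT32_MIN as that sentinel is an assumption (it is the value R uses for integer NA).

```python
import math
from dataclasses import dataclass
from typing import List

# Assumed sentinel for a missing int32 value (INT32_MIN, as R does for NA).
INT32_NA = -2**31


@dataclass
class MaskedVector:
    data: List[float]
    mask: List[bool]  # True means "missing"; users can read and edit it.

    def sum(self, skipna: bool = True) -> float:
        # Reductions skip missing entries by default;
        # skipna=False propagates the missing value as NaN.
        if not skipna and any(self.mask):
            return math.nan
        return sum((x for x, m in zip(self.data, self.mask) if not m), 0.0)

    def __add__(self, other: "MaskedVector") -> "MaskedVector":
        # Low-level operation on the data, done without looking at masks...
        data = [x + y for x, y in zip(self.data, other.data)]
        # ...followed by a separate, explicit operation on the masks.
        mask = [m1 or m2 for m1, m2 in zip(self.mask, other.mask)]
        return MaskedVector(data, mask)


def nan_sum(values: List[float]) -> float:
    # Bit-pattern NA for float64: NaN marks a missing value in-band.
    return sum((v for v in values if not math.isnan(v)), 0.0)


def int32_na_sum(values: List[int]) -> int:
    # Bit-pattern NA for int32: a reserved sentinel marks "missing".
    return sum(v for v in values if v != INT32_NA)


v = MaskedVector([1.0, 2.0, 3.0], [False, True, False])
w = MaskedVector([10.0, 20.0, 30.0], [False, False, True])
print(v.sum())                          # 4.0
print(v.sum(skipna=False))              # nan
print((v + w).mask)                     # [False, True, True]
v.mask[1] = False                       # the mask is a plain, editable list
print(v.sum())                          # 6.0
print(nan_sum([1.0, math.nan, 2.5]))    # 3.5
print(int32_na_sum([5, INT32_NA, 7]))   # 12
```

The two approaches trade off differently. A bit pattern needs no extra memory but gives up one representable value (here INT32_MIN can no longer be stored as data). A separate mask costs extra storage but keeps the full value range and lets users manipulate the mask directly.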
Mind, Mark only had a few weeks to write code. I think the unfinished state
is a direct function of that.
> I have heard from several users that they will *not use the missing data*
> in NumPy as currently implemented, and I can now see why. For better or
> for worse, my approach to software is generally very user-driven and very
> pragmatic. On the other hand, I'm also a mathematician and appreciate the
> cognitive compression that can come out of well-formed structure.
> Nonetheless, I'm an *applied* mathematician and am ultimately motivated
> by applications.
I think that would be Wes. I thought the current state wasn't that far away
from what he wanted in the only post where he was somewhat explicit. I
think it would be useful for him to sit down with Mark at some time and
thrash things out since I think there is some misunderstanding involved.
> I will get a hold of the NEP and spend some time with it to discuss some
> of this in that document. This will take several weeks (as PyCon is next
> week and I have a tutorial I'm giving there). For now, I do not think
> 1.7 can be released unless the masked array is labeled *experimental*.