[Numpy-discussion] Missing data again

Charles R Harris charlesr.harris@gmail....
Sat Mar 3 15:55:04 CST 2012


On Sat, Mar 3, 2012 at 1:30 PM, Travis Oliphant <travis@continuum.io> wrote:

> Hi all,
>
> I've been thinking a lot about the masked array implementation lately.
> I finally had the time to look hard at what has been done, and I am now of
> the opinion that 1.7 cannot be released with the current state of the
> masked array implementation *unless* it is clearly marked as experimental
> and subject to change in 1.8.
>
>
That was the intention.


> I wish I had been able to be a bigger part of this conversation last year.
> But that is why I took the steps I took to try to figure out another way
> to feed my family *and* stay involved in the NumPy community. I would
> love to stay involved in what is happening in the SciPy community, but I am
> more satisfied with what Ralf, Warren, Robert, Pauli, Josef, Charles,
> Stefan, and others are doing there right now, and don't have time to keep
> up with everything. Even though SciPy was the heart and soul of why I
> even got involved with Python for open source in the first place and took
> many years of my volunteer labor, I won't be able to spend significant time
> on SciPy code over the coming months. At some point, I really hope to be
> able to make contributions again to that code base. Time will tell
> whether or not my aspirations will be realized. It depends quite a bit on
> whether or not my kids have what they need from me (which right now is
> money and time).
>
> NumPy, on the other hand, is not in a position where I can feel
> comfortable leaving my "baby" to others. I recognize and value the
> contributions from many people to make NumPy what it is today (e.g. code
> contributions, code rearrangement and standardization, build and install
> improvement, and most recently some architectural changes). But I feel
> a personal responsibility for the code base, as I spent a great many months
> writing NumPy in the first place, and I've spent a great deal of time
> interacting with NumPy users and feel like I have at least some sense of
> their stories. Of course, I built on the shoulders of giants, and much
> of what is there is *because of* where the code was adapted from (it was
> not created de novo). Currently, there remains much that needs to be
> communicated, improved, and worked on, and I have specific opinions about
> what some changes and improvements should be, how they should be written,
> and how users should benefit from them. It will take time to discuss all
> of this, and that's where I will spend my open-source time in the coming
> months.
>
> In that vein:
>
> Because it is slated to go into release 1.7, we need to revisit the
> masked array discussion. The NEP process is the appropriate one, and I'm
> glad we are taking that route for these discussions. My goal is to get
> consensus in order for code to get into NumPy (regardless of who writes
> the code). It may be that we don't come to a consensus (reasonable and
> intelligent people can disagree on things --- look at the coming
> election...). We can represent different parts of what is fortunately a
> very large NumPy user base.
>
> First of all, I want to be clear that I think much great work has been
> done in the current missing data code. There are some nice features in
> the where clause of the ufunc and in the iterator machinery that allows
> re-using ufunc loops that were not rewritten to check for missing data.
> I'm sure there are other things as well that I'm not quite aware of yet.
> However, I don't think the API currently presented to the NumPy user is
> the correct one for NumPy 1.X.
>

> A few particulars:
>
>        * the reduction operations need to default to "skipna" --- this is
> the most common use case, which was reinforced to me again today by a
> new user to Python who is presently using masked arrays
>
>        * the mask needs to be visible to the user if they use that
> approach to missing data (people should be able to get a hold of the mask
> and work with it in Python)
>
>        * bit-pattern approaches to missing data (at least for float64 and
> int32) need to be implemented
>
>        * when using "masks" for missing data, there should be some way
> (even if it's hidden from most users) to separate the low-level ufunc
> operation from the operation on the masks...
>
>
Mind, Mark only had a few weeks to write code. I think the unfinished state
is a direct function of that.
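
For anyone following the particulars above without the code in front of
them, here is a rough sketch of the behaviours being asked for. It is only
an illustration: it uses the long-standing numpy.ma module, np.nansum, and
the ufunc where= argument, not the experimental NA-mask API under
discussion, and the variable names are made up for the example.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0, 4.0])
missing = np.array([False, True, False, False])  # True marks a missing value

# Mask-based approach: numpy.ma reductions skip masked elements by default,
# which is the "skipna" behaviour asked for in the first bullet.
m = ma.masked_array(data, mask=missing)
total_masked = m.sum()  # sums 1.0, 3.0 and 4.0 only

# The mask stays visible and can be worked with from Python (second bullet).
visible_mask = ma.getmaskarray(m)

# Bit-pattern approach for float64 (third bullet): NaN stands in for "missing".
nan_data = np.where(missing, np.nan, data)
total_propagate = nan_data.sum()   # NaN propagates through a plain sum
total_skip = np.nansum(nan_data)   # skipping has to be requested explicitly

# ufunc where= argument: the ordinary add loop runs only where the value is
# valid; other positions of `out` keep whatever they already held.
out = np.zeros_like(data)
np.add(data, 10.0, out=out, where=~missing)
```

The contrast between total_propagate and total_skip is the crux of the
"default to skipna" question: both policies are defensible, and the
disagreement is over which one a plain reduction should use.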


> I have heard from several users that they will *not use the missing data*
> in NumPy as currently implemented, and I can now see why. For better or
> for worse, my approach to software is generally very user-driven and very
> pragmatic. On the other hand, I'm also a mathematician and appreciate the
> cognitive compression that can come out of well-formed structure.
> Nonetheless, I'm an *applied* mathematician and am ultimately motivated
> by applications.
>
>
I think that would be Wes. I thought the current state wasn't that far away
from what he wanted in the only post where he was somewhat explicit. I
think it would be useful for him to sit down with Mark at some time and
thrash things out since I think there is some misunderstanding involved.


> I will get a hold of the NEP and spend some time with it to discuss some
> of this in that document. This will take several weeks (as PyCon is next
> week and I have a tutorial I'm giving there). For now, I do not think
> 1.7 can be released unless the masked array is labeled *experimental*.
>
>
Chuck