[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Sat Jun 25 14:56:44 CDT 2011
On Sat, Jun 25, 2011 at 6:00 AM, Matthew Brett <email@example.com> wrote:
> On Sat, Jun 25, 2011 at 1:54 AM, Mark Wiebe <firstname.lastname@example.org> wrote:
> > On Fri, Jun 24, 2011 at 5:21 PM, Matthew Brett <email@example.com> wrote:
> >> @Mark - I don't have a clear idea whether you consider the nafloat64
> >> option to be still in play as the first thing to be implemented
> >> (before array.mask). If it is, what kind of thing would persuade you
> >> either way?
> > I'm focusing all of my effort on getting my proposal of adding a mask to
> > the core ndarray into a state where it satisfies everyone's requirements
> > as well as I can.
> Maybe it would be worth setting out the requirements formally somewhere?
The design that's forming is a combination of:
* Solve the missing data problem
* My ideas of what a good solution looks like:
  * applies to all NumPy dtypes in a fully general way
  * high-performance, low overhead where possible
  * makes the C-level implementation of NumPy nicer to work with, not
    harder
  * easy to use from Python for unskilled programmers
  * easy to use more powerful functionality from Python for skilled
    programmers
  * satisfies all or most of the needs of the many users of arrays with a
    "missing data" aspect to them
* All the feedback I'm getting from discussions on the list
That's not a formal requirements specification, but it might offer some insight.
> > I'm not precluding the possibility that someone could convince me
> > that the na-dtype is good, but I gave it a good chunk of thought before
> > starting to write the proposal. To persuade me towards the na-dtype,
> > I need to be convinced that I'm solving the problem class in a generic
> > way that works orthogonally with other features, with manageable
> > requirements, a very usable result for both strong and weak programmers,
> > and with good performance characteristics. I think the na-dtype approach
> > isn't as generic as I would like, and the implementation seems like it
> > would be trickier than the masked approach.
> What I'm getting at, is that I think you have made the decision
> between these two implementations some time ago while looking at the C
> code. Now of course you would be a much better person to make that
> decision than - say - me. It's just that, if you want coherent
> feedback from us on this decision, we need to get some technical grasp
> of why you made it. I realize that it will not be easy to explain
> in detail, but honestly, it could be a useful discussion to have from
> your and our point of view, even if it ends up in the same place.
I've updated a section "Parameterized Data Type With NA Signal Values" in
the NEP with an idea for how an NA bit pattern approach could coexist and
work together with the mask-based approach. I think I've solved some of the
generality and implementation obstacles, and it would be great to get some
feedback on that.
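
[Editorial illustration, not part of the original message.] For readers
comparing the two approaches discussed above, here is a minimal pure-Python
sketch of the difference between a separate mask and an in-band NA signal
value. The helper names `masked_sum` and `na_signal_sum` are hypothetical and
are not NumPy or NEP API. NaN stands in for an NA bit pattern only as an
assumption for the example.

```python
import math

# Hypothetical helpers for illustration only; not part of NumPy or the NEP.

def masked_sum(values, mask):
    """Mask-based approach: a separate boolean sequence marks missing
    entries (True = missing). The underlying value is left untouched."""
    return sum(v for v, missing in zip(values, mask) if not missing)

def na_signal_sum(values):
    """Signal-value approach: a reserved in-band value (NaN here, as an
    assumed stand-in for an NA bit pattern) marks missing entries."""
    return sum(v for v in values if not math.isnan(v))

# Mask-based: the payload 2.0 is still stored, just hidden by the mask.
data = [1.0, 2.0, 3.0, 4.0]
mask = [False, True, False, False]
float_total = masked_sum(data, mask)

# The same mask mechanism works for integers, which have no spare NaN value.
int_total = masked_sum([1, 2, 3], [False, True, False])

# Signal-value: the missing marker overwrites the payload itself.
signal_data = [1.0, math.nan, 3.0, 4.0]
signal_total = na_signal_sum(signal_data)

print(float_total, int_total, signal_total)
```

The integer case hints at why a mask generalizes across dtypes more easily:
a signal value needs each dtype to give up one bit pattern, while a mask
needs extra storage alongside the data.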
> See you,