[SciPy-Dev] views and mask NA
Charles R Harris
charlesr.harris@gmail....
Sat Jan 21 14:16:45 CST 2012
On Sat, Jan 21, 2012 at 12:49 PM, Benjamin Root <ben.root@ou.edu> wrote:
>
> On Fri, Jan 20, 2012 at 10:21 PM, Charles R Harris <
> charlesr.harris@gmail.com> wrote:
>
>> Hi All,
>>
>> I'd like some feedback on how mask NA should interact with views. The
>> immediate problem is how to deal with the real and imaginary parts of
>> complex numbers. If the original has a masked value, it should show up as
>> masked in the real and imaginary parts. But what should happen on
>> assignment to one of the masked views? This should probably clear the NA in
>> the real/imag part, but not in the complex original.
>
>
> That's a very sticky question. If one were to clear the NA on both the
> real and imaginary parts, we run the risk of possibly exposing
> uninitialized data. Remember, depending on how we finally decide how math
> is done with NA, creating a new array from some operations that had masks
> may not compute any value for those masked elements. So, if we assign to
> the real part and, therefore, clear that mask, the imaginary part may just
> be random bits.
>
> Conversely, if we were to keep the imaginary part masked, does that still
> make sense for mathematical operations? Say, perhaps, magnitudes or
> Fourier transforms? Would it make sense to instead clear the mask on both
> real and imaginary parts and merely assume that assigning to the real part
> implicitly means a zero assignment to the imaginary part (and vice versa)?
> Mathematically, this makes sense to me since it would be equivalent, but as
> a programmer, this thought makes me cringe. Consider making an assignment
> first to the real part and then to the imaginary part: the second
> assignment would wipe out the first (if we want to be consistent).
>
> Are there use cases for separately making assignments to the real and
> imaginary parts? Would we want the zero assignment to happen *only* if
> there was a mask, but not if there wasn't a mask? This gets very icky,
> indeed.
>
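As a rough illustration of the uninitialized-data concern above, here is a sketch in plain NumPy. It does not use the NA machinery under discussion: the separate boolean array `mask` is only a hypothetical stand-in for a per-element NA mask, and clearing it by hand on assignment is an assumed policy, not something the NEP specifies.

```python
import numpy as np

# Complex array whose middle element we pretend is NA.
z = np.array([1 + 2j, 3 + 4j, 5 + 6j])
mask = np.array([False, True, False])  # hypothetical NA mask, one flag per element

re = z.real                      # .real is a view into z's memory, not a copy
assert np.shares_memory(re, z)

re[1] = 10.0                     # assign through the view to the "masked" slot
mask[1] = False                  # assumed policy: assignment clears the flag

# The imaginary part was never assigned; it is whatever was already stored.
exposed = z[1]                   # here 10+4j; in general it could be arbitrary bits
```

If the flag on the complex original is cleared by the assignment to the real part, the stored imaginary value becomes visible even though nobody assigned it.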
>
>
>> However, that does allow touching things under the mask, so to speak.
>>
>>
> Remember, some forms of missingness that we have discussed allow for
> "unmasking", while other forms do not. However, currently, the NEP does
> not allow for touching things under the mask, IIRC.
>
>
>
>> Things get more complicated if the complex original is viewed as reals.
>> In this case the mask needs to be "doubled" up, and there is again the
>> possibility of touching things beneath the mask in the original. Viewing
>> the original as bytes leads to even greater duplication.
>>
>>
> Let's also think of it in the other direction. Let's say I have an array
> of 32-bit ints and I view them as 64-bit ints. This is what currently
> happens:
>
> >>> a = np.array([1, 2, 3, np.NA, 5, 6, 7, 8, 9, 10], dtype='i4')
> >>> a.view('i8')
> array([8589934593, 3, 25769803781, NA, 42949672969], dtype=int64)
> >>> a = np.array([1, 2, np.NA, 4, 5, 6, 7, 8, 9, 10], dtype='i4')
> >>> a.view('i8')
> array([8589934593, 17179869206, NA, 34359738375, 42949672969],
> dtype=int64)
>
> Depending on the position of the NA, the view may or may not get the NA.
> I would imagine that this is also endian-dependent. I am not entirely
> certain of what the correct behavior should be, but I think the answer to
> this is also related to the answer to the real/imaginary case.
>
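To make the packing and the endianness question concrete, here is a small sketch in plain NumPy, without NA support. The boolean array `mask` is a hypothetical stand-in for the NA mask, and combining each pair of flags with "any" is just one possible policy, assumed here for illustration.

```python
import numpy as np

# Ten 32-bit ints with an explicit little-endian byte order.
a = np.arange(1, 11, dtype='<i4')
mask = np.zeros(10, dtype=bool)
mask[3] = True                       # pretend a[3] is NA

# Viewing as 64-bit ints packs each adjacent pair into one element.
wide = a.view('<i8')                 # 5 elements; wide[0] == 1 + 2 * 2**32

# Assumed policy: a wide element is NA if either of its halves is NA.
wide_mask = mask.reshape(-1, 2).any(axis=1)

# The same values stored big-endian pack the other way around.
big = a.astype('>i4').view('>i8')    # big[0] == 1 * 2**32 + 2
```

Under this policy the NA lands in the second wide element regardless of which half of the pair carried it, so the result no longer depends on the position of the NA within a pair.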
>
>> My thought is that touching the underlying data needs to be allowed in
>> these cases, but the original mask can only be cleared by assignment to the
>> original. Thoughts?
>>
>>
> Such a restriction would likely prove problematic. When we create
> functions and other libraries, we are not aware of whether we are dealing
> with a view of an array or the original. Heck, most of the time, I am not
> paying attention to whether I am using a view or not in my own programs.
> The transparency of views has been a major selling point to me for numpy.
> Eventually, (my understanding is that) views will become completely
> indistinguishable from the original numpy array in all of the remaining
> corner cases (boolean assignments and such).
>
> If we decide to make NA-related assignments different for views than
> originals, then it only increases the contrast between numpy arrays and
> views. In a language like Python, this would likely be a bad thing.
>
> Unfortunately, I am not sure what the solution should be, but I hope
> this spurs further discussion.
>
>
Note that in normal views the mask is also a view:

In [1]: a = ones(5, maskna=1)
In [2]: a[1] = NA
In [3]: a
Out[3]: array([ 1., NA, 1., 1., 1.])
In [4]: b = a[1::2]
In [5]: b
Out[5]: array([ NA, 1.])
In [6]: b[0] = 1
In [7]: b
Out[7]: array([ 1., 1.], maskna=True)
In [8]: a
Out[8]: array([ 1., 1., 1., 1., 1.], maskna=True)
In [10]: a[1] = NA
In [11]: b = a.view(int64)
In [12]: b
Out[12]:
array([4607182418800017408, NA, 4607182418800017408, 4607182418800017408,
4607182418800017408])
In [13]: b[1] = 0
In [14]: a
Out[14]: array([ 1., 0., 1., 1., 1.], maskna=True)

The problems arise when the item sizes don't match.
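A sketch of the mismatched case in plain NumPy, without NA support. The boolean array `mask` is a hypothetical stand-in for the NA mask, and repeating each flag is only one assumed way of "doubling up" the mask.

```python
import numpy as np

z = np.array([1 + 1j, 2 + 2j, 3 + 3j])   # complex128, 16 bytes per item
mask = np.array([False, True, False])    # hypothetical NA mask for z

# Viewed as float64 there are two items per original element.
as_reals = z.view(np.float64)            # re0, im0, re1, im1, re2, im2
reals_mask = np.repeat(mask, z.itemsize // as_reals.itemsize)

# Viewed as bytes, every flag has to be repeated 16 times.
as_bytes = z.view(np.uint8)
bytes_mask = np.repeat(mask, z.itemsize)

# np.repeat returns a copy, so the expanded mask is not a view of `mask`.
mask_is_shared = np.shares_memory(reals_mask, mask)
```

The data is still shared between `z` and both views, but the expanded masks are not, so clearing a flag in `reals_mask` leaves `mask` untouched.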
Chuck