[Numpy-discussion] maskedarray: how to force mask to expand
Vincent Schut
schut@sarvision...
Thu Sep 25 03:37:56 CDT 2008
Pierre GM wrote:
> Vincent,
>
> You should really consider putting an example next time. I must admit that I'm
> not sure what you're trying to do, and where/why it fails.
Pierre,
sorry for that, I was posting hastily before leaving work, and was
myself pretty confused about ma's behaviour on this too, so it was hard
for me to explain or phrase my question clearly.
It just feels a bit strange that ma.array by default gives a mask
without shape and of False. I mean, what's the difference then between
that and a normal numpy array? If I did not want a mask, I'd use
numpy.array. I do want a mask, so I'd expect ma to give me a mask, which
it in fact does not (or does, on which we can have different opinions,
but a default mask of False imho == nomask == no mask). OK, that being
said, I understand the argument of backwards compatibility. I disagree
on the argument of speed, because for that the same applies: if I were
really concerned about speed, I'd use numpy arrays, keep a separate mask
myself, and before any operation I'd get a flattened copy of all my data
that is not masked and run the operation on that. IMHO masked arrays are
there to trade speed for convenience, so that's what I expect.
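For comparison, here is a sketch of the manual approach described above next to its `numpy.ma` counterpart. The manual approach keeps a plain array plus a separate boolean mask and takes a flattened copy of the unmasked data before operating. The sketch was written against a current NumPy release, so behaviour in the 2008 svn snapshot may differ. The data values and variable names are made up for illustration. `compressed()` is an existing MaskedArray method.

```python
import numpy as np

# Plain ndarray plus a separately kept boolean mask (True = invalid value).
data = np.array([[1.0, -5.0], [3.0, 250.0], [7.0, 9.0]])
mask = (data < 0) | (data > 100)

# Manual route: take a flattened copy of the valid values, then operate on it.
valid = data[~mask]
manual_mean = valid.mean()

# numpy.ma route: the mask travels with the data.
a = np.ma.array(data, mask=mask)
ma_mean = a.mean()            # masked entries are skipped
flat_valid = a.compressed()   # flattened copy of the unmasked data

# Both means are 5.0, computed from the values 1, 3, 7 and 9.
print(manual_mean, ma_mean, flat_valid)
```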
Just for clarity, to rephrase my question: how do I force ma to give me
(always/by default/by some method on a maskedarray) a full shaped mask
instead of 'False' or nomask? Because I am sure from the beginning that
I'll need this mask in full shape, I want it, and I want to be able to
treat it like any normal bool array :-)
>
> Yes, by default, the mask of a new MaskedArray is set to the value 'nomask',
> which is the boolean False. Directly setting an element of the mask in that
> condition fails of course. The reasons behind using this behavior are (1)
> backward compatibility and (2) speed, as you can bypass a lot of operations
> on the mask when it is empty.
1) is clear
2) seems unintuitive to me. I'd say, use numpy arrays then, use
.filled() before you do something, or use a flag 'bypass_mask=True',
etc. Any of these seems more intuitive to me than what it does now. No
offence, I really appreciate your work, just my 2c for a possible future...
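The `.filled()` route mentioned above already exists. `filled()` returns a plain ndarray in which masked entries are replaced by a fill value. Here is a small sketch against current NumPy. The fill value -1 is arbitrary.

```python
import numpy as np

a = np.ma.array(np.arange(6).reshape(3, 2))
a[0, 0] = np.ma.masked

# filled() returns a regular ndarray; masked slots receive the fill value.
plain = a.filled(-1)
print(type(plain).__name__, plain.tolist())   # ndarray [[-1, 1], [2, 3], [4, 5]]
```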
>
> If you need to mask one or several elements, the easiest is not to modify the
> mask itself, but to use the special value `masked`:
>
>>>> a = ma.array(np.arange(6).reshape(3,2))
>>>> a
> masked_array(data =
> [[0 1]
> [2 3]
> [4 5]],
> mask =
> False,
> fill_value=999999)
>>>> # Mask the first element.
>>>> a[0,0] = ma.masked
Ah, I did not know that one. Does that always work, I mean, with slices,
fancy indexing, etc.? Like 'a[(a<0) | (a>100)] = ma.masked'? It's kind of
clean to fiddle with the mask of the array without really interacting
with the mask itself, if you understand what I mean... :)
And is there also a complement, like ma.unmasked? I could not find it
(very quick search, I admit)... Or can I use !ma.masked?
>>>> a
> masked_array(data =
> [[-- 1]
> [2 3]
> [4 5]],
> mask =
> [[ True False]
> [False False]
> [False False]],
> fill_value=999999)
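This sketch addresses the questions above about slices, fancy indexing and an `ma.unmasked` complement. It assumes a current NumPy release and the default soft mask. It shows assigning `ma.masked` through a boolean index and an integer-list index. Note the parentheses around each comparison: `|` binds more tightly than `<` and `>` in Python. It also shows two ways of unmasking, since no `ma.unmasked` object is used here.

```python
import numpy as np

data = np.array([-3, 10, 150, 42, -1, 99])
a = np.ma.array(data)

# Boolean index: parenthesize each comparison, because `|` binds more
# tightly than `<` and `>` in Python.
a[(data < 0) | (data > 100)] = np.ma.masked    # masks positions 0, 2, 4
# Integer-list (fancy) indexing works the same way.
a[[5]] = np.ma.masked                           # masks position 5
print(a.mask.tolist())   # [True, False, True, False, True, True]

# With the default soft mask, assigning an ordinary value to a masked slot
# stores the value and clears that mask bit.
a[0] = 0
# Once the mask is a full array, its entries can also be cleared directly.
a.mask[2] = False
print(a.mask.tolist())   # [False, False, False, False, True, True]
```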
>
> This value, `masked`, is also useful to check whether one particular element
> is masked:
>>>> a[0,0] is ma.masked
> True
>>>> a[0,1] is ma.masked
> False
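The `is ma.masked` test covers single elements. For a whole row, `numpy.ma.is_masked` reports whether any entry is masked. Here is a small sketch against current NumPy.

```python
import numpy as np

a = np.ma.array(np.arange(6).reshape(3, 2))
a[0, 0] = np.ma.masked

# Single element: a masked position comes back as the `masked` singleton.
first_is_masked = a[0, 0] is np.ma.masked     # True
second_is_masked = a[0, 1] is np.ma.masked    # False

# Whole row: np.ma.is_masked reports whether any entry in it is masked.
row0_has_masked = np.ma.is_masked(a[0])       # True
row1_has_masked = np.ma.is_masked(a[1])       # False
```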
>
> You can also force the mask to have the proper shape and be full of False
> this way:
>>>> a = ma.array(np.arange(6).reshape(3,2))
>>>> # Force the mask to have the proper shape and be full of False:
>>>> a.mask = False
>>>> a
> masked_array(data =
> [[0 1]
> [2 3]
> [4 5]],
> mask =
> [[False False]
> [False False]
> [False False]],
> fill_value=999999)
Ah, now the magic starts... (normal user cap on head, beware):
In [9]: am.mask
Out[9]: False
In [10]: am.mask = False
In [11]: am.mask
Out[11]:
array([[False, False],
[False, False]], dtype=bool)
while (with the same am as before [9], with am.mask == False):
In [15]: am.mask = am.mask
In [16]: am.mask
Out[16]: False
Do you see (and agree with me about) the inconsistency? Setting am.mask
to False, the value it already has, changes am.mask. But
am.mask = am.mask, which at first sight should be the same as
am.mask = False (since am.mask == False is True), does *not* change it...
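Our reading of the `numpy.ma` source in current NumPy suggests an explanation for this, though the 2008 svn revision may differ. `nomask` is a single `numpy.bool_` False object. The `mask` setter returns early when it is handed that very object while the mask is still `nomask`. `am.mask` returns `nomask` itself, so `am.mask = am.mask` is a no-op. The Python literal `False` compares equal but is a different object, so it takes the path that allocates a full-shape array. The following sketch reproduces the behaviour:

```python
import numpy as np

am = np.ma.array(np.arange(4).reshape(2, 2))

# Python's False and numpy.ma's nomask compare equal but are distinct objects.
same_value = bool(False == np.ma.nomask)      # True
same_object = False is np.ma.nomask           # False

am.mask = am.mask            # passes the setter `nomask` itself: early return
ndim_after_self_assign = np.ndim(am.mask)     # 0, still the scalar nomask

am.mask = False              # a different object: a full mask is allocated
shape_after_false = am.mask.shape             # (2, 2)
```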
>
>
> The shrink argument of ma.array collapses a mask full of False to nomask, once
> again for speed reasons. So no, it won't do what you seem to want.
I already supposed so...
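The same collapse can be triggered by hand with the existing `MaskedArray.shrink_mask()` method. It replaces an all-False mask by `nomask` and leaves a mask with any True entry alone. Here is a sketch against current NumPy.

```python
import numpy as np

a = np.ma.array(np.arange(6).reshape(3, 2))
a.mask = False                       # expand to a (3, 2) array of False
expanded_shape = a.mask.shape        # (3, 2)

a.shrink_mask()                      # nothing masked: collapse back to nomask
collapsed = np.ma.getmask(a) is np.ma.nomask   # True

# A mask with at least one True entry is left as a full array.
a[0, 0] = np.ma.masked
a.shrink_mask()
kept_shape = a.mask.shape            # (3, 2)
```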
>
> I agree that having to deal with nomask is not completely intuitive. However,
> it is required for backward compatibility. One day, the class will be ported
> to C, and then I'll push to have the mask set to the proper shape ab initio,
> because then speed will be less of an issue.
Glad that we share opinions about the unintuitiveness... Eagerly
awaiting the port to C, not (only) for speed, but mainly for consistency.
>
> In the meantime, I hope I answered your question.
Well, yes and no. To summarize:
by default, the mask of a masked array (if not given at creation as a
bool array) is always 'False'. There is no keyword to force the mask at
creation to full shape, and there is no method on a maskedarray to
change the mask to full shape.
However, one can apply some magic and use 'a.mask = False' directly
after creation to force the mask to full shape. This of course only
works when the mask already *was* False; otherwise you'll effectively be
changing your mask. So we presume ma never returns a mask of 'True' by
default, and then this works. The obvious trick to work around this
remote possibility of a mask of 'True' would be a.mask = a.mask, but that
does not work.
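Here is a possible workaround for that remote possibility, sketched against current NumPy. `numpy.ma.getmaskarray(a)` always returns a full-shape boolean array: all False when the mask is `nomask`, otherwise the existing mask values. Assigning that array back therefore expands the mask without changing which entries are masked. The helper name `expand_mask` is hypothetical and not part of numpy.ma.

```python
import numpy as np

def expand_mask(a):
    """Hypothetical helper: give `a` a full-shape boolean mask in place,
    keeping whatever entries were already masked."""
    a.mask = np.ma.getmaskarray(a)
    return a

# Case 1: mask is still nomask -> becomes an all-False (3, 2) array.
a = expand_mask(np.ma.array(np.arange(6).reshape(3, 2)))
a.mask[1, 1] = True            # the mask[...] idiom now works
print(a.mask.tolist())         # [[False, False], [False, True], [False, False]]

# Case 2: mask already has True entries -> they are preserved.
b = np.ma.array(np.arange(4), mask=[True, False, False, True])
expand_mask(b)
print(b.mask.tolist())         # [True, False, False, True]
```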
Hey, sorry about starting a discussion about this, while I meant to ask
just a simple question (and really assumed I had overlooked something,
it seemed so simple...). Again, no offence meant, and your work on ma is
really appreciated. I hope this discussion will result in more
intuitiveness in a future (C?) implementation of ma.
Regards,
Vincent.
>
>
> On Wednesday 24 September 2008 06:25:57 Vincent Schut wrote:
>> Probably I'm just overlooking something obvious, but I'm having problems
>> with maskedarrays (numpy.ma from svn: '1.3.0.dev5861'), the mask by
>> default being a single bool value ('False') instead of a properly sized
>> bool array. If I then try to mask one value by assigning values to
>> certain mask positions (a.mask[0,0]=True) I get an error, logically. I
>> know I can use masked_where, but I like the mask[...] idiom. And I have to
>> expand the mask anyway, as I'm gonna write it to a file at the end.
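Here is a sketch of the failure described above and the `masked_where` alternative, assuming a current NumPy release. While nothing is masked, `a.mask` is a scalar, so indexing into it cannot work. The exact exception type is not asserted here, and the sketch only checks that one of the likely types is raised.

```python
import numpy as np

a = np.ma.array(np.arange(6).reshape(3, 2))

# Nothing is masked yet, so the mask is the scalar `nomask`, not a (3, 2) array.
mask_is_scalar = np.ndim(a.mask) == 0

# Item assignment into that scalar has nothing to index, so it raises.
try:
    a.mask[0, 0] = True
    failed = False
except (TypeError, IndexError, ValueError):
    failed = True

# numpy.ma.masked_where builds the mask from a condition instead.
data = np.arange(6).reshape(3, 2)
b = np.ma.masked_where(data == 0, data)
print(b.mask.tolist())   # [[True, False], [False, False], [False, False]]
```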
>>
>> 1) Is there a way to have ma always use properly expanded masks (bool
>> arrays instead of single bool values)? I tried the shrink=False keyword,
>> but that does not do what I want, and is not available for
>> numpy.ma.zeros, which I conveniently use a lot.
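For question 1, one option in current NumPy is to wrap `numpy.ma.zeros` and apply the `a.mask = False` trick right after creation. This is safe because a freshly created array has nothing masked yet. The name `zeros_full_mask` is hypothetical and not a numpy.ma function.

```python
import numpy as np

def zeros_full_mask(shape, dtype=float):
    """Hypothetical helper: numpy.ma.zeros with a full-shape all-False mask."""
    a = np.ma.zeros(shape, dtype=dtype)
    a.mask = False   # nothing is masked yet, so this only expands the mask
    return a

z = zeros_full_mask((2, 3))
z.mask[0, 2] = True                      # direct mask[...] assignment now works
print(z.mask.shape, int(z.mask.sum()))   # (2, 3) 1
```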
>>
>> 2) Is there a method/function to request the mask, be it a single bool
>> value or an array, as a properly sized array? I found shrink_mask but no
>> opposite method, and even shrink_mask seems to do something subtly
>> different.
>>
>> Regards,
>> Vincent.
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion@scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion