[Numpy-discussion] maskedarray: how to force mask to expand

Vincent Schut schut@sarvision...
Thu Sep 25 03:37:56 CDT 2008


Pierre GM wrote:
> Vincent,
> 
> You should really consider putting an example next time. I must admit that I'm 
> not sure what you're trying to do, and where/why it fails.

Pierre,

sorry for that, I was posting hastily before leaving work, and was 
myself pretty confused about ma's behaviour on this too, so it was hard 
for me to explain or phrase my question clearly.
It just feels a bit strange that ma.array by default gives a scalar mask 
of False, without any shape. I mean, what's the difference then between 
that and a normal numpy array? If I did not want a mask, I'd use 
numpy.array. I do want a mask, so I'd expect ma to give me a mask, which 
it in fact does not (or does, on which we can have different opinions, 
but a default mask of False imho == nomask == no mask). OK, that being 
said, I understand the argument of backwards compatibility. I disagree 
on the argument of speed, because for that the same applies: if I were 
really concerned about speed, I'd use numpy arrays, keep a separate mask 
myself, and before any operation I'd get a flattened copy of all my data 
that is not masked and run the operation on that. IMHO masked arrays are 
there to trade speed for convenience, so that's what I expect.
Just for clarity, to rephrase my question: how do I force ma to give me 
(always/by default/by some method on a maskedarray) a full shaped mask 
instead of 'False' or nomask? Because I am sure from the beginning that 
I'll need this mask in full shape, I want it, and I want to be able to 
treat it like any normal bool array :-)
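
One possible answer to the rephrased question, as a minimal sketch: current
NumPy releases provide ma.getmaskarray, which is not mentioned anywhere in
this thread and is only assumed here to exist in the version in use. It
returns the mask as a boolean array of the data's shape, whether or not the
array currently holds nomask.

```python
import numpy as np
import numpy.ma as ma

# A fresh masked array: its mask is the scalar nomask, not an array.
a = ma.array(np.arange(6).reshape(3, 2))
print(ma.getmask(a) is ma.nomask)

# getmaskarray (assumed available) returns a full-shaped boolean array.
full_mask = ma.getmaskarray(a)
print(full_mask.shape, full_mask.dtype)
```

The result can then be handled like any other bool array, for instance
when writing the mask to a file.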
> 
> Yes, by default, the mask of a new MaskedArray is set to the value 'nomask', 
> which is the boolean False. Directly setting an element of the mask in that 
> condition fails of course. The reasons behind using this behavior are (1) 
> backward compatibility and (2) speed, as you can bypass a lot of operations 
> on the mask when it is empty.
1) is clear
2) seems unintuitive to me. I'd say, use numpy arrays then, use 
.filled() before you do something, or use a flag 'bypass_mask=True', 
etc. Any of these seem more intuitive to me than what it does now. No 
offence, I really appreciate your work, just my 2c for a possible future...
> 
> If you need to mask one or several elements, the easiest is not to modify the 
> mask itself, but to use the special value `masked`:
> 
>>>> a = ma.array(np.arange(6).reshape(3,2))
> masked_array(data =
>  [[0 1]
>  [2 3]
>  [4 5]],
>       mask =
>  False,
>       fill_value=999999)
>>>> # Mask the first element.
>>>> a[0,0] = ma.masked
Ah, I did not know that one. Does that always work, I mean, with slices, 
fancy indexing, etc.? Like 'a[(a<0) | (a>100)] = ma.masked'? It's kind of 
clean to fiddle with the mask of the array without really interacting 
with the mask itself, if you understand what I mean... :)

And is there also a complement, like ma.unmasked? I could not find it 
(very quick search, I admit)... Or can I use !ma.masked?
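
A short sketch of both points, with made-up sample values. Each comparison
needs its own parentheses because | binds more tightly than < and > in
Python. As far as I know there is no ma.unmasked; with the default soft
mask, assigning an ordinary value to a masked element unmasks it.

```python
import numpy as np
import numpy.ma as ma

# Sample data (illustrative values only).
a = ma.array([-5, 10, 200, 50])

# Mask every element outside [0, 100] using a boolean index.
a[(a < 0) | (a > 100)] = ma.masked
print(a.mask.tolist())  # [True, False, True, False]

# Slices work as well: mask the last two elements.
a[2:] = ma.masked

# With the default soft mask, assigning a value unmasks that element.
a[0] = 0
print(a.mask.tolist())
```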
>>>> a
> masked_array(data =
>  [[-- 1]
>  [2 3]
>  [4 5]],
>       mask =
>  [[ True False]
>  [False False]
>  [False False]],
>       fill_value=999999)
> 
> This value, `masked`, is also useful to check whether one particular element 
> is masked:
>>>> a[0,0] is ma.masked
> True
>>>> a[0,1] is ma.masked
> False
> 
> You can also force the mask to be full of False with the proper shape this 
> way:
>>>> a = ma.array(np.arange(6).reshape(3,2))
>>>> # Force the mask to have the proper shape and be full of False:
>>>> a.mask = False
> masked_array(data =
>  [[0 1]
>  [2 3]
>  [4 5]],
>       mask =
>  [[False False]
>  [False False]
>  [False False]],
>       fill_value=999999)
Ah, now the magic starts... (normal user cap on head, beware):

In [9]: am.mask
Out[9]: False

In [10]: am.mask = False

In [11]: am.mask
Out[11]:
array([[False, False],
        [False, False]], dtype=bool)

while (with the same am as before [9], with am.mask == False):

In [15]: am.mask = am.mask

In [16]: am.mask
Out[16]: False

Do you see (and agree with me about) the inconsistency? Setting am.mask 
to False, which is the value it already had, expands am.mask into a full 
array, while am.mask = am.mask, which at first sight should be the same 
as am.mask = False because am.mask == False is True, does *not* change 
am.mask at all...
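
One plausible explanation, offered as a sketch and not as a statement about
ma's internals: nomask compares equal to False but is a distinct NumPy
boolean object, and the mask setter appears to test identity, not equality.
The variable names below are illustrative only.

```python
import numpy as np
import numpy.ma as ma

# nomask equals the Python bool False, but it is not the same object.
print(bool(ma.nomask == False))  # equality holds
print(ma.nomask is False)        # identity does not

a = ma.array(np.zeros((2, 2)))
b = ma.array(np.zeros((2, 2)))

a.mask = False    # plain Python False: the mask gets expanded
b.mask = b.mask   # nomask itself: nothing changes

print(ma.getmask(a).shape)
print(ma.getmask(b) is ma.nomask)
```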
> 
> 
> The shrink argument of ma.array collapses a mask full of False to nomask, once 
> again for speed reasons. So no, it won't do what you seem to want.

I already supposed so...
>  
> I agree that having to deal with nomask is not completely intuitive. However, 
> it is required for backward compatibility. One day, the class will be ported 
> to C, and then I'll push to have the mask set to the proper shape ab initio, 
> because then speed will be less of an issue.

Glad that we share opinions about the unintuitiveness... Eagerly 
awaiting the port to C, not (only) for speed, but mainly for consistency.
> 
> In the meantime, I hope I answered your question.

Well, yes and no. To summarize:
by default, the mask of a masked array (if not given at creation as a 
bool array) is always 'False'. There is no keyword to force the mask at 
creation to full shape, and there is no method on a maskedarray to 
change the mask to full shape.
However, one can apply some magic and use 'a.mask = False' directly 
after creation to force the mask to full shape. This of course only 
works when the mask already *was* False, otherwise you'll be effectively 
changing your mask. So we presume ma never by default returns a mask of 
'True', and then this works. The obvious trick to work around this remote 
possibility of a mask of 'True' would be a.mask = a.mask, but that does 
not work.
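
A sketch of a workaround that does not rely on the mask already being
False, assuming ma.getmaskarray is available (it is not mentioned in this
thread). The helper name expand_mask is hypothetical and not part of
numpy.ma.

```python
import numpy as np
import numpy.ma as ma

def expand_mask(arr):
    """Hypothetical helper: give arr a full-shaped mask, keeping masked elements."""
    # getmaskarray returns the existing mask, or an all-False array of
    # arr's shape when the mask is nomask.
    arr.mask = ma.getmaskarray(arr)
    return arr

# Works for arrays from ma.zeros, which start out with nomask.
z = expand_mask(ma.zeros((3, 2)))
z.mask[0, 0] = True  # the mask[...] idiom can now be used

# An existing mask is preserved, not overwritten.
m = expand_mask(ma.array([1, 2, 3], mask=[False, True, False]))
print(z.mask.shape, m.mask.tolist())
```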

Hey, sorry about starting a discussion about this, while I meant to ask 
just a simple question (and really assumed I had overlooked something, 
it seemed so simple...). Again, no offence meant, and your work on ma is 
really appreciated. I hope this discussion will result in more 
intuitiveness in a future (C?) implementation of ma.

Regards,
Vincent.
> 
> 
> On Wednesday 24 September 2008 06:25:57 Vincent Schut wrote:
>> Probably I'm just overlooking something obvious, but I'm having problems
>> with maskedarrays (numpy.ma from svn: '1.3.0.dev5861'), the mask by
>> default being a single bool value ('False') instead of a properly sized
>> bool array. If I then try to mask one value by assigning values to
>> certain mask positions (a.mask[0,0]=True) I get an error, logically. I
>> know I can use masked_where, but I like the mask[...] idiom. And I have to
>> expand the mask anyway, as I'm gonna write it to a file at the end.
>>
>> 1) Is there a way to have ma always use properly expanded masks (bool
>> arrays instead of single bool values)? I tried the shrink=False keyword,
>> but that does not do what I want, and is not available for
>> numpy.ma.zeros, which I conveniently use a lot.
>>
>> 2) Is there a method/function to request the mask, be it a single bool
>> value or an array, as a properly sized array? I found shrink_mask but no
>> opposite method, and shrink_mask seems to do something subtly different
>> even.
>>
>> Regards,
>> Vincent.
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion@scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion


