[Numpy-discussion] maskedarray: how to force mask to expand

Vincent Schut schut@sarvision...
Fri Sep 26 02:45:55 CDT 2008


Thanks for your explanations. It still seems a little (too) complicated, 
but from a backwards-compatibility pov combined with your 'nomask is not 
False' implementation detail, I can understand mostly :-) I think the 
idea that when a.mask returns False, that actually means nomask instead 
of the False I'm used to, is what caused a major part of my confusion.
It might actually be nice to give you some context of why I asked this: 
during my (satellite image) processing, I use maskedarrays by default 
for each step in the processing chain, and I need to save the result of 
each step to disk. That means saving the array and its mask. I save both 
of them as tiff files (because these can include all other info that is 
nice for satellite imagery, like coordinates and projection). When 
saving the mask, I'm creating a tiff file and pushing the .mask array 
into it. Therefore, I obviously need the .mask not to be nomask, but to 
be a full shaped array. That's the context you need to see my confusion in.
Because speed and memory don't matter that much to me (well, they do 
matter, but I'm processing huge amounts of data anyways, and using 
parallel/clustered processing anyways, so I can take those masks...), I 
thought it might be easiest to make sure my data always has a full 
shaped mask. But, of course, performance-wise it would be best to be able 
to work with nomasks, and only expand these to full shaped masks when 
writing to disk. That's why I asked for a possible method on an ma to 
force expanding a mask, or e.g. an ma.mask.as_full_shaped_mask() method 
that returns either the mask, or (if nomask) a new array of Falses. I 
just supposed it existed and I could not find it, but now I understand 
it does not exist. But I could easily write something that checks for 
nomask and always returns an expanded mask.
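
A minimal sketch of such a helper, for anyone reading along. The name
full_mask is hypothetical; it assumes a NumPy release that provides
ma.getmaskarray, which returns the existing mask or, when the mask is
nomask, a new all-False array of the same shape as the data.

```python
import numpy.ma as ma


def full_mask(arr):
    """Return the mask of `arr` as a boolean array with arr's shape.

    `full_mask` is a hypothetical helper name, not part of numpy.ma.
    """
    # getmaskarray hands back the real mask if there is one, and an
    # all-False array of the data's shape if the mask is nomask.
    return ma.getmaskarray(arr)


a = ma.array([1, 2, 3, 4])               # no mask given: mask is nomask
b = ma.array([1, 2, 3], mask=[0, 1, 0])  # explicit mask

mask_a = full_mask(a)   # full-shaped, all False
mask_b = full_mask(b)   # the existing mask
```

Either result can be written to disk like any ordinary bool array,
while the array itself keeps its cheap nomask until that moment.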

The 'trick' to create ma's with the mask=False keyword is neat, I had 
not thought about that.
Same applies for masking values using ma[idx] = ma.masked.
Just for completeness (in case someone else is reading this and 
wondering how to *unmask* values): just setting ma[idx] to some valid 
number will unset the mask for that index. No need to do ma[idx] = 
ma.unmask or whatever, just ma[idx] = v.
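
A short sketch of both directions, assuming the default (soft) mask.
With hard_mask=True the assignment of a value would not unmask the
element, so that case is not covered here.

```python
import numpy.ma as ma

a = ma.array([1, 2, 3, 4], mask=False)

a[1] = ma.masked                  # mask the element at index 1
after_masking = a.mask.tolist()

a[1] = 20                         # assigning a valid value unmasks it
after_unmasking = a.mask.tolist()
```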

OK, top posting is bad :) Further comments inline.

Pierre GM wrote:
> Vincent,
> The argument of speed (having a default mask of nomask) applies only to the 
> computations inside MaskedArray. Of course, it is still far faster not to use 
> masks but only ndarrays.
>> Just for clarity, to rephrase my question: how do I force ma to give me
>> (always/by default/by some method on a maskedarray) a full shaped mask
>> instead of 'False' or nomask? Because I am sure from the beginning that
>> I'll need this mask in full shape, I want it, and I want to be able to
>> treat it like any normal bool array :-)
> Easy:
>>>> a = ma.array([1,2,3,4], mask=False)
> masked_array(data = [1 2 3 4],
>       mask = [False False False False],
>       fill_value=999999)
> Puzzling ? Not so much. See, by default, the value of the mask parameter is 
> `nomask`. nomask is in fact a 0-size boolean ndarray with value 0 (False). At 
> the creation of the masked array, we check whether a value was given to the 
> mask parameter. If no value is given, we default to `nomask`, and we end up 
> with `a.mask is nomask`. If you force the mask parameter to the boolean 
> False, you're not using `nomask`: in that case, the full mask is created.

I understand that now.
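
For the record, a small sketch of the difference as I understand it.
It uses ma.getmask to look at the mask, and it also shows the
a.mask = False route for arrays coming from ma.zeros.

```python
import numpy.ma as ma

a = ma.array([1, 2, 3, 4])               # default: mask is nomask
b = ma.array([1, 2, 3, 4], mask=False)   # explicit False: full mask

a_has_nomask = ma.getmask(a) is ma.nomask
b_has_nomask = ma.getmask(b) is ma.nomask
b_mask_shape = ma.getmask(b).shape

# ma.zeros gives an array whose mask is nomask; assigning the builtin
# False to .mask afterwards expands it to a full-shaped mask.
z = ma.zeros((2, 2))
z.mask = False
z_mask_shape = ma.getmask(z).shape
```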

> That won't work with ma.zeros or ma.ones. I could add an extra keyword to deal 
> with that, but is it really needed when you can simply do a.mask=False ?

No real need for that, then. It would just be convenience sugar on 
(especially my) cake.
> Note: 
>>>> a=ma.array([1,2,3,])
>>>> a.mask is False
> False
>>>> a.mask is ma.nomask
> True
>>>> a.mask == False
> True
Btw, in future versions, would it be an idea to separate 'nomask' and 
'False' a little more? I always assumed (correctly?) that in python, 
True and False are singletons (as far as that is possible in python), 
just like None. 'False is False' should always evaluate to True, then. In 
this case (a.mask is False) it at least *seems* to break that 'rule'...
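
A quick check of what is going on, as a sketch. It assumes a NumPy
release in which nomask is a NumPy boolean scalar: an object that
prints as False and compares equal to False, but is not the builtin
False singleton. The builtin guarantee itself is not touched.

```python
import numpy as np
import numpy.ma as ma

a = ma.array([1, 2, 3])
m = ma.getmask(a)                  # the mask object, here nomask

is_builtin_false = m is False      # identity with Python's False
is_nomask = m is ma.nomask         # identity with the ma sentinel
equals_false = bool(m == False)    # value comparison only
is_numpy_bool = isinstance(m, np.bool_)
```

So the value is the same, and only the 'is' test tells the two apart.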

>> And is there also a complement, like ma.unmasked? I could not find it
>> (very quick search, I admit)... Or can I use !ma.masked?

You can just set elements to unmask them, as I found out. No need for 
> No, there isn't and no you can't.  ma.masked is actually a constant defined 
> module-wide and independent of any array. That way, you can test whether an 
> element is masked with `a[..] is masked`. 
>> Ah, now the magic starts... (normal user cap on head, beware):
>> In [9]: am.mask
>> Out[9]: False
>> In [10]: am.mask = False
>> In [11]: am.mask
>> Out[11]:
>> array([[False, False],
>>         [False, False]], dtype=bool)
>> while (with the same am as before [9], with am.mask == False):
>> In [15]: am.mask = am.mask
>> In [16]: am.mask
>> Out[16]: False
>> Do you see (and agree with me about) the inconsistency?
> No. I'm afraid you're confusing `nomask` and `False`. Once again, nomask is 
> NOT the same thing as False. It's the same value, but not the same object.

Exactly. It might not be inconsistent, but imho it goes to a lot of effort 
to /feel/ inconsistent to Joe Average. And that's what was causing a lot 
of confusion. For example, why doesn't a.mask return 'nomask' instead of 
'False'? That would have saved me some head-scratching... See also my 
comment earlier on assuming False to be a python-wide constant 
(singleton). "It's the same value, but not the same object" imho breaks 
this python guarantee (though I admit that I'll have to look that up, 
this guarantee might just be there only in my head...)
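
The two assignments from my session, side by side, as a sketch. It
assumes the mask setter leaves the array alone when it is handed
nomask, and allocates a full mask when it is handed the builtin False.

```python
import numpy.ma as ma

am = ma.zeros((2, 2))               # mask is nomask to begin with

am.mask = ma.getmask(am)            # hands nomask back: nothing expands
still_nomask = ma.getmask(am) is ma.nomask

am.mask = False                     # builtin False: full mask appears
expanded_shape = ma.getmask(am).shape
```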

Well, so far so good. My problems have largely been solved. The 
philosophical discussion could go on...

