[Numpy-discussion] Ransom Proposals

Tim Hochberg tim.hochberg at cox.net
Mon Mar 27 18:41:02 CST 2006


Travis Oliphant wrote:

> Tim Hochberg wrote:
>
>> Travis Oliphant wrote:
>>
>>> Tim Hochberg wrote:
>>>
>>>>>
>>>>> Yes, having this ability means that you have to think about it a 
>>>>> bit if you are going to use the functional interface and try to do 
>>>>> in-place operations.  But, I would argue that this is an 
>>>>> "advanced" usage which is implemented to save space and time.
>>>>
>>>> How is this true though? In what way, for instance, is:
>>>>
>>>>    b = asarray(a).reshape(newshape)
>>>>
>>>> slower or less space efficient than today's:
>>>>
>>>>    b = reshape(a, newshape)
>>>
>>> Well, the big difference is that b=reshape(a) is actually
>>>
>>> try:
>>>     reshape = a.reshape
>>> except AttributeError:
>>>     return a.__array_wrap__(asarray(a).reshape(newshape))
>>>
>>> return reshape(newshape)
>>>
>>> So, it handles more cases, more cleanly than the 
>>> asarray(a).reshape(newshape) approach does.
>>
>>
>>
>> OK. Although I should have said asanyarray(a).reshape(newshape), I 
>> still don't see how this handles more cases. 
>
>
> Because you can define objects that aren't sub-classes of the array at 
> all but just define the method reshape and still have the functional 
> interface work.
>
>>
>> I need to go think about this case some more. This is not something 
>> that I've run into in practice, but I can see I'll have a stronger 
>> case if I can come up with an alternative to this in terms of safe 
>> functions. Do you have some examples of objects that do not have 
>> 'reshape', but do have '__array_wrap__'?
>
>
>
> Not presently, the __array_wrap__ method is new.  It's a mechanism for 
> letting functions that need arrays internally to be more polymorphic 
> and deal with different kinds of objects.
>
>
> It would seem that most of Tim's complaints are directly against 
> polymorphism.   I would argue that getting rid of polymorphism is not 
> something we should be trying to do in NumPy.

I'm certainly not against polymorphism in general. However, I do believe 
that the intersection of polymorphism with view/copy behaviour can be 
dangerous in certain circumstances, and we should limit those 
circumstances to a few, visible functions.  The cases I've been 
complaining about are all instances of this pattern:

    b = func(a, *other_args, **other_kwargs)

With the added complication that b may be a view of a or it may be a new 
array object. This particular case is a 'sharp blade' and should be 
treated with respect.
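
To make the hazard concrete, here is a minimal sketch written against present-day NumPy (imported as np, which is an assumption; the 2006 API differed in details). The array names are purely illustrative. The same call returns a view for a contiguous input and a copy for a transposed one, and nothing at the call site says which:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Contiguous input: the flattened result is a view onto a's memory.
flat_view = np.reshape(a, 6)

# Transposed input: the elements are not laid out in C order,
# so flattening has to copy.
flat_copy = np.reshape(a.T, 6)

view_shares = np.shares_memory(a, flat_view)
copy_shares = np.shares_memory(a, flat_copy)

flat_view[0] = 99   # writes through to a[0, 0]
flat_copy[1] = -1   # silently leaves a[1, 0] untouched
```

The second assignment is the one that bites: it succeeds without complaint, but the change never reaches a.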

Here's how I would probably write reshape if I were starting a 
Numeric-like language from scratch:

    def reshape(a, shape):
        b = a.view()
        b.shape = shape
        return b

That's no less polymorphic than the current reshape, but it's 
polymorphic in a different way. It would work on anything that has a view 
method and a shape attribute. This as opposed to anything that has 
reshape or has __array_wrap__ and can be turned into an array. This is 
a nonstarter for backwards compatibility reasons however.
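
As a rough illustration of what that view-only reshape would mean in practice, the sketch below runs the same function against present-day NumPy. The name view_reshape is hypothetical, and the exact exception raised when a shape cannot be set in place is an assumption about current NumPy, so both likely types are caught:

```python
import numpy as np

def view_reshape(a, shape):
    """View-only reshape: never copies; fails if a copy would be needed."""
    b = a.view()
    b.shape = shape   # in-place shape change on the view, no data copied
    return b

a = np.arange(6)
b = view_reshape(a, (2, 3))
b[0, 0] = 99          # always writes through, because b is always a view

# A transposed array cannot be flattened without copying, so the
# view-only version refuses instead of silently returning a copy.
t = np.arange(6).reshape(2, 3).T
try:
    view_reshape(t, (6,))
    raised = False
except (AttributeError, ValueError):
    raised = True
```

The trade is explicit: the result is guaranteed to alias its operand, at the price of an error in exactly the cases where today's reshape would copy.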

>
> Tim is pointing out that if you use polymorphic functions then you 
> can't assume things like "in-place" have any meaning.  But, then 
> Python has this same problem, because
>
> l += a
>
> doesn't do 'in-place' for the list, but does do inplace if 'l' were an 
> array.   In my mind, this problem and Tim's concern over 'view' and 
> 'copy' behavior is one and the same.   

I agree.

> I don't view it as a problem of function behavior as much as problem 
> with documentation and "mismatch between what a particular user 
> expects and what is actually done."

I disagree. Fernando stated this much better than I could:

    We should treat them as expensive specialties we must pay for with a
    very tight budget (the willingness and ability of users to keep
    track of the exponential rise in complexity induced by interlocking
    special cases).

>
> My attitude is that to write 'in-place' functions (what seems to be 
> the driving example that Tim brings up), you need to know some details 
> about what kind of object you are dealing with anyway, so you can't 
> write polymorphic in-place functions very easily.

Correct. I doubt writing polymorphic in place functions will ever be 
easy. But the choice of tools we provide can influence how often they 
end up being correct.

>
> So, I guess from one perspective, Tim is arguing against things that 
> are at the very core of Python itself.  I'm very resistant to his 
> desire to remove or significantly alter the functional behavior.
>
> I also think there is a very good reason to have a few methods that 
> return either a view or a copy.   The reason is that there are some 
> arrays that the method can't be implemented unless a copy is made.   
> The only other possibility --- "raising an error" --- is going to 
> confuse a lot of new users and force them to deal with the underlying 
> memory-layout of an array before they really need to.

Myself, I don't think this does anyone any favors; it just lets people 
get into habits early on that are going to bite them later.

> I think the current compromise is practical and disagree strongly that 
> it is somehow "evil."   It's only evil to somebody trying to do 
> in-place work.  

I think *most* of the current compromise is good.

> If you are doing that in a polymorphic language like Python, then you 
> need to understand the actual objects you are dealing with and should 
> be expected to have a more mature understanding of NumPy.

FWIW, I've been using Numeric and its successors since Numeric was 
pre-alpha, when Jim Hugunin was still around, when giants walked the 
earth. My understanding of numerapy [*] may be flawed, but it's 
definitely mature.

[*] Referring to the whole Numeric/numarray/numpy family

In general, when balancing safety, power and speed, numerapy has leaned 
heavily toward  speed and power. I think that is the right choice. 
However, in the case at hand, there is no speed tradeoff: you'll note, 
for example, that I have consistently avoided advocating that reshape 
and friends return copies. The trade off has been between safety and 
power, and in this case the power is purely theoretical at this point 
since there aren't yet any clients to __array_wrap__ that don't define 
reshape. I would think that is a very narrow slice of potential object 
space.

That being said I have a suggestion that *might* satisfy everyone. Set 
the WRITEABLE flag to false if reshape creates a new array:

        def _viewwrapit(obj, method, *args, **kwds):
            try:
                wrap = obj.__array_wrap__
            except AttributeError:
                wrap = None
            if type(obj) is ndarray:
                writeable = True
            else:
                writeable = False
                obj = asarray(obj)
            result = getattr(obj, method)(*args, **kwds)
            result.flags.writeable = writeable
            if wrap:
                result = wrap(result)
            return result
           
        def reshape(a, newshape, order=False):
            try:
                reshape = a.reshape
            except AttributeError:
                return _viewwrapit(a, 'reshape', newshape, order=order)
            else:
                return reshape(newshape, order=order)

        a = arange(4)
        a2 = reshape(a, [2,2])
        a2[0,1] = 99
        print "a2 =", a2
        print "a = ", a

        l = [0, 1, 2, 3]
        l2 = reshape(l, [2,2])
        l2[0,1] = 99


    ==>

    a2 = [[ 0 99]
     [ 2  3]]
    a =  [ 0 99  2  3]
    Traceback (most recent call last):
      File "scratch.py", line 37, in ?
        l2[0,1] = 99
    RuntimeError: array is not writeable


Thus we get the same semantics as we have now except in the one case that 
causes trouble: modifying the result of a function that returns both 
copies and views when it has returned a copy. The only catch is that 
clients of __array_wrap__ would need to be strongly  encouraged to 
respect the writeable flag. However, since __array_wrap__ is new, this 
is probably not a big deal. This could also help solve the problem of 
how to deal with a.reshape(newshape) when the result would be a copy; 
simply make the copy nonwriteable. However, that case is more complex 
and requires more thought.
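
For readers on a current NumPy, the idea can be approximated as below. This is a variant of the proposal, not the code above: it decides by checking whether the result shares memory with the operand rather than by the operand's type, and the name reshape_guarded is hypothetical. Note also that present-day NumPy raises ValueError, not RuntimeError, on a write to a read-only array.

```python
import numpy as np

def reshape_guarded(a, newshape):
    """Like np.reshape, but a result that is not a view of `a` is read-only."""
    result = np.reshape(a, newshape)
    is_view = isinstance(a, np.ndarray) and np.shares_memory(a, result)
    if isinstance(result, np.ndarray) and not is_view:
        # The caller got a fresh array; writes to it could never reach `a`,
        # so make that mistake loud instead of silent.
        result.flags.writeable = False
    return result

a = np.arange(4)
a2 = reshape_guarded(a, (2, 2))
a2[0, 1] = 99                      # fine: a2 is a view, a[1] becomes 99

l = [0, 1, 2, 3]
l2 = reshape_guarded(l, (2, 2))    # a new array had to be created
try:
    l2[0, 1] = 99
    write_failed = False
except ValueError:                 # read-only destination
    write_failed = True
```

With this check the a.reshape(newshape) copy case is covered as well, since a reshaped transposed array would also come back read-only.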

No power is lost; 'advanced' users can simply set the writeable flag 
back to True. If we need to be able to do this in one line, setflags 
could be modified to return self, so that:

    reshape(a, newshape).setflags(write=True)

is essentially equivalent to the current reshape. [A side note, it kind 
of looks like setflags may need some revamping; shouldn't the flag 
values in setflags match those in flags?]
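
In current NumPy, setflags still returns None, so the one-liner needs a small helper. The sketch below assumes a present-day NumPy, and the name with_write is hypothetical. It also shows the naming mismatch mentioned in the side note: the keyword is write, while the flag is writeable.

```python
import numpy as np

def with_write(arr, write=True):
    """Set the writeable flag and return the array, so calls can be chained."""
    arr.setflags(write=write)      # keyword is `write` ...
    return arr

locked = np.arange(4).reshape(2, 2)
locked.flags.writeable = False     # ... but the flag is `writeable`

unlocked = with_write(locked)      # opt back in to mutation, in one expression
unlocked[0, 1] = 99
```

Unlocking is a deliberate, visible step, which is the point: the default is safe, and the power is one call away.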

The result of reshape(a, newshape), when a new object is required, can 
then *almost* be regarded as a view of a constant object. This is 
because modifying the operand and expecting it to change the result is 
extremely rare, unlike modifying the result and expecting it to change 
the operand, which, while not exactly common, is not that rare and is how 
one gets into trouble.

Regards,

-tim









