[Numpy-discussion] Subclasses - use of __finalize__

Pierre GM pgmdevlist at gmail.com
Mon Dec 18 12:12:50 CST 2006


On Saturday 16 December 2006 19:55, Colin J. Williams wrote:
Colin,

First of all, a disclaimer: I'm a (bad) hydrologist, not a computer scientist. 
I learned python/numpy by playing around, and only really got into subclassing 
3-4 months ago. My explanations might not be completely accurate, so I'll 
ask more experienced users to correct me if I'm wrong.


`__new__` is the class constructor method. A call to `__new__(cls,...)` 
creates a new instance of the class `cls` but doesn't initialize the 
instance: that's the role of the `__init__` method. According to the python 
documentation:

    If __new__() returns an instance of cls, then the new instance's
    __init__() method will be invoked like "__init__(self[, ...])", where
    self is the new instance and the remaining arguments are the same as
    were passed to __new__().

    If __new__() does not return an instance of cls, then the new
    instance's __init__() method will not be invoked.

    __new__() is intended mainly to allow subclasses of immutable types
    (like int, str, or tuple) to customize instance creation.
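
To make the quoted rules concrete, here is a minimal sketch in plain Python 
(no numpy involved). The class names `Point` and `NotAPoint` are made up for 
the illustration; they don't come from the documentation or from the wiki.

```python
class Point(tuple):
    """An immutable 2-D point: its values must be fixed in __new__."""

    def __new__(cls, x, y):
        # tuple is immutable, so the contents are set here, not in __init__.
        return tuple.__new__(cls, (x, y))

    def __init__(self, x, y):
        # Invoked because __new__ returned an instance of cls.
        self.initialized = True


class NotAPoint(object):
    def __new__(cls, x, y):
        # Returning something that is not an instance of cls ...
        return (x, y)

    def __init__(self, x, y):
        # ... means that __init__ is never invoked.
        raise AssertionError("should not be reached")


p = Point(1, 2)      # __new__, then __init__
q = NotAPoint(1, 2)  # __new__ only; q is a plain tuple
```

`p` compares equal to `(1, 2)` and carries the `initialized` flag, while `q` 
is an ordinary tuple and the `__init__` of `NotAPoint` is skipped.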

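It turns out that ndarrays behave much like these immutable types: all the 
setup is done in `__new__`, and many arrays are created without the constructor 
being called at all (views, slices, results of operations), so we cannot rely 
on an `__init__` method. How can we initialize the instance, then? With 
`__array_finalize__`.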
`__array_finalize__` is called automatically once an instance is created with 
`__new__`. Moreover, it is called each time a new array is returned by a 
method, even if the method doesn't specifically call `__new__`. 
For example, `__add__`, `__iadd__` and `reshape` return new arrays, so 
`__array_finalize__` is called. Note that these methods do not create a new 
array from scratch, so there is no call to `__new__`.
As another example, we can also modify the shape of the array with `resize`. 
However, this method works in place, so a new array is NOT created. 
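
As a rough sketch of the point above, the class below just records which 
special methods get invoked. `Traced` and its `log` list are names I made up 
for the illustration, and the snippet assumes a reasonably recent numpy (it 
avoids `print` so that it doesn't depend on the Python version).

```python
import numpy as np


class Traced(np.ndarray):
    """ndarray subclass that logs which special methods are invoked."""

    log = []  # shared, class-level call log (for illustration only)

    def __new__(cls, data):
        Traced.log.append("__new__")
        # View casting: creating the view triggers __array_finalize__.
        return np.asarray(data).view(cls)

    def __array_finalize__(self, obj):
        Traced.log.append("__array_finalize__")


x = Traced([0, 1, 2, 3, 4, 5])
after_creation = list(Traced.log)

y = x[1:4]              # slicing returns a new Traced array
z = x.reshape((2, 3))   # so does reshape
w = x + 1               # and so does arithmetic
after_methods = Traced.log[len(after_creation):]
```

Here `after_creation` holds one `__new__` entry followed by one 
`__array_finalize__` entry, whereas `after_methods` contains only 
`__array_finalize__` entries: none of the three operations goes through 
`__new__`.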

About the `obj` argument in `__array_finalize__`:
The first time a subarray is created, `__array_finalize__` is called with a 
regular ndarray as the argument `obj`. Afterwards, when a new array is returned 
without a call to `__new__`, the `obj` argument is the initial subarray (the 
one calling the method).
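
A small sketch of what `obj` looks like in both situations. The `Tagged` 
class and its `seen` attribute are hypothetical names used only here, and 
keeping the tag on the instance with `getattr` is one possible way of doing 
it, not necessarily the one used on the wiki.

```python
import numpy as np


class Tagged(np.ndarray):
    """ndarray carrying a per-instance `info` tag (illustrative class)."""

    def __new__(cls, data, info=None):
        # At this point __array_finalize__ receives a plain ndarray as obj.
        new = np.asarray(data).view(cls)
        # The tag is then attached to the new instance itself.
        new.info = info
        return new

    def __array_finalize__(self, obj):
        # Remember what kind of object this array was derived from.
        self.seen = type(obj).__name__
        # Inherit the tag if obj has one, otherwise fall back to None.
        self.info = getattr(obj, "info", None)


a = Tagged([1, 2, 3], info={"name": "a"})
b = a[1:]   # no call to __new__: obj is `a` itself
```

`a.seen` is `'ndarray'` (first creation, through the view of a regular 
array), `b.seen` is `'Tagged'`, and `b.info` is the very dictionary attached 
to `a`.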

The easiest way is to try and see what happens. Here's a small script that 
defines an `InfoArray` class: just an ndarray with a tag attached. That's 
basically the class from the wiki, with messages printed in `__new__` and 
`__array_finalize__`. I've included some doctests to illustrate some of the 
concepts; I hope they will be explanatory enough.

Please let me know whether it helps. If it does, I'll update the wiki page.

##############################################

"""
Let us define a new InfoArray object    
>>> x = InfoArray(N.arange(10), info={'name':'x'})
__new__ received <type 'numpy.ndarray'>
__new__ sends <type 'numpy.ndarray'> as <class '__main__.InfoArray'>
__array_finalize__ received <type 'numpy.ndarray'>
__array_finalize__ defined <class '__main__.InfoArray'>
    
Let's get the first element: 
>>> x[0]
0

We expect a scalar, we get a scalar, everything's fine. If we now want all the 
elements, we can use `x[:]`, which calls `__getslice__` and returns a new 
array. Therefore, we expect `__array_finalize__` to get called:
>>> x[:]
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
InfoArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


Let's add 1 to the array: this operation calls the `__add__` method, which 
returns a new array from `x`
>>> x+1
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
InfoArray([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Let us change the shape of the array from *(10,)* to *(2,5)* with the 
`reshape` method. The method returns a new array, so we expect a call to 
`__array_finalize__`:
>>> y = x.reshape((2,5))
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>

If we now print y, we call the `__str__` method, which in turn creates as many 
arrays as there are rows: we expect 2 calls to `__array_finalize__`:
>>> print y
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
[[0 1 2 3 4]
 [5 6 7 8 9]]

Let's change the shape of `y` back to *(10,)*, but using the `resize` method 
this time. `resize` works in place, so a new array isn't created, and 
`__array_finalize__` is not called.
>>> y.resize((10,))
>>> y.shape
(10,)

OK, and what about `transpose`? Well, it returns a new array (1 call), plus, 
as we print it, we have *rows* calls to `__array_finalize__` (one per row of 
the transposed array), for a total of *rows+1* calls:
>>> y.resize((5,2))
>>> print y.T
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
[[0 2 4 6 8]
 [1 3 5 7 9]]



Now let's create a new array from scratch. `__new__` is called, but as the 
argument is already an InfoArray, the *__new__ sends...* line is bypassed. 
Moreover, if we don't specify the dtype, we call `data.astype`, which in turn 
calls `__array_finalize__`. Then, `__array_finalize__` is called a second 
time, this time to initialize the new object.
>>> z = InfoArray(x)
__new__ received <class '__main__.InfoArray'>
__new__ saw another dtype.
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>

Note that if we specify the dtype, we don't have to call `data.astype`, and 
`__array_finalize__` gets called once:
>>> z = InfoArray(x, dtype=x.dtype)
__new__ received <class '__main__.InfoArray'>
__new__ saw the same dtype.
__array_finalize__ received <class '__main__.InfoArray'>
__array_finalize__ defined <class '__main__.InfoArray'>

"""

import numpy as N

class InfoArray(N.ndarray):
    def __new__(subtype, data, info=None, dtype=None, copy=False):
        print "__new__ received %s" % type(data)
        # When data is already an InfoArray, return a view of it (or of a
        # converted copy); its info tag is propagated by __array_finalize__.
        if isinstance(data, InfoArray):
            if not copy and dtype==data.dtype:
                print "__new__ saw the same dtype."
                return data.view(subtype)
            else:
                print "__new__ saw another dtype."
                return data.astype(dtype).view(subtype)
        # Otherwise, store the tag on the class, so that __array_finalize__
        # can use it as the default for the new instance.
        subtype._info = info
        subtype.info = subtype._info
        print "__new__ sends %s as %s" % (type(N.asarray(data)), subtype)
        return N.array(data).view(subtype)
    
    def __array_finalize__(self,obj):
        print "__array_finalize__ received %s" % type(obj)
        if hasattr(obj, "info"):
            # The object already has an info tag: just use it
            self.info = obj.info
        else:
            # The object has no info tag: use the default
            self.info = self._info
        print "__array_finalize__ defined %s" % type(self)


def _test():
    import doctest
    doctest.testmod(verbose=True)

if __name__ == "__main__":
    _test()

