[AstroPy] State of Numeric 2

W.T. Bridgman wtbridgman at Radix.Net
Fri Dec 1 19:17:19 CST 2000


<x-flowed>Paul Barrett sends this report on his work on Numeric 2.

Tom
----------------------------------------------------------------------

                          Status of Numeric 2


The design of Numeric 2 makes it easy to add new array types and
implements all array operations as UFuncs.  This yields more
extensible, flexible, and maintainable code.  What follows is an
outline of the basic design of Numeric 2 and what we have accomplished
so far.

There are currently three primary classes:

ArrayType:

    This is a simple class that describes the fundamental properties of
    an array-type, e.g. its name, its size in bytes, and its coercion
    relations with respect to other types.  Each instance is a singleton
    representing one array-type, e.g.

    > Int32 = ArrayType('Int32', 4, 'doc-string')

    Its relations to the other types are defined when the C-extension
    module for that type is imported.  The corresponding Python code is

    > Int32.astype[Real64] = Real64

    This says that combining Int32 with Real64 yields Real64, i.e. the
    Real64 array-type has higher priority than the Int32 array-type.
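    Below is a minimal Python sketch of this idea.  It assumes an
    ArrayType needs only a name, an element size, and an astype table.
    The coerce() helper is hypothetical, since the post does not say how
    the table is queried.

```python
# Hypothetical sketch of ArrayType; coerce() and the attribute names other
# than astype are illustrative assumptions, not Numeric 2 code.
class ArrayType:
    def __init__(self, name, nbytes, doc=""):
        self.name = name          # e.g. 'Int32'
        self.nbytes = nbytes      # size of one element in bytes
        self.__doc__ = doc
        # astype[other] is the result type when self meets other;
        # every type coerces to itself.
        self.astype = {self: self}

    def __repr__(self):
        return self.name


def coerce(t1, t2):
    """Return the common type of t1 and t2 (hypothetical helper)."""
    if t2 in t1.astype:
        return t1.astype[t2]
    if t1 in t2.astype:
        return t2.astype[t1]
    raise TypeError(f"no coercion rule between {t1} and {t2}")


# One singleton per array type, as in the post.
Int32 = ArrayType('Int32', 4, 'signed 32-bit integer')
Real64 = ArrayType('Real64', 8, '64-bit floating point')

# Rule added when the Real64 module is imported: Int32 with Real64 -> Real64.
Int32.astype[Real64] = Real64
```

    Because coerce() checks both directions, a rule only has to be stored
    on one of the two types.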


UFunc:

    This class is the heart of Numeric 2.  Its design is similar to
    that of ArrayType in that each UFunc instance is a singleton callable
    object whose attributes are a name, the argument kinds (each either
    input or output), and a CFunc dictionary, e.g.

    > add = UFunc('add', ('in', 'in', 'out'), 'doc-string')

    When first defined, the add instance has no C functions associated
    with it and therefore can do no work.  The CFunc dictionary is
    populated later, when the C-extension module for an array-type is
    imported.  The corresponding Python code would be

    > add.register('add', (Int32, Int32, Int32), cfunc_add)

    Each C-extension module's initialization function calls two C API
    functions: one to initialize the coercion rules and the other to
    register the CFunc objects.
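    The sketch below covers both the registration step and the lookup that
    the __call__ method performs on the CFunc dictionary.  Types are plain
    strings rather than ArrayType objects, and Python callables stand in
    for CFuncs.  The RANK table and the "fewest upward casts" definition of
    the best key are assumptions, not rules taken from Numeric 2.

```python
# Hypothetical sketch of UFunc registration and loop lookup.
RANK = {'Int8': 0, 'Int32': 1, 'Real64': 2}   # assumed safe-cast order


class UFunc:
    def __init__(self, name, signature, doc=""):
        self.name = name
        self.signature = signature            # e.g. ('in', 'in', 'out')
        self.nin = signature.count('in')      # number of input arguments
        self.__doc__ = doc
        self.cfuncs = {}                      # type tuple -> loop function

    def register(self, name, types, func):
        # Assumed: the first argument repeats the UFunc's own name.
        if name != self.name or len(types) != len(self.signature):
            raise ValueError("registration does not match this UFunc")
        self.cfuncs[tuple(types)] = func

    def lookup(self, in_types):
        """Return (key, casts) for the given input types.

        An exact match costs 0 and therefore always wins; otherwise the
        key needing the fewest upward casts is chosen.
        """
        best, best_cost = None, None
        for key in self.cfuncs:
            pairs = list(zip(in_types, key[:self.nin]))
            if any(RANK[a] > RANK[k] for a, k in pairs):
                continue                      # would need a lossy cast
            cost = sum(RANK[k] - RANK[a] for a, k in pairs)
            if best is None or cost < best_cost:
                best, best_cost = key, cost
        if best is None:
            raise TypeError(f"{self.name}: no loop for {in_types}")
        casts = [(a, k) for a, k in zip(in_types, best[:self.nin]) if a != k]
        return best, casts


add = UFunc('add', ('in', 'in', 'out'), 'element-wise addition')
add.register('add', ('Int32', 'Int32', 'Int32'), lambda x, y: x + y)
add.register('add', ('Real64', 'Real64', 'Real64'), lambda x, y: x + y)
```

    For example, add.lookup(('Int32', 'Real64')) picks the Real64 loop and
    reports that the first argument must be cast from Int32 to Real64.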

    When an operation is applied to some arrays, the __call__ method is
    invoked.  It gets the type of each array (if the output array is not
    given, it is created with the appropriate type) and checks the CFunc
    dictionary for a key that matches the argument types.  If such a key
    exists, the operation is performed immediately.  Otherwise the best
    key is found, and its operation is used together with the associated
    conversion functions.  The __call__ method then invokes a compute
    method, written in C, to iterate over slices of each array, namely:

    > _ufunc.compute(slice, data, func, swap, conv)

    The func argument is a CFuncObject, while swap and conv are lists
    of CFuncObjects, one for each array where needed.  The data argument
    is a list of buffer objects, one for each array.  The slice argument
    is a structured object specifying how many iterations are to be done
    for each dimension, plus the buffer offset and step size for each
    array and each dimension.
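    The following Python sketch shows how a slice description of this kind
    can drive an inner loop.  The (counts, layout) encoding of the slice
    and the record() stand-in are hypothetical.  The swap and conv
    arguments are left out.

```python
import itertools


def compute(slc, data, func):
    """Call the inner-loop function once per innermost row.

    slc = (counts, layout): counts[d] is the number of iterations in
    dimension d, and layout[i] = (offset, steps) gives array i's starting
    byte offset and its byte step for every dimension.
    """
    counts, layout = slc
    repeat = counts[-1]                               # innermost dimension
    inner_steps = [steps[-1] for _, steps in layout]
    # itertools.product over zero ranges yields one empty tuple, so the
    # 1-D case calls func exactly once.
    for index in itertools.product(*(range(n) for n in counts[:-1])):
        ptrs = [offset + sum(i * s for i, s in zip(index, steps))
                for offset, steps in layout]
        func(data, ptrs, inner_steps, repeat)


calls = []


def record(data, ptrs, steps, repeat):
    """Stand-in inner loop that only records how it was called."""
    calls.append((ptrs, steps, repeat))


# 2 x 3 Real64 arrays: a and out are contiguous (row step 24 bytes,
# column step 8); b is one row reused for every row (row step 0).
slc = ((2, 3), [(0, (24, 8)), (0, (0, 8)), (0, (24, 8))])
compute(slc, [None, None, None], record)
print(calls)  # [([0, 0, 0], [8, 8, 8], 3), ([24, 0, 24], [8, 8, 8], 3)]
```

    A step of 0 lets one array be reused for every row.  This is one way
    broadcasting could be expressed in such a descriptor.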

    We have predefined several UFuncs for use by the __call__ method:
    cast, swap, getitem, and setitem.  The cast and swap functions do
    coercion and byte-swapping, respectively, and the getitem and
    setitem functions convert between Numeric arrays and Python
    sequences.  Other functions can be defined arbitrarily.
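    Rough Python equivalents of the four predefined UFuncs, operating on
    raw bytearrays, are sketched below.  They rely on the standard array
    module.  Its typecodes are assumed stand-ins for the Numeric 2 types:
    'i' (a C int, usually 32 bits) for Int32 and 'd' (a C double) for
    Real64.

```python
import array


def setitem(seq, typecode):
    """Python sequence -> raw buffer (stand-in for the setitem UFunc)."""
    return bytearray(array.array(typecode, seq).tobytes())


def getitem(buf, typecode):
    """Raw buffer -> Python list (stand-in for the getitem UFunc)."""
    a = array.array(typecode)
    a.frombytes(bytes(buf))
    return a.tolist()


def swap(buf, typecode):
    """Reverse the byte order of every element of buf, in place."""
    a = array.array(typecode)
    a.frombytes(bytes(buf))
    a.byteswap()
    buf[:] = a.tobytes()


def cast(buf, from_code, to_code):
    """Convert every element to another type, returning a new buffer."""
    to_py = float if to_code in 'fd' else int
    return setitem([to_py(x) for x in getitem(buf, from_code)], to_code)


buf = setitem([1, 2, 3], 'i')
print(getitem(cast(buf, 'i', 'd'), 'd'))  # [1.0, 2.0, 3.0]
```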

Array:

    This class contains information about the array, such as its shape,
    type, and data endianness.  Its operators ('+', '-', etc.) just
    invoke the corresponding UFunc, e.g.

    > def __add__(self, other):
    >     return ufunc.add(self, other)
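    Here is a self-contained sketch of this delegation.  The UFunc is a
    deliberately tiny stand-in that handles only 1-D data and scalars.
    The reflected methods (__radd__, __rsub__) are an addition that lets a
    scalar appear on the left.

```python
import operator


class UFunc:
    """Minimal stand-in: applies a Python binary op element-wise."""

    def __init__(self, name, op):
        self.name = name
        self.op = op

    def __call__(self, a, b):
        # A scalar operand is repeated to the length of the array operand.
        xs = a.data if isinstance(a, Array) else [a] * len(b.data)
        ys = b.data if isinstance(b, Array) else [b] * len(xs)
        if len(xs) != len(ys):
            raise ValueError("shape mismatch")
        return Array([self.op(x, y) for x, y in zip(xs, ys)])


add = UFunc('add', operator.add)
subtract = UFunc('subtract', operator.sub)


class Array:
    def __init__(self, data):
        self.data = list(data)
        self.shape = (len(self.data),)

    # Each operator just forwards to the corresponding UFunc.
    def __add__(self, other):
        return add(self, other)

    def __radd__(self, other):
        return add(other, self)

    def __sub__(self, other):
        return subtract(self, other)

    def __rsub__(self, other):
        return subtract(other, self)

    def __repr__(self):
        return f"Array({self.data})"


print(Array([1, 2, 3]) + Array([10, 20, 30]))  # Array([11, 22, 33])
```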

C-extension modules:

    Numeric 2 will have several C-extension modules.  The primary module
    of this set is _ufuncmodule.c, which is intended to do the bare
    minimum, i.e. iterate over arrays using a specified C function.  The
    interface of these functions remains the same as in the current
    Numeric, i.e.

    int (*CFunc)(char *data, int *steps, int repeat, void *func);

    and their functionality is expected to be the same, i.e. they
    iterate over the inner-most dimension.
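    For illustration, here is a Python analogue of such an inner loop for
    Int32 addition.  It assumes one (buffer, offset) pair per argument in
    place of the C data pointer(s), little-endian storage, and 0 as the
    success return value.  None of these details come from the post.

```python
import struct

INT32 = struct.Struct('<i')     # assumed layout: little-endian 32-bit int


def add_int32_loop(data, steps, repeat, func=None):
    """Python analogue of a CFunc inner loop for Int32 addition.

    data   -- [buffer, byte_offset] pairs for a, b and out (stand-ins for
              the C data pointers)
    steps  -- byte step of each argument along the innermost dimension
    repeat -- number of elements to process
    func   -- unused; mirrors the void *func slot
    Returns 0, assumed here to mean success.
    """
    (a, pa), (b, pb), (out, po) = data
    sa, sb, so = steps
    for _ in range(repeat):
        x = INT32.unpack_from(a, pa)[0]
        y = INT32.unpack_from(b, pb)[0]
        # Wrap the sum to 32 bits so pack_into never raises struct.error.
        s = (x + y + 2**31) % 2**32 - 2**31
        INT32.pack_into(out, po, s)
        pa, pb, po = pa + sa, pb + sb, po + so
    return 0


a = bytearray(struct.pack('<6i', 1, 2, 3, 4, 5, 6))
b = bytearray(struct.pack('<3i', 10, 20, 30))
out = bytearray(12)
add_int32_loop([[a, 0], [b, 0], [out, 0]], [8, 4, 4], 3)
print(struct.unpack('<3i', out))  # (11, 23, 35): a is read with step 8
```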

    There will also be C-extension modules for each array type,
    e.g. _int32module.c, _real64module.c, etc.  As I said before, when
    these modules are imported by the UFunc module, they will
    automatically register their functions and coercion rules.  New or
    improved versions of these modules can be easily implemented and
    used without affecting the rest of Numeric2.
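    The registration done by each type module's initialization function
    can be mimicked in Python as follows.  Registry, init_int32, and
    init_real64 are hypothetical names.  In Numeric 2 the equivalent work
    is done through the C API.

```python
class Registry:
    """Hypothetical stand-in for the two C API registration calls."""

    def __init__(self):
        self.coercions = {}          # (type1, type2) -> common type
        self.loops = {}              # (ufunc name, type tuple) -> function

    def set_coercion(self, t1, t2, result):
        self.coercions[(t1, t2)] = result
        self.coercions[(t2, t1)] = result    # rules are symmetric here

    def register(self, ufunc, types, func):
        self.loops[(ufunc, tuple(types))] = func


def init_int32(reg):
    """What importing a hypothetical _int32module would do."""
    reg.set_coercion('Int32', 'Int32', 'Int32')
    reg.register('add', ('Int32', 'Int32', 'Int32'), lambda x, y: x + y)


def init_real64(reg):
    """What importing a hypothetical _real64module would do."""
    reg.set_coercion('Real64', 'Real64', 'Real64')
    reg.set_coercion('Int32', 'Real64', 'Real64')
    reg.register('add', ('Real64', 'Real64', 'Real64'), lambda x, y: x + y)


registry = Registry()
for init in (init_int32, init_real64):   # i.e. "import the type modules"
    init(registry)
```

    Swapping in an improved _real64module would only replace the entries
    that init_real64 writes, leaving the rest of the registry untouched.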


That's basically it.  As for progress, we have outlined the following
steps:

Step 1: implement basic UFunc capability

  - minimal Array class, i.e. the necessary class attributes and
    methods, e.g. .shape, .data, .type, etc.
  - minimal ArrayType class, with types such as Int32, Real64,
    Complex64, Char, and Object
  - minimal UFunc class, i.e. UFunc instantiation, CFunction
    registration, and the UFunc call for 1-D arrays, including the rules
    for doing alignment, byte-swapping, and coercion.
  - minimal C-extension module (_UFunc) which does the innermost array
    loop in C.

    This step implements whatever is needed to do: 'c = add(a, b)'
    where a, b, and c are 1-D arrays.

    It will teach us how to add new UFuncs, coerce the arrays, pass the
    necessary information to a C iterator method, and do the actual
    computation.
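    A Python sketch of what Step 1 might produce for 'c = add(a, b)' on
    1-D arrays follows.  It applies byte-swapping and coercion before the
    loop.  Array1D, PRIORITY, and the use of the array module are
    assumptions.  Alignment is left to Python and is not shown.

```python
import array
import sys


class Array1D:
    """Hypothetical 1-D array: a typecode, raw bytes and a byte order."""

    def __init__(self, values, typecode, byteorder=sys.byteorder):
        a = array.array(typecode, values)
        if byteorder != sys.byteorder:
            a.byteswap()                  # store data in the requested order
        self.typecode = typecode          # 'i' ~ Int32, 'd' ~ Real64
        self.byteorder = byteorder
        self.data = a.tobytes()

    def native(self):
        """Return the values in native byte order (the swap step)."""
        a = array.array(self.typecode)
        a.frombytes(self.data)
        if self.byteorder != sys.byteorder:
            a.byteswap()
        return a


PRIORITY = {'i': 0, 'd': 1}               # assumed: Real64 outranks Int32


def add(a, b):
    """c = add(a, b) for 1-D arrays: swap, coerce, then loop."""
    x, y = a.native(), b.native()
    if len(x) != len(y):
        raise ValueError("arrays must have the same length")
    code = max(a.typecode, b.typecode, key=PRIORITY.get)   # coercion
    conv = float if code == 'd' else int
    return Array1D([conv(p) + conv(q) for p, q in zip(x, y)], code)


other = 'big' if sys.byteorder == 'little' else 'little'
c = add(Array1D([1, 2, 3], 'i'), Array1D([0.5, 0.5, 0.5], 'd', other))
print(c.typecode, c.native().tolist())   # d [1.5, 2.5, 3.5]
```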

Step 2: continue enhancing the UFunc iterator and Array class

  - implement some access methods for the Array class, such as print,
    repr, getitem, and setitem
  - implement multidimensional arrays
  - implement some of the basic Array methods using UFuncs, e.g. +, -,
    etc.
  - enable UFuncs to use Python sequences.

Step 3: complete the standard UFunc and Array class behavior

  - implement getslice and setslice behavior
  - work on Array broadcasting rules
  - implement Record type
  - implement reduce, reduceAt, and outer methods for UFuncs
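    The post does not define these operations.  The sketch below works on
    Python lists and assumes NumPy-style semantics: trailing dimensions
    are aligned for broadcasting, and reduceAt reduces between
    consecutive indices.  BinaryUFunc and broadcast_shape are hypothetical
    names.

```python
import functools
import operator


def broadcast_shape(s1, s2):
    """Assumed rule: align trailing dimensions; each pair must be equal
    or contain a 1."""
    n = max(len(s1), len(s2))
    s1 = (1,) * (n - len(s1)) + tuple(s1)
    s2 = (1,) * (n - len(s2)) + tuple(s2)
    result = []
    for d1, d2 in zip(s1, s2):
        if d1 != d2 and 1 not in (d1, d2):
            raise ValueError(f"shapes {s1} and {s2} do not broadcast")
        result.append(d2 if d1 == 1 else d1)
    return tuple(result)


class BinaryUFunc:
    def __init__(self, name, op):
        self.name, self.op = name, op

    def __call__(self, a, b):
        return [self.op(x, y) for x, y in zip(a, b)]

    def reduce(self, a):
        """Fold op over the whole sequence, e.g. a sum for add."""
        return functools.reduce(self.op, a)

    def reduceAt(self, a, indices):
        """Reduce a[indices[k]:indices[k+1]] for each k; the last slice
        runs to the end, and a non-increasing pair yields a[indices[k]]."""
        out = []
        for k, start in enumerate(indices):
            stop = indices[k + 1] if k + 1 < len(indices) else len(a)
            if start < stop:
                out.append(functools.reduce(self.op, a[start:stop]))
            else:
                out.append(a[start])
        return out

    def outer(self, a, b):
        """Apply op to every pair (x, y), giving a len(a) x len(b) table."""
        return [[self.op(x, y) for y in b] for x in a]


add = BinaryUFunc('add', operator.add)
print(add.reduceAt([1, 2, 3, 4, 5], [0, 2, 4]))  # [3, 7, 5]
```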

Step 4:

  - add more UFuncs
  - implement buffer or mmap access
  - etc.


I've nearly completed Step 1.  The one major change to that step is
to implement iteration over multi-D arrays immediately, instead of
only over 1-D arrays.  Once this step is done, we will have working
code to test and analyze.  Since this design is so modular,
particularly with respect to the array type modules, some work can be
done in parallel.
_____________________________________________________
AstroPy mailing list  -  astropy at stsci.edu
http://lheawww.gsfc.nasa.gov/~bridgman/AstroPy/

</x-flowed>

