[AstroPy] State of Numeric 2
wtbridgman at Radix.Net
Fri Dec 1 19:17:19 CST 2000
Paul Barrett sends this report on his work on Numeric 2.
Status of Numeric 2
The design of Numeric 2 enables new array types to be easily added and
all array operations to be UFuncs. This provides more extensible,
flexible, and maintainable code. What follows is an outline of the
basic design of Numeric2 and what we have accomplished so far.
There are currently three primary classes: ArrayType, UFunc, and Array.
ArrayType is a simple class that describes the fundamental properties of
an array-type, e.g. its name, its size in bytes, its coercion
relations with respect to other types, etc. An instance of this
type creates a singleton of that type, e.g.
> Int32 = ArrayType('Int32', 4, 'doc-string')
Its relation to the other types is defined when the C-extension
module for that type is imported. The corresponding Python code is
> Int32.astype[Real64] = Real64
This says that the Real64 array-type has higher priority than the
Int32 array-type.
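As a rough illustration of the ArrayType idea, here is a minimal pure-Python sketch. It is not the Numeric 2 code: the constructor arguments follow the example above, but the internals (a plain dictionary for astype, the coerce helper, and the reverse Real64 entry) are assumptions made for illustration.

```python
class ArrayType:
    """Hypothetical stand-in for the ArrayType class described above."""

    def __init__(self, name, size, doc=""):
        self.name = name            # e.g. 'Int32'
        self.size = size            # item size in bytes
        self.__doc__ = doc
        # Coercion table: maps another ArrayType to the resulting type.
        # A type always coerces to itself.
        self.astype = {self: self}

    def __repr__(self):
        return self.name


# Each array-type is created once and then shared (a singleton).
Int32 = ArrayType('Int32', 4, 'doc-string')
Real64 = ArrayType('Real64', 8, 'doc-string')

# Assumed to be what the C-extension modules set up when imported.
Int32.astype[Real64] = Real64
Real64.astype[Int32] = Real64


def coerce(a, b):
    """Return the common type of a and b according to a's coercion table."""
    return a.astype[b]
```

With these tables, coerce(Int32, Real64) gives Real64, which is one way to read "Real64 has higher priority".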
The UFunc class is the heart of Numeric 2. Its design is similar to
that of ArrayType in that the UFunc creates a singleton callable
object whose attributes are name, argument type (either input or
output), and a CFunc dictionary; e.g.
> add = UFunc('add', ('in', 'in', 'out'), 'doc-string')
When defined, the add instance has no C functions associated with it
and therefore can do no work. The CFunc dictionary is populated
later when the C-extension module for an array-type is imported.
The corresponding Python code would be
> add.register('add', (Int32, Int32, Int32), cfunc_add)
In the C-extension module's initialization function, there are two C
API functions: one to initialize the coercion rules and the other
to register the CFunc objects.
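The registration scheme can be sketched in pure Python as follows. This is only a sketch under assumptions: array-types are represented by their names as strings, the CFunc is an ordinary Python function, and the lookup method and the error handling are hypothetical, not part of the design described above.

```python
class UFunc:
    """Hypothetical sketch of the UFunc singleton described above."""

    def __init__(self, name, argtypes, doc=""):
        self.name = name
        self.argtypes = argtypes    # e.g. ('in', 'in', 'out')
        self.__doc__ = doc
        self.cfuncs = {}            # CFunc dictionary, empty until registration

    def register(self, name, types, cfunc):
        """Associate a C function (here: any callable) with a type signature."""
        if name != self.name:
            raise ValueError("cannot register %r with ufunc %r" % (name, self.name))
        if len(types) != len(self.argtypes):
            raise ValueError("expected %d types" % len(self.argtypes))
        self.cfuncs[tuple(types)] = cfunc

    def lookup(self, types):
        """Return the CFunc registered for exactly these types."""
        try:
            return self.cfuncs[tuple(types)]
        except KeyError:
            raise TypeError("%s: no CFunc for %r" % (self.name, types)) from None


def int32_add(x, y):
    # Stand-in for the C function that a type module would supply.
    return x + y


add = UFunc('add', ('in', 'in', 'out'), 'doc-string')
add.register('add', ('Int32', 'Int32', 'Int32'), int32_add)
```

Until register is called, the cfuncs dictionary is empty and lookup raises, which mirrors the remark that a freshly defined UFunc can do no work.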
When an operation is applied to some arrays, the __call__ method is
invoked. It gets the type of each array (if the output array is
not given, it is created with the appropriate type) and checks the
CFunc dictionary for a key that matches the argument types. If one
exists, the operation is performed immediately; otherwise the best
key is found and that operation with its associated conversion
functions is used. The __call__ method then invokes a compute
method written in C to iterate over slices of each array, namely:
> _ufunc.compute(slice, data, func, swap, conv)
The func argument is a CFuncObject, while swap and conv are lists
of CFuncObjects, one for each array if necessary. The data
argument is a list of buffer objects, one for each array, and the
slice argument is a complex object specifying the number of
iterations for each dimension, and the buffer offset and step size
for each array and each dimension.
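The type-matching step of __call__ might look roughly like the sketch below. The report does not say how the "best key" is chosen, so the rule used here (coerce every input to the highest-priority type present) is an assumption, as are the RANK, CAST, and CFUNCS tables and the function names. Plain Python lists stand in for arrays.

```python
# Assumed priority order and conversion functions (not from the report).
RANK = {'Int32': 0, 'Real64': 1}
CAST = {('Int32', 'Real64'): float}

# CFunc dictionary keyed by input types; Python callables stand in for C.
CFUNCS = {
    ('Int32', 'Int32'): lambda x, y: x + y,
    ('Real64', 'Real64'): lambda x, y: x + y,
}


def find_cfunc(in_types):
    """Return (cfunc, converters), with one converter (or None) per input."""
    if in_types in CFUNCS:
        # Exact match: no conversion needed.
        return CFUNCS[in_types], [None] * len(in_types)
    # No exact match: coerce all inputs to the highest-priority type.
    target = max(in_types, key=RANK.__getitem__)
    key = (target,) * len(in_types)
    conv = [None if t == target else CAST[(t, target)] for t in in_types]
    return CFUNCS[key], conv


def call(in_types, arrays):
    """Apply the matching CFunc element-wise over 1-D inputs."""
    func, conv = find_cfunc(in_types)
    columns = [a if c is None else [c(v) for v in a]
               for a, c in zip(arrays, conv)]
    return [func(*vals) for vals in zip(*columns)]
```

For example, call(('Int32', 'Real64'), ([1, 2], [0.5, 1.5])) converts the first input to floats before adding.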
We have predefined several UFuncs for use by the __call__ method:
cast, swap, getitem, and setitem. The cast and swap functions do
coercion and byte-swapping, respectively, and the getitem and
setitem functions do conversion between Numeric arrays and Python
sequences. Other functions can be defined arbitrarily.
The Array class contains information about the array, such as shape,
type, endian-ness of the data, etc. Its operators, '+', '-',
etc., just invoke the corresponding UFunc function, e.g.
> def __add__(self, other):
>     return ufunc.add(self, other)
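A self-contained version of this delegation, for illustration only, could look like the following. The add function here is a pure-Python stand-in for ufunc.add, and the Array constructor and its use of a Python list for .data are assumptions; only the attribute names .shape, .data, and .type come from the report.

```python
def add(a, b):
    """Stand-in for ufunc.add: element-wise sum of two 1-D arrays."""
    if a.shape != b.shape:
        raise ValueError("shape mismatch")
    return Array([x + y for x, y in zip(a.data, b.data)], a.type)


class Array:
    """Minimal 1-D array holding its data, shape, and type name."""

    def __init__(self, data, type='Int32'):
        self.data = list(data)
        self.shape = (len(self.data),)
        self.type = type

    def __add__(self, other):
        # The operator does no work itself; it just calls the UFunc.
        return add(self, other)


c = Array([1, 2, 3]) + Array([10, 20, 30])
```

The point of the design is that all arithmetic lives in the UFunc, so the Array class stays small.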
Numeric2 will have several C-extension modules. The primary module
of this set is _ufuncmodule.c. The intention of this module is
to do the bare minimum, i.e. iterate over arrays using a specified
C function. The interface of these functions remains the same as
for the current Numeric, i.e.
int (*CFunc)(char *data, int *steps, int repeat, void *func);
and their functionality is expected to be the same, i.e. they
iterate over the inner-most dimension.
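To make the inner-loop idea concrete, here is a Python emulation of what such a C function might do for Int32 addition. It is a sketch, not the real interface: the explicit offsets argument is an assumption (in C the data pointers would already point at the first element), and the struct module stands in for raw pointer access.

```python
import struct


def int32_add_loop(buffers, offsets, steps, repeat):
    """Add `repeat` int32 items from buffers[0] and buffers[1] into buffers[2].

    steps holds the byte stride of each buffer, so non-contiguous data
    can be walked without copying. Returns 0 on success, like a C status.
    """
    a, b, out = buffers
    pa, pb, po = offsets
    for _ in range(repeat):
        (x,) = struct.unpack_from('=i', a, pa)
        (y,) = struct.unpack_from('=i', b, pb)
        struct.pack_into('=i', out, po, x + y)   # no overflow handling here
        pa += steps[0]
        pb += steps[1]
        po += steps[2]
    return 0


a = struct.pack('=4i', 1, 2, 3, 4)
b = struct.pack('=4i', 10, 20, 30, 40)

# Contiguous case: every buffer advances 4 bytes per item.
out = bytearray(16)
int32_add_loop((a, b, out), (0, 0, 0), (4, 4, 4), 4)

# Strided case: take every second item of a (stride of 8 bytes).
out2 = bytearray(8)
int32_add_loop((a, b, out2), (0, 0, 0), (8, 4, 4), 2)
```

The outer iteration over the remaining dimensions would then call this loop once per innermost row, with different offsets each time.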
There will also be C-extension modules for each array type,
e.g. _int32module.c, _real64module.c, etc. As I said before, when
these modules are imported by the UFunc module, they will
automatically register their functions and coercion rules. New or
improved versions of these modules can be easily implemented and
used without affecting the rest of Numeric2.
That's basically it. As for progress, we have outlined the following
steps:
Step 1: implement basic UFunc capability
- minimal Array class, i.e. necessary class attributes and methods,
e.g. .shape, .data, .type, etc.
- minimal ArrayType class, e.g. Int32, Real64, Complex64, Char, Object
- minimal UFunc class, i.e. UFunc instantiation, CFunction
registration, UFunc call for 1D arrays including the rules for
doing alignment, byte-swapping, and coercion.
- minimal C-extension module (_UFunc) which does the innermost array
loop in C.
This step implements whatever is needed to do: 'c = add(a, b)'
where a, b, and c are 1-D arrays.
It will teach us how to add new UFuncs, to coerce the arrays, to
pass the necessary information to a C iterator method and to do the
Step 2: continue enhancing the UFunc iterator and Array class
- implement some access methods for the Array class, print, repr,
getitem, setitem, etc.
- implement multidimensional arrays
- implement some of the basic Array methods using UFuncs, e.g. +, -, etc.
- enable UFuncs to use Python sequences.
Step 3: complete the standard UFunc and Array class behavior
- implement getslice and setslice behavior
- work on Array broadcasting rules
- implement Record type
- implement reduce, reduceAt, and outer methods for UFuncs
Step 4:
- add more UFuncs
- implement buffer or mmap access
I've nearly completed Step 1. The one major change/enhancement to
that step is to immediately implement iteration over multi-D arrays
instead of 1-D arrays. Once this step is done, we will have working
code to test and analyze. Since this design is so modular,
particularly with respect to the array type modules, some work can be
done in parallel.