[Numpy-discussion] Notes from meeting with Guido regarding inclusion of array package in Python core

konrad.hinsen at laposte.net
Fri Mar 11 02:34:14 CST 2005


On Mar 10, 2005, at 16:28, Perry Greenfield wrote:

> On March 7th Travis Oliphant and Perry Greenfield met Guido and Paul 
> Dubois to discuss some issues regarding the inclusion of an array 
> package within core Python.

A good initiative - and thanks for the report!

> So what about supporting arrays as an interchange format? There are a 
> number of possibilities to consider, none of which require inclusion 
> of arrays into the core. It is possible for 3rd party extensions to 
> optionally support arrays as an interchange format through one of the 
> following mechanisms:

True, but any of these options requires a much bigger effort than 
relying on a module in the standard library. Pointing out these methods 
is not exactly a way of encouraging people to use arrays as an 
interchange format; it's more a way of telling them that if they badly 
need a compact interchange format, there is a solution.

> a) So long as the extension package has access to the necessary array 
> include files, it can build the extension to use the arrays as a 
> format without actually having the array package installed. The 
> include files alone could be included into the core

True, but this implies nearly the same restrictions on the evolution of 
the array code as having it in the core. The Numeric headers have 
changed frequently in the past.

> seem quite as receptive instead suggesting the next option) or could 
> be packaged with extension (we would prefer the former to reduce the 
> possibilities of many copies of include files). The extension could 
> then  be successfully compiled without

Having the header files in all client extensions is a sure recipe for 
blocking Numeric development: any header change would be rejected by 
the end-user community.

If C were a language with implementation-independent interface 
descriptions, such approaches would be reasonable, but C is... well, C.

> b) One could modify the extension build process to see if the package 
> is installed and the include files are available, if so, it is built 
> with the support, otherwise not.

This is already possible today, and is probably used by some extension 
modules. I use a similar test to build the netCDF interface selectively 
(if netCDF is available), and I can tell from experience that this 
causes considerable confusion among users who install ScientificPython 
before netCDF (although the instructions point this out - but nobody 
seems to read instructions). But the main problem with this approach is 
that it doesn't work for pre-built binary distributions, which matters 
in particular in the Windows world.
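
A minimal sketch of the kind of build-time test described in option b), 
written in Python. It checks for an importable Python package rather 
than a C library such as netCDF, and the names 
optional_feature_available and select_extensions are hypothetical, 
introduced only for illustration. It also shows why the order of 
installation matters: the decision is taken once, when the build runs.

```python
import importlib.util


def optional_feature_available(module_name):
    """Return True if module_name can be found in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module without a spec.
        return False


def select_extensions(required, optional):
    """Decide which extensions to build.

    `optional` maps a (hypothetical) extension name to the module it
    depends on. Returns the names to build and the names skipped.
    """
    selected = list(required)
    skipped = []
    for ext_name, dependency in optional.items():
        if optional_feature_available(dependency):
            selected.append(ext_name)
        else:
            skipped.append(ext_name)
    return selected, skipped


if __name__ == "__main__":
    build, skip = select_extensions(
        ["core"],
        {"json_io": "json", "missing_io": "no_such_module_xyz_12345"},
    )
    print("building:", build)
    print("skipping (dependency not installed):", skip)
```

Reporting the skipped extensions explicitly is one way to reduce the 
confusion mentioned above, but it does nothing for binary 
distributions, where the test ran on the packager's machine.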

> c) One could provide the support at the Python level by instead 
> relying on the use of buffer objects by the extension at the C level, 
> thus avoiding any dependence on the array C api. So long as the 
> extension has the ability to return buffer objects

That's certainly the cleanest solution, but it also requires a serious 
effort from the extension module writer: one more API to learn and use, 
and conversion between buffers and arrays in all modules that 
definitely need array functions.
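
To give an idea of the conversion work involved, here is a small 
Python-level sketch of buffer-based interchange. It is an illustration 
under assumptions, not the mechanism discussed at the meeting: it uses 
the memoryview type of current Python rather than the buffer objects 
available in 2005, the standard library's array module stands in for 
the producing extension, and producer and consumer are hypothetical 
names.

```python
from array import array


def producer():
    """Stand-in for an extension that hands out raw data as a buffer."""
    data = array("d", [1.0, 2.0, 3.0, 4.0])
    return memoryview(data)


def consumer(buf, rows, cols):
    """Reinterpret a flat buffer of C doubles as a rows x cols view.

    No data is copied; the result shares memory with `buf`.
    """
    view = memoryview(buf)
    if view.format != "d" or view.ndim != 1:
        raise TypeError("expected a flat buffer of C doubles")
    # memoryview.cast() requires one side of each cast to be a byte
    # format, so go through unsigned bytes before attaching the shape.
    return view.cast("B").cast("d", shape=[rows, cols])


if __name__ == "__main__":
    flat = producer()
    matrix = consumer(flat, 2, 2)
    print(matrix.tolist())
    flat[0] = 9.0  # visible through `matrix` because memory is shared
    print(matrix.tolist())
```

The consumer has to know the element type and the shape from 
somewhere else, which is part of the extra effort: a plain buffer 
carries less information than an array.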

> We talked at some length about whether it was possible to change 
> Python's numeric behavior for scalars, namely support for configurable 
> handling of numeric exceptions in the way numarray does it (and 
> Numeric3 as well). In short, not much was resolved. Guido didn't much 
> like the stack approach to the exception handling mode. His argument 
> (a reasonable one) was that even if the stack allowed pushing

I agree with Guido there. It looks like a hack.

> the decimal's use of context to see if it could be used as a model. 
> Overall he seemed to think that setting mode on a module basis was a 
> better approach. Travis and I wondered about how that could be 
> implemented (it seems to imply that the exception handling needs to 
> know what module or namespace is being executed in order to determine 
> the mode).

That doesn't look simple. How about making error handling a 
characteristic of the type itself? That would double the number of 
float element types, but that doesn't seem like a big deal to me. 
Handling the conversions and coercions is probably a bigger headache.
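
A rough sketch of what type-based error handling could look like at 
the Python level, restricted to division and to scalars. QuietFloat 
and StrictFloat are hypothetical names, not types from Numeric or 
numarray, and the sign of a zero divisor is ignored for brevity.

```python
import math


class QuietFloat(float):
    """Hypothetical scalar type: division by zero yields inf or nan."""

    def __truediv__(self, other):
        try:
            return type(self)(float(self) / float(other))
        except ZeroDivisionError:
            if self == 0 or math.isnan(self):
                return type(self)(math.nan)
            # Simplification: the sign of the zero divisor is ignored.
            return type(self)(math.copysign(math.inf, self))


class StrictFloat(float):
    """Hypothetical scalar type: any non-finite result is an error."""

    def __truediv__(self, other):
        try:
            result = float(self) / float(other)
        except ZeroDivisionError:
            raise FloatingPointError("division by zero") from None
        if not math.isfinite(result):
            raise FloatingPointError("non-finite result")
        return type(self)(result)


if __name__ == "__main__":
    print(QuietFloat(1.0) / 0.0)
    try:
        StrictFloat(1.0) / 0.0
    except FloatingPointError as exc:
        print("StrictFloat raised:", exc)
    # Mixed operands: the left operand's type decides the behaviour.
    print(type(QuietFloat(1.0) / StrictFloat(0.0)).__name__)
```

The last line shows the coercion question in miniature: in this sketch 
the left operand silently wins, and a real design would have to state 
which mode applies when the two types meet.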

> So some more thought is needed regarding this. The difficulty of 
> proposing such changes and getting them accepted is likely to be 
> considerable. But Travis had a brilliant idea (some may see this as 
> evil but I think it has great merit). Nothing prevents a C extension 
> from hijacking the existing Python scalar objects behaviors.

True, and I like that idea a lot for testing and demonstrating 
concepts. Whether it's a good idea for production code is another 
question, and one to be discussed with Guido and the Python team in my 
opinion.

> Python at all (as such), no rank-0 arrays. This will be studied 
> further. One possible issue is that adding the necessary machinery to 
> make numeric scalar processing consistent with that of the array 
> package may introduce significant performance penalties (what is 
> negligible overhead for arrays may not be for scalars).

Adding a couple of methods should not cause any overhead at all. Where 
do you see the origin of the overhead?

> One last comment is that it is unlikely that any choice in this area 
> prevents the need for added helper functions to the array package to 
> assist in writing code that works well with scalars and arrays. There 
> are likely a number of such issues. A common

That remains to be seen. I must admit that I am personally a bit 
surprised by the importance this problem seems to have for many. I have 
a single spot in a single module that checks for scalar vs. array, 
which is negligible considering the amount of numerical code that I 
have.

> approach is to wrap all unknown objects with "asarray". This works 
> reasonably well but doesn't handle the following case: If you wish to 
> write a function that will accept arrays or scalars, in principle it 
> would be nice to return scalars if all that was supplied were scalars. 
> So functions to help determine what the output type should

That happens automatically if you use asarray() only when you 
definitely need an array. I would expect this to be the case for list 
arguments rather than for scalar arguments.
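
A small sketch of that pattern, kept to the standard library so that 
it runs without any array package. The helper as_double_array is a 
hypothetical stand-in for asarray(), built on the array module, and 
squared is an illustrative function name.

```python
from array import array
from numbers import Number


def as_double_array(values):
    """Hypothetical stand-in for asarray(): sequence -> array('d')."""
    return values if isinstance(values, array) else array("d", values)


def squared(x):
    """Return the square of x, element-wise for sequences.

    Scalars are never converted, so a scalar argument gives a scalar
    result without any special output-type logic.
    """
    if isinstance(x, Number):
        return x * x
    # Only lists, tuples and the like definitely need an array.
    data = as_double_array(x)
    return array("d", (v * v for v in data))


if __name__ == "__main__":
    print(squared(3))
    print(squared([1, 2, 3]))
```

With a real array package the scalar branch usually needs no explicit 
test at all, since the arithmetic operators already work on scalars; 
the check is spelled out here only because the stand-in has no 
element-wise operators.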

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Laboratoire Léon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen at cea.fr
---------------------------------------------------------------------




