[Numpy-discussion] Re: Bytes Object and Metadata

Perry Greenfield perry at stsci.edu
Tue Mar 29 07:46:47 CST 2005


On Mar 28, 2005, at 6:25 PM, Travis Oliphant wrote:
>
> One could see it as a "flaw" in the buffer object, but I prefer to see 
> it as a problem with objects that use the PyBufferProcs protocol.  It 
> is at worst, a "limitation" of the buffer interface that should be 
> advertised (in my mind the problem lies with the objects that make use 
> of the buffer protocol and also reallocate memory willy-nilly since 
> Python does not allow for this).   To me, an analogous situation 
> occurs when an extension module writes into memory it does not own and 
> causes a seg-fault.  I suppose a casual observer could say this is a 
> Python flaw but clearly the problem is with the extension object.
>
> It certainly does not mean at all that something like a buffer object 
> should never exist or that the buffer protocol should not be used.   I 
> get the feeling sometimes, that some naive (to Numeric and numarray) 
> people on python-dev feel that way.
>
Certainly there needs to be something like this (that's why we used it 
for numarray after all).

>>
>> I'm not sure how the support for large data sets should be handled. I 
>> generally think that it will be very awkward to handle these until 
>> Python does as well. Speaking of which...
>>
>> I had been in occasional contact with Martin von Loewis about his 
>> work to update Python to handle 64-bit addressing. We weren't 
>> planning to handle this in numarray (nor Numeric3, right Travis or 
>> do I have that wrong?) until Python did. A few months ago Martin said 
>> he was mostly done. I had a chance to talk to him at Pycon about 
>> where that work stood. Unfortunately, it is not turning out to be as 
>> easy as he hoped. This is too bad. I have a feeling that this work is 
>> going to stall without help on our (numpy community) part to help 
>> make the changes or drum beating to make it a higher priority. At the 
>> moment the Numeric3 effort should be the most important focus, but I 
>> think that after that, this should become a high priority.
>>
>
> I would be interested to hear what the problems are.   Why can't you 
> just change the protocol replacing all int's with Py_intptr_t?   Is 
> backward compatibility the problem? This seems like it's on the 
> extension code level (and then only on 64-bit systems), and so would 
> be easier to force through the change in Python 2.5.
>
As Martin explained it, there is a lot of code that uses int 
declarations. If you are saying that it would be easy just to replace 
all int declarations in Python, I doubt it is that simple since there 
are calls to many other libraries that must use ints. So it means that 
there are thousands (so Martin says) of declarations that one must 
change by hand. It has to be changed for strings, lists, tuples and 
everything that uses them (Guido was open to doing this but everything 
had to be updated at once, not just strings or certain objects, and he 
is certainly right about that). Martin also said that we would need a 
system with enough memory to test all of these. Lists in particular 
would need a system with 16GB of memory to test lists that use more 
than the current limit (because of the size of list objects). I'm not 
sure I agree with that. It would be nice to have that kind of test, but 
I think it would be reasonable to have tested on the largest memory 
systems available at the time for our testing. If there are latent list 
sequence bugs that surface when 16 GB systems become available, then 
the bugs can be dealt with at that time (IMHO). (Anybody out there have 
a system with that much memory available for test purposes? :-)
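
For what it's worth, the 16GB figure can be checked with a quick sketch. This is only back-of-the-envelope arithmetic, and it assumes a 32-bit C int and 8-byte pointers (typical of 64-bit systems, but not something Martin spelled out). The names below are made up for illustration:

```python
# Rough check of the 16 GB figure for lists longer than the int limit.
# Assumptions (not from the thread): a C int is 32 bits wide and each
# list slot holds one 8-byte object pointer.
INT_MAX = 2**31 - 1      # largest length a signed 32-bit int can hold
POINTER_BYTES = 8        # size of one list slot on a 64-bit system

def list_slot_bytes(n_items, pointer_bytes=POINTER_BYTES):
    """Bytes needed just for the pointer array of an n_items-long list."""
    return n_items * pointer_bytes

# The smallest list that exceeds the current limit has INT_MAX + 1 slots.
needed = list_slot_bytes(INT_MAX + 1)
print(needed / 2**30, "GiB")   # prints: 16.0 GiB
```

That counts only the list's own pointer array, not the objects it points to, so 16GB is a floor rather than a comfortable test size.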

Of course, this change will change the C API for Python too as far as 
sequence use goes (or is there some way around that? A compatibility 
API and a new one that supports extended indices?)  It would be nice if 
there were some way of handling that gracefully without requiring all 
extensions to have to change to match this. I imagine that this is 
going to be the biggest objection to making any changes unless the old 
API is supported for a while. Perhaps someone has thought this all out 
already. I haven't thought about it at all.
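
To make the "compatibility API plus a new one" idea a bit more concrete, here is a toy sketch in Python. It is not how Python's C API works or would work; both function names are hypothetical, and it assumes a 32-bit C int. The point is only that an old-style entry point could wrap the new one and fail loudly instead of silently truncating:

```python
# Toy model of an old int-based call layered on a new wide-index call.
# Everything here is hypothetical and for illustration only.
INT_MAX = 2**31 - 1  # assumes a 32-bit C int

def length_wide(n_items):
    """Stand-in for a new API that can report any length."""
    return n_items

def length_compat(n_items):
    """Stand-in for the old API: same answer, but it must fit in an int."""
    n = length_wide(n_items)
    if n > INT_MAX:
        # Old extensions get an error rather than a wrapped-around value.
        raise OverflowError("length does not fit in a C int")
    return n

print(length_compat(1000))      # small sequences behave as before
try:
    length_compat(2**31)        # one past the old limit
except OverflowError as exc:
    print("old API refused:", exc)
```

Unchanged extensions would keep working on anything under the old limit, and only code that actually meets a huge sequence would see the error.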

Perry