[Numpy-svn] r8148 - trunk/doc

numpy-svn@scip... numpy-svn@scip...
Sat Feb 20 12:08:28 CST 2010

Author: ptvirtan
Date: 2010-02-20 12:08:27 -0600 (Sat, 20 Feb 2010)
New Revision: 8148

3K: doc: update Py3K port documentation

Modified: trunk/doc/Py3K.txt
--- trunk/doc/Py3K.txt	2010-02-20 18:08:14 UTC (rev 8147)
+++ trunk/doc/Py3K.txt	2010-02-20 18:08:27 UTC (rev 8148)
@@ -59,6 +59,15 @@
 * Only unicode dtype field titles are included in fields dict.
+* :pep:`3118` buffer objects behave differently from Py2 buffer objects
+  when passed to `array(...)` or `asarray(...)`.
+  In Py2, they were cast to an object array.
+  In Py3, they are cast in the same way as objects that have an
+  ``__array_interface__`` attribute, i.e., they behave as if they were
+  an ndarray view on the data.
 .. todo::
    Check for any other changes ... This we want in the end to include
@@ -317,8 +326,8 @@
    Py_TPFLAGS_HAVE_CLASS in the type flag.
+PyBuffer (provider)
 PyBuffer usage is widely spread in multiarray:
@@ -335,33 +344,13 @@
 for generic array scalars. The generic array scalar exporter, however,
 doesn't currently produce format strings, which needs to be fixed.
-Currently, the format string and some of the memory is cached in the
-PyArrayObject structure. This is partly needed because of Python bug #7433.
 Some code also stops working when ``bf_releasebuffer`` is
 defined.  Most importantly, ``PyArg_ParseTuple("s#", ...)`` refuses to
 return a buffer if ``bf_releasebuffer`` is present.  For this reason,
 the buffer interface for arrays is implemented currently *without*
 defining ``bf_releasebuffer`` at all. This forces us to go through
-some additional contortions. But basically, since the strides and shape
-of an array are locked when references to it are held, we can do with
-a single allocated ``Py_ssize_t`` shape+strides buffer.
+some additional work.
-The buffer format string is currently cached in the ``dtype`` object.
-Currently, there's a slight problem as dtypes are not immutable --
-the names of the fields can be changed. Right now, this issue is
-just ignored, and the field names in the buffer format string are
-not updated.
-From the consumer side, the new buffer protocol is mostly backward
-compatible with the old one, so little needs to be done here to retain
-basic functionality. However, we *do* want to make use of the new
-features, at least in `multiarray.frombuffer` and maybe in `multiarray.array`.
-Since there is a native buffer object in Py3, the `memoryview`, the
-`newbuffer` and `getbuffer` functions are removed from `multiarray` in
-Py3: their functionality is taken over by the new `memoryview` object.
 There are a couple of places that need further attention:
 - VOID_getitem
@@ -401,7 +390,10 @@
+.. todo::
+   Produce PEP 3118 format strings for array scalar objects.
 .. todo::
    Is there a cleaner way out of the ``bf_releasebuffer`` issue?  It
@@ -411,50 +403,90 @@
    It seems we should submit patches to Python on this. At least "s#"
    implementation on Py3 won't work at all, since the old buffer
-   interface is no more present.
+   interface is no longer present. But perhaps Py3 users should just give
+   up using "s#" in ParseTuple, and use the 3118 interface instead.
 .. todo::
-   Find a way around the dtype mutability issue.
+   Make ndarray shape and strides natively Py_ssize_t?
-   Note that we cannot just realloc the format string when the names
-   are changed: this would invalidate any existing buffer
-   interfaces. And since we can't define ``bf_releasebuffer``, we
-   don't know if there are any buffer interfaces present.
-   One solution would be to alloc a "big enough" buffer at the
-   beginning, and not change it after that. We could also make the
-   strides etc.  in the ``buffer_info`` structure static size. There's
-   MAXDIMS present after all.
+PyBuffer (consumer)
-.. todo::
+There are two places in which we may want to be able to consume buffer
+objects and cast them to ndarrays:
-   Take a second look at places that used PyBuffer_FromMemory and 
-   PyBuffer_FromReadWriteMemory -- what can be done with these?
+1) `multiarray.frombuffer`, i.e., ``PyArray_FromBuffer``
-.. todo::
+   The frombuffer returns only arrays of a fixed dtype.  It does not
+   make sense to support PEP 3118 at this location, since not much
+   would be gained from that -- the backward compatibility functions
+   using the old array interface still work.
-   Implement support for consuming new buffer objects.
-   Probably in multiarray.frombuffer? Perhaps also in multiarray.array?
+   So no changes needed here.
-.. todo::
+2) `multiarray.array`, i.e., ``PyArray_FromAny``
-   make ndarray shape and strides natively Py_ssize_t
+   In general, we would like to handle :pep:`3118` buffers in the same way
+   as ``__array_interface__`` objects, so we want to be able to cast
+   them to arrays already in ``PyArray_FromAny``.
+   This means ``PyArray_FromAny`` needs additions.
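The difference between the two consumers above can be illustrated from the Python side. This is a minimal sketch, assuming a present-day NumPy on Python 3 (behaviour at this revision may have differed); the `array.array` exporter and the variable names are chosen only for illustration.

```python
import array
import numpy as np

# array.array exports a PEP 3118 buffer whose format string is "d".
src = array.array("d", [1.0, 2.0])

# frombuffer: the caller fixes the dtype; the exporter's format is not consulted,
# so the 16 bytes of storage are reinterpreted as 16 uint8 items.
raw = np.frombuffer(src, dtype=np.uint8)

# asarray: dtype and shape are taken from the buffer's own description.
typed = np.asarray(src)

print(raw.dtype, raw.shape)
print(typed.dtype, typed.shape)
```

Here `frombuffer` only needs a pointer and a length, which is why the old-style access is sufficient for it, whereas `asarray` relies on the format and shape information that only the new protocol carries.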
+There are a few caveats in allowing :pep:`3118` buffers in
+``PyArray_FromAny``:
+a) `bytes` (and `str` on Py2) objects offer a buffer interface that
+   describes them as 1-D arrays of bytes.
+   Previously, ``PyArray_FromAny`` cast these to 'S#' dtypes. We
+   don't want to change this, since it would cause problems in many places.
+   We do, however, want to allow other objects that provide 1-D byte arrays
+   to be cast to 1-D ndarrays and not 'S#' arrays -- for instance, 'S#'
+   arrays tend to strip trailing NUL characters.
+What ``PyArray_FromAny`` currently does is the following:
+- The presence of the :pep:`3118` buffer interface is checked before checking
+  for the array interface. If it is present *and* the object is not a
+  `bytes` object, then it is used for creating a view on the buffer.
+- We also check in ``discover_depth`` and ``_array_find_type`` for the
+  3118 buffers, so that::
+      array([some_3118_object])
+  will treat the object in the same way as it would an `ndarray`.
+  However, again, bytes (and unicode) objects have priority and will not be
+  handled as buffer objects.
+This leads to a possible semantic change:
+- ``array(buffer)`` will no longer create an object array
+  ``array([buffer], dtype='O')``, but will instead expand to a view
+  on the buffer.
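The rules above can be checked from Python. This is a minimal sketch, assuming a present-day NumPy on Python 3 rather than the code at this revision; `payload`, `src` and the other variable names are hypothetical.

```python
import array
import numpy as np

payload = b"abc\x00\x00"

# bytes keep their historical treatment: a 0-d string ('S') array,
# whose scalar value drops the trailing NUL bytes.
as_string = np.array(payload)

# Another exporter of a 1-D byte buffer (bytearray here) becomes a
# 1-D array of bytes, so the trailing NULs are preserved.
as_bytes = np.array(bytearray(payload))

# A buffer object nested in a list is expanded like an ndarray would be.
nested = np.array([memoryview(payload)])

# asarray on a writable exporter gives a view on the exporter's memory.
src = array.array("i", [1, 2, 3])
view = np.asarray(src)
src[0] = 10          # the change is visible through the ndarray

print(as_string.dtype.kind, as_string.item())
print(as_bytes.tolist())
print(nested.shape, view[0])
```

Note that `np.array` copies by default, so the view behaviour is shown with `asarray`.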
 .. todo::
-   Revise the decision on where to cache the format string -- dtype
-   would be a better place for this.
+   Take a second look at places that used PyBuffer_FromMemory and 
+   PyBuffer_FromReadWriteMemory -- what can be done with these?
 .. todo::
    There's some buffer code in numarray/_capi.c that needs to be addressed.
-.. todo::
-   Does altering the PyArrayObject structure require bumping the ABI?
+PyBuffer (object)
+Since Py3 has a native buffer object, `memoryview`, the
+`newbuffer` and `getbuffer` functions are removed from `multiarray` in
+Py3: their functionality is taken over by the new `memoryview` object.
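As an illustration of `memoryview` taking over that role, the following minimal sketch wraps an ndarray in a `memoryview` and writes through it. It assumes a present-day NumPy on Python 3; the variable names are hypothetical.

```python
import numpy as np

arr = np.arange(4, dtype=np.float64)

# memoryview requests the PEP 3118 buffer that the ndarray exports.
view = memoryview(arr)
print(view.format, view.shape, view.strides)

# The memoryview shares memory with the array, so writes are visible in both.
view[0] = 42.0
print(arr[0])
```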
