[Numpy-svn] r3584 - trunk/numpy/doc

numpy-svn@scip...
Tue Mar 20 05:05:45 CDT 2007


Author: oliphant
Date: 2007-03-20 05:05:43 -0500 (Tue, 20 Mar 2007)
New Revision: 3584

Modified:
   trunk/numpy/doc/pep_buffer.txt
Log:
Update buffer interface PEP.

Modified: trunk/numpy/doc/pep_buffer.txt
===================================================================
--- trunk/numpy/doc/pep_buffer.txt	2007-03-19 19:15:23 UTC (rev 3583)
+++ trunk/numpy/doc/pep_buffer.txt	2007-03-20 10:05:43 UTC (rev 3584)
@@ -24,7 +24,7 @@
 
    The buffer protocol allows different Python types to exchange a
    pointer to a sequence of internal buffers.  This functionality is
-   '''extremely''' useful for sharing large segments of memory between
+   *extremely* useful for sharing large segments of memory between
    different high-level objects, but it's too limited and has issues.
 
     1. There is the little (never?) used "sequence-of-segments" option
@@ -36,36 +36,37 @@
     3. There is no way for a consumer to tell the buffer-API-exporting
        object it is "finished" with its view of the memory and
        therefore no way for the exporting object to be sure that it is
-       safe to reallocate the pointer to the memory that it owns (the
-       array object reallocating its memory after sharing it with the
-       buffer object which held the original pointer led to the
-       infamous buffer-object problem).
+       safe to reallocate the pointer to the memory that it owns (for
+       example, the array object reallocating its memory after sharing
+       it with the buffer object which held the original pointer led
+       to the infamous buffer-object problem).
 
     4. Memory is just a pointer with a length. There is no way to
-       describe what's "in" the memory (float, int, C-structure, etc.)
+       describe what is "in" the memory (float, int, C-structure, etc.)
 
     5. There is no shape information provided for the memory.  But,
        several array-like Python types could make use of a standard
        way to describe the shape-interpretation of the memory
-       (!wxPython, GTK, pyQT, CVXOPT, !PyVox, Audio and Video
-       Libraries, ctypes, !NumPy, data-base interfaces, etc.)
+       (wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
+       Libraries, ctypes, NumPy, data-base interfaces, etc.)
 
     There are two widely used libraries that use the concept of
     discontiguous memory: PIL and NumPy.  Their view of discontiguous
     arrays is a bit different, though.  NumPy uses the notion of
-    constant striding in each dimension as it's basic concept of an
+    constant striding in each dimension as its basic concept of an
     array. In this way a simple sub-region of a larger array can be
-    described without copying the data.  Strided memory is a common
-    way to describe data to many computing libraries (such as the BLAS
+    described without copying the data.  Strided memory is also a common
+    way to describe data in many computing libraries (such as the BLAS
     and LAPACK).
 
     The PIL uses a more opaque memory representation. Sometimes an
     image is contained in a contiguous segment of memory, but
     sometimes it is contained in an array of pointers to the
     contiguous segments (usually lines) of the image.  This allows the
-    image to not be loaded entirely into memory.  The PIL is where the
-    idea of multiple buffer segments in the original buffer interface
-    came from, I believe.
+    image to not be loaded entirely into memory but still be managed
+    abstractly as if it were.  I believe the PIL is where the idea of
+    multiple buffer segments in the original buffer interface came
+    from.
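Editorial sketch: the difference between the two layouts described above comes down to how element (row, col) of a 2-D byte image is located. In this C sketch, `strided_get` and `lines_get` are hypothetical names, not part of NumPy, the PIL or the proposed API. The first function uses a single base pointer plus one byte stride per dimension (the NumPy model). The second follows an array of pointers to separate contiguous lines (the PIL model).

```c
#include <assert.h>
#include <stddef.h>

/* NumPy-style strided memory: one base pointer plus a byte stride per
   dimension.  Element (row, col) is at base + row*strides[0] + col*strides[1]. */
unsigned char strided_get(const unsigned char *base, const ptrdiff_t strides[2],
                          ptrdiff_t row, ptrdiff_t col)
{
    return base[row * strides[0] + col * strides[1]];
}

/* PIL-style memory: an array of pointers, each to one contiguous line.
   The lines need not be adjacent to each other in memory. */
unsigned char lines_get(unsigned char *const *lines, ptrdiff_t row, ptrdiff_t col)
{
    return lines[row][col];
}
```

Because the strided form is only a pointer plus strides, a sub-region such as every other column is described by a new base pointer and new strides (for a 4-byte-wide image, strides {4, 2}) without copying any data.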
 
     The buffer interface should allow discontiguous memory areas to
     share standard striding information.  However, consumers that do
@@ -80,18 +81,21 @@
 
    * Unify the read/write versions of getting the buffer.
 
-   * Add a new function to the protocol that should be called when
+   * Add a new function to the interface that should be called when
      the consumer object is "done" with the view.
 
-   * Add a new function to allow the protocol to describe what is in
+   * Add a new memory_view object that is returned from the 
+     buffer interface getbuffer call.  This memory_view object
+     contains
+   * Add a new function to allow the interface to describe what is in
      memory (unifying what is currently done now in struct and
      array)
 
-   * Add a new function to allow the protocol to share shape
-     information
+   * Add a new function to allow the protocol to share shape and 
+     stride information
 
-   * Fix all objects in core and standard library to conform to the
-     new interface
+   * Fix all objects in the core and the standard library to conform
+     to the new interface
 
    * Extend the struct module to handle more format specifiers
 
@@ -102,89 +106,74 @@
     typedef struct {
          getbufferproc bf_getbuffer
          releasebufferproc bf_releasebuffer
-         formatbufferproc bf_getbufferformat
-         shapebufferproc bf_getbuffershape 
     }
 
     typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
-                                       Py_ssize_t *len, int requires)
+                                       Py_ssize_t *len, int *writeable,
+                                       char **format, int *ndims,
+                                       Py_ssize_t **shape,
+                                       Py_ssize_t **strides)
  
       Return a pointer to memory in *buf and the length of that memory
-      buffer in *len.  Requirements for the memory are provided in
-      requires (PYBUFFER_WRITE, PYBUFFER_ONESEGMENT).  NULL is
-      returned and an error raised if the object cannot return a view
-      with those requirements.  Otherwise, an object-specific "view"
-      object is returned (which can just be as simple as a borrowed 
-      reference to obj).
+      buffer (in bytes) in *len.  The remaining arguments are optional.
+      NULL is returned on failure.  On success an object-specific
+      view is returned (which may just be a borrowed reference to obj).
+      This view should be passed to bf_releasebuffer when the consumer
+      is done with the view. 
 
-      This view object should be used in the other API calls and 
+      writeable -- address of an integer variable to hold 
+                     whether or not the memory is writeable.
+                     If this is NULL, then you must assume the memory 
+                     is read-only.
+      format    -- address of a format-string (following extended struct 
+                     syntax) indicating what is in each element of
+                     memory.  The number of elements is len / itemsize,
+                     where itemsize is the number of bytes implied by the format.
+                     NULL if not needed, in which case the format is assumed
+                     to be "B" (unsigned bytes).  The memory for this string must
+                     not be freed by the consumer; it is managed by the exporter.
+      ndims     -- address of a variable storing the number of dimensions 
+                     or NULL if not needed.  If shape and/or strides are given
+                     then this must be non-NULL.  If this variable is 
+                     not provided then it is assumed that *ndims == 1.
+      shape     -- address of a Py_ssize_t* variable that will be filled
+                     with a pointer to an array of Py_ssize_t of length *ndims
+                     indicating the shape of the memory as an N-D array.  
+                     Ignored if this is NULL.  Note that
+                     ((*shape)[0] * ... * (*shape)[*ndims-1]) * itemsize == len.
+                     If this variable is not provided then it is assumed that
+                     (*shape)[0] == len / itemsize.
+      strides   -- address of a Py_ssize_t* variable that will be filled
+                     with a pointer to an array of Py_ssize_t of length *ndims
+                     indicating the number of bytes to skip to get to the next
+                     element in each dimension.  If this is NULL, then
+                     the memory is assumed to be C-style contiguous with
+                     the last dimension varying the fastest.  An
+                     error should be raised if this is not accurate and
+                     strides are not requested.  The exporter may set
+                     *strides to NULL on return if the memory is C-style
+                     contiguous.
+                
+      This view object should be used in the other API call and 
       does not need to be decref'd.  It should be "released" if the
       interface exporter provides the bf_releasebuffer function.
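Editorial sketch: a consumer can apply the rules for ndims, shape and strides above without any new API. The helper names below are hypothetical, and `ssize_type` is an assumed stand-in for Py_ssize_t so that the sketch compiles without Python.h. The code computes three things:

* the C-contiguous strides a consumer must assume when no strides array is returned
* a check that the product of the shape and the itemsize equals len
* the byte offset of one element from the start of the buffer

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t ssize_type; /* stand-in for Py_ssize_t (assumption) */

/* Strides for C-style contiguous memory (last dimension varies fastest):
   what a consumer should assume when no strides array is returned. */
void c_contiguous_strides(int ndims, const ssize_type *shape,
                          ssize_type itemsize, ssize_type *strides)
{
    ssize_type step = itemsize;
    for (int k = ndims - 1; k >= 0; k--) {
        strides[k] = step;
        step *= shape[k];
    }
}

/* Check (shape[0] * ... * shape[ndims-1]) * itemsize == len. */
int shape_matches_len(int ndims, const ssize_type *shape,
                      ssize_type itemsize, ssize_type len)
{
    ssize_type n = itemsize;
    for (int k = 0; k < ndims; k++)
        n *= shape[k];
    return n == len;
}

/* Byte offset from buf of the element at index[0..ndims-1]. */
ssize_type element_offset(int ndims, const ssize_type *strides,
                          const ssize_type *index)
{
    ssize_type off = 0;
    for (int k = 0; k < ndims; k++)
        off += index[k] * strides[k];
    return off;
}
```

For a 2 x 3 x 4 array of 8-byte items, the contiguous strides come out as {96, 32, 8}, and len must be 192.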
 
     typedef int (*releasebufferproc)(PyObject *view)
 
-      This function is called when a view of memory previously
-      acquired from the object is no longer needed.  It is up to the
-      exporter of the API to make sure all views have been released
-      before re-allocating the previously returned pointer.
-      It is up to consumers of the API to call this function on the
-      object whose view is obtained when it is no longer needed.  A -1
-      is returned on error and 0 on success.
+      This function is called (if defined by the exporting object)
+      when a view of memory previously acquired from the object is no
+      longer needed.  It is up to the exporter of the API to make sure
+      all views have been released before re-allocating any previously
+      shared memory.  It is up to consumers of the API to call this
+      function on the object whose view is obtained when it is no
+      longer needed.   Any format string, shape array or strides array
+      returned through the interface should also not be referenced after 
+      the releasebuffer call is made. 
+      A -1 is returned on error and 0 on success.
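Editorial sketch: the getbuffer/releasebuffer pair above, combined with the export counting suggested under "Issues and Details," can be sketched for a hypothetical one-dimensional array of doubles. This is not the PEP's code. The sketch makes the following assumptions:

* PyObject is dropped so the example stays self-contained.
* `DoubleArray`, `array_getbuffer`, `array_releasebuffer` and `array_resize` are invented names.
* `ssize_type` stands in for Py_ssize_t.
* The returned "view" is simply the object itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t ssize_type; /* stand-in for Py_ssize_t (assumption) */

/* Hypothetical exporter: a 1-D array of doubles. */
typedef struct {
    double *data;
    ssize_type n;
    ssize_type shape[1];    /* owned by the exporter, shared with views */
    int exports;            /* number of views not yet released */
} DoubleArray;

/* Modeled on the proposed getbufferproc: every argument after len may be
   NULL.  The "view" returned is the exporting object itself. */
DoubleArray *array_getbuffer(DoubleArray *self, void **buf, ssize_type *len,
                             int *writeable, char **format, int *ndims,
                             ssize_type **shape, ssize_type **strides)
{
    *buf = self->data;
    *len = self->n * (ssize_type)sizeof(double);
    if (writeable) *writeable = 1;
    if (format) *format = "d";            /* exporter-owned, never freed */
    if (ndims) *ndims = 1;
    if (shape) { self->shape[0] = self->n; *shape = self->shape; }
    if (strides) *strides = NULL;         /* contiguous: NULL strides */
    self->exports++;
    return self;
}

/* Modeled on releasebufferproc: 0 on success, -1 on error. */
int array_releasebuffer(DoubleArray *view)
{
    if (view->exports <= 0)
        return -1;                        /* release without a matching get */
    view->exports--;
    return 0;
}

/* Reallocation must fail while any view is outstanding. */
int array_resize(DoubleArray *self, ssize_type n)
{
    double *p;
    if (self->exports > 0 || n <= 0)
        return -1;
    p = realloc(self->data, (size_t)n * sizeof(double));
    if (p == NULL)
        return -1;
    self->data = p;
    self->n = n;
    return 0;
}
```

In this sketch a counter is enough because the exporter hands out the same shape array and format string to every view. The counter only guarantees that they are not invalidated while a view exists.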
 
-    typedef PyObject *(*formatbufferproc)(PyObject *view, int *itemsize)
+    Both of these routines are optional for a type object.
 
-      Get the format-string of the memory using the struct-module
-      string syntax (see below for proposed additions to that syntax).
-      Also, there is never an alignment assumption in this
-      string---the full byte-layout is always required.  If the
-      implied size of this string is smaller than the length of the
-      buffer then it is assumed that the string is repeated.
 
-      If itemsize is not NULL, then return the size implied by the
-      format string.  This could be the entire length of the buffer or
-      just the length of each element.  It is equivalent to *itemsize
-      = PyObject_SizeFromFormat(ret) if ret is the returned string.
-      However, very often objects already know the itemsize without
-      having to compute it separately.
-
-      The returned object is a Python CObject surrounding a char *
-      pointer which will manage the memory for the char * when the
-      reference disappears.
-
-      If this is routine is not provided, then it is the same as if
-      "B" were the returned string (i.e. it's just a block of bytes)
-      and itemsize==1.
-
-
-    typedef PyObject *(*shapebufferproc)(PyObject *view)
-
-      Return a Python CObject surrounding a pointer to the structure
-
-      struct {
-         int ndim
-         Py_ssize_t *shape;
-         Py_ssize_t *strides; 
-      }
-
-      The strides pointer can be NULL if the memory is C-style contiguous
-      otherwise it provides the striding in each dimension (how many bytes
-      to skip to get to the next element along a particular dimension).
-
-      When the returned object is collected, the memory for the shape
-      and strides is freed by the deallocator stored in the CObject.
-
-      If this routine is not provided, then it's equivalent to
-      ndim == 1 and shape == [len]
-
-      Notice that the buffer length, len, should be
-          (shape[0]*...*shape[ndim-1])*itemsize regardless of the strides.
-
-
-    All of these routines are optional for a type object (but the last
-    three make no sense unless the first one is implemented).
-
-
-
 New C-API calls are proposed
 
    int 
@@ -194,65 +183,29 @@
 
    PyObject * 
    PyObject_GetBuffer(PyObject *obj, void **buf, Py_ssize_t *len,
-                      int requires)
+                      int *writeable, char **format, int *ndims,
+                      Py_ssize_t **shape, Py_ssize_t **strides)
 
-      return a borrowed reference to a "view" object of memory for the
-      object.  Requirements for the memory should be given in requires
-      (PYBUFFER_WRITE, PYBUFFER_ONESEGMENT).  The memory pointer is in
-      *buf and its length in *len. 
-
-      Note, the memory is not considered a single segment of memory 
-      unless PYBUFFER_ONESEGMENT is used in requires. Get possible
-      striding using PyObject_GetBufferShape on the view object. 
+      Get the buffer and optional information variables about the buffer.
+      Return an object-specific view object (which may be simply a 
+      borrowed reference to the object itself). 
       
    int
    PyObject_ReleaseBuffer(PyObject *view)
       
       call this function to tell obj that you are done with your "view"
-      This is a no-op if the object doesn't implement a release function.
-      Only call this after a previous PyObject_GetBuffer has succeeded. 
-      Return -1 on error. 
+      This doesn't do anything if the object doesn't implement a release function.
+      Only call this after a previous PyObject_GetBuffer has succeeded, and
+      only once you no longer need to refer to the memory (or to the format,
+      shape, and strides memory used in the view; if you need these for a
+      longer period of time, make copies).
+      Returns -1 on error. 
       
-   char *
-   PyObject_GetBufferFormat(PyObject *view, int *itemsize)
-
-      Return a NULL-terminated string indicating the data-format of
-      the memory buffer.  The string is in struct-module syntax with
-      the exception that there is never an alignment assumption (all
-      bytes must be accounted for). If the length of the buffer
-      indicated by this string is smaller than the total length of the
-      buffer, then a repeat of the string is implied to fill the
-      length of the buffer.
-
-      If itemsize is not NULL, then return the implied size
-      of each item (this could be calculated from the format string
-      but it is often known by the view object anyway). 
-
-   PyObject *
-   PyObject_GetBufferShape(PyObject *view)
-
-      Return a 2-tuple of lists (shape, stride) providing the
-      multi-dimensional shape of the memory area.  The stride
-      shows how many bytes to skip in each dimension to move
-      in that dimension from the start of the array. 
-
-      Memory that is not a single contiguous-buffer can be represented
-      with the pointer returned from GetBuffer and the shape and
-      strides returned from GetBufferShape.
-
    int PyObject_SizeFromFormat(char *)
-      Return the implied size of the data-format area from a struct-style
-      description.
+      Return the implied itemsize of the data-format area from a struct-style
+      description. 
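Editorial sketch: PyObject_SizeFromFormat is named here but not specified. The C sketch below gives a rough idea of the computation involved. The following are all assumptions:

* The name `size_from_format` is hypothetical.
* Only a small subset of struct codes is supported, using their standard sizes.
* No alignment padding is added, as the earlier draft of this PEP stated for format strings.
* Spaces are ignored, following the proposed syntax addition.
* A decimal prefix repeats the next code.
* Byte-order prefixes and the new codes are not handled; they return -1.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical itemsize computation: sum of standard sizes, no padding.
   Returns -1 for any code (or trailing count) it does not understand. */
ptrdiff_t size_from_format(const char *fmt)
{
    ptrdiff_t total = 0;
    while (*fmt) {
        if (*fmt == ' ') { fmt++; continue; }   /* ignored for readability */
        ptrdiff_t count = 1;
        if (isdigit((unsigned char)*fmt)) {     /* repeat count, e.g. "3d" */
            count = 0;
            while (isdigit((unsigned char)*fmt))
                count = count * 10 + (*fmt++ - '0');
        }
        ptrdiff_t size;
        switch (*fmt) {
        case 'x': case 'c': case 'b': case 'B': case '?': case 's': size = 1; break;
        case 'h': case 'H': size = 2; break;
        case 'i': case 'I': case 'l': case 'L': case 'f': size = 4; break;
        case 'q': case 'Q': case 'd': size = 8; break;
        default: return -1;
        }
        total += count * size;
        fmt++;
    }
    return total;
}
```

With these rules, "3d i" gives 28 bytes per item.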
 
-   PyObject *PyObject_BufferFormat(char *format, int copy)
-      Construct a CObject to return as the format in the buffer interface
-      from a string being sure to copy if specified.
 
-   PyObject *PyObject_BufferShape(int ndim, Py_ssize_t *shape, Py_ssize_t *strides)
-      Construct a CObject to return as the shape object in the buffer interface.
-      The values are copied from the arrays pointed to by shape and strides.
-      Strides can be NULL if the memory is C-style contiguous. 
-
 Additions to the struct string-syntax
 
    The struct string-syntax is missing some characters to fully
@@ -275,7 +228,7 @@
   '&'               specific pointer (prefix before another character) 
    'X{}'             pointer to a function (optional function 
                                              signature inside {})
-   ' '               ignored (allow readability)
+   ' '               ignored (allow better readability)
 
    The struct module will be changed to understand these as well and
    return appropriate Python objects on unpacking.  Un-packing a
@@ -343,36 +296,36 @@
 
    anything else using the buffer API
 
+
+
 Issues and Details
 
+
    The proposed locking mechanism relies entirely on the objects
    implementing the buffer interface to do their own thing.  Ideally
    an object that implements the buffer interface should keep at least
    a number indicating how many releases are extant.  If there are views
-   to a memory location, then reallocation should fail and raise
+   to a memory location, then any subsequent reallocation should fail and raise
    an error. 
 
-   The handling of discontiguous memory is new and can be seen as a
+   The sharing of strided memory is new and can be seen as a
    modification of the multiple-segment interface.  It is motivated by
-   NumPy (used to be Numeric).  NumPy objects should be able to share
-   their strided memory with code that understands how to manage
-   strided memory.
+   NumPy.  NumPy objects should be able to share their strided memory
+   with code that understands how to manage strided memory because
+   strided memory is very common when interfacing with compute libraries.
 
-   Code should also be able to request contiguous memory if needed and
-   objects exporting the buffer interface should be able to handle
-   that either by raising an error (or constructing a read-only
-   contiguous object and returning that as the view).
-
    Currently the struct module does not allow specification of nested
    structures.  It seems like specifying a nested structure should be
-   specified as several ways of viewing memory areas (ctypes and
+   specified as several ways of viewing memory areas (e.g. ctypes and
    NumPy) already allow this.
 
-   Python Objects are returned for Format and Shape descriptions so 
-   that memory-management is simply handled using reference-counting. 
+   Memory management of the format string and the shape and strides
+   array is always the responsibility of the exporting object and can
+   be shared between different views. If the consuming object needs to
+   keep these memory areas longer than the view is held, then it must
+   copy them to its own memory. 
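Editorial sketch: under the memory-management rule above, a consumer that needs the format, shape, or strides after releasing the view must copy them first. The names `LayoutCopy`, `layout_copy` and `layout_free` in this C sketch are hypothetical, and `ssize_type` stands in for Py_ssize_t. The sketch also assumes ndims >= 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef ptrdiff_t ssize_type; /* stand-in for Py_ssize_t (assumption) */

/* Consumer-owned copy of a view's layout that may outlive the view. */
typedef struct {
    char *format;
    int ndims;
    ssize_type *shape;
    ssize_type *strides;   /* NULL means C-style contiguous */
} LayoutCopy;

/* Copy exporter-owned layout data into consumer-owned memory.
   Returns 0 on success, -1 on allocation failure (dst left empty). */
int layout_copy(LayoutCopy *dst, const char *format, int ndims,
                const ssize_type *shape, const ssize_type *strides)
{
    size_t nbytes = (size_t)ndims * sizeof(ssize_type);
    dst->ndims = ndims;
    dst->format = malloc(strlen(format) + 1);
    dst->shape = malloc(nbytes);
    dst->strides = strides ? malloc(nbytes) : NULL;
    if (!dst->format || !dst->shape || (strides && !dst->strides)) {
        free(dst->format); free(dst->shape); free(dst->strides);
        dst->format = NULL; dst->shape = NULL; dst->strides = NULL;
        return -1;
    }
    strcpy(dst->format, format);
    memcpy(dst->shape, shape, nbytes);
    if (strides)
        memcpy(dst->strides, strides, nbytes);
    return 0;
}

void layout_free(LayoutCopy *c)
{
    free(c->format);
    free(c->shape);
    free(c->strides);
}
```

After `layout_copy` succeeds, the consumer can release the view, and the copies remain valid even if the exporter reuses or frees its own arrays.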
 
 
-
 Copyright
 
    This PEP is placed in the public domain


