[Numpy-discussion] Bytes vs. Unicode in Python3
Dag Sverre Seljebotn
Thu Dec 3 07:03:13 CST 2009
Pauli Virtanen wrote:
> Fri, 27 Nov 2009 23:19:58 +0100, Dag Sverre Seljebotn wrote:
>> One thing to keep in mind here is that PEP 3118 actually defines a
>> standard dtype format string, which is (mostly) incompatible with
>> NumPy's. It should probably be supported as well when PEP 3118 is
> PEP 3118 is for the most part implemented in my Py3K branch now -- it was
> not actually much work, as I could steal most of the format string
> converter from numpy.pxd.
Great! Are you storing the format string in the dtype types as well? (So
that no release is needed and acquisitions are cheap...)
As far as numpy.pxd goes -- well, its converter only covers the simplest dtypes.
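Both the caching question and the numpy.pxd remark concern turning a dtype into a PEP 3118 format string. Here is a minimal sketch for the simplest numeric dtypes only. It assumes a hypothetical `dtype_sketch` struct (not NumPy's real descriptor) and struct-module codes with standard sizes. It computes the string once and caches it on the dtype, so a buffer's `format` pointer could refer to the cache and nothing would need freeing on release.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dtype: byte order ('<', '>', '|'),
   kind ('b', 'i', 'u', 'f', 'c'), item size in bytes, and a lazily
   filled cache for the PEP 3118 format string. */
typedef struct {
    char byteorder;
    char kind;
    int itemsize;
    char format_cache[8];  /* empty string means "not computed yet" */
} dtype_sketch;

/* Map a simple numeric dtype to a struct-module style type code.
   Returns NULL for anything this sketch does not handle. */
static const char *simple_type_code(char kind, int itemsize)
{
    switch (kind) {
    case 'b':
        return itemsize == 1 ? "?" : NULL;
    case 'i':
        switch (itemsize) { case 1: return "b"; case 2: return "h";
                            case 4: return "i"; case 8: return "q"; }
        return NULL;
    case 'u':
        switch (itemsize) { case 1: return "B"; case 2: return "H";
                            case 4: return "I"; case 8: return "Q"; }
        return NULL;
    case 'f':
        switch (itemsize) { case 4: return "f"; case 8: return "d"; }
        return NULL;
    case 'c':  /* PEP 3118 writes complex as 'Z' + the component type */
        switch (itemsize) { case 8: return "Zf"; case 16: return "Zd"; }
        return NULL;
    }
    return NULL;
}

/* Return the cached format string, computing it on first use.
   The pointer lives as long as the dtype, so an exporter could set
   view->format to it and release nothing. NULL if unsupported. */
const char *dtype_sketch_format(dtype_sketch *dt)
{
    assert(dt != NULL);
    if (dt->format_cache[0] != '\0')
        return dt->format_cache;
    const char *code = simple_type_code(dt->kind, dt->itemsize);
    if (code == NULL)
        return NULL;
    /* '|' (byte order not applicable) becomes '=' (native order, standard size). */
    char prefix = dt->byteorder == '|' ? '=' : dt->byteorder;
    if (prefix != '<' && prefix != '>' && prefix != '=')
        return NULL;
    dt->format_cache[0] = prefix;
    strcpy(dt->format_cache + 1, code);  /* code is at most 2 chars + NUL */
    return dt->format_cache;
}
```

Structured and record dtypes, which are where the two format languages really diverge, are deliberately left out.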
> Some questions:
> How hard do we want to try supplying a buffer? Eg. if the consumer does
> not specify strided but specifies suboffsets, should we try to compute
> suitable suboffsets? Should we try making contiguous copies of the data
> (I guess this would break buffer semantics?)?
Actually per the PEP, suboffsets imply strided:
#define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES)
:-) So there's no real way for a consumer to specify only suboffsets;
0x0100 on its own is not a valid flag, I think. Suboffsets can't really
work without the strides anyway IIUC, and in the case of NumPy the field
can always be left at 0 (NULL).
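Because composite flags are tested by checking that all of their bits are set, the `#define` above makes "suboffsets imply strides" automatic. Here is a small sketch. The `PyBUF_*` values are copied from CPython's object.h only so that it compiles without Python.h, and the guard lets the real definitions win if that header is included. The helper names are hypothetical, and `static_assert` assumes C11.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from CPython's object.h so this compiles without
   Python.h; the guard lets the real definitions win if present. */
#ifndef PyBUF_INDIRECT
#define PyBUF_ND       0x0008
#define PyBUF_STRIDES  (0x0010 | PyBUF_ND)
#define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES)
#endif

/* PyBUF_INDIRECT contains every bit of PyBUF_STRIDES, checked at compile time. */
static_assert((PyBUF_INDIRECT & PyBUF_STRIDES) == PyBUF_STRIDES,
              "suboffsets imply strides");

/* A composite flag counts as requested only if all of its bits are set. */
static bool requests_strides(int flags)
{
    return (flags & PyBUF_STRIDES) == PyBUF_STRIDES;
}

static bool requests_suboffsets(int flags)
{
    return (flags & PyBUF_INDIRECT) == PyBUF_INDIRECT;
}
```

With this test, a bare 0x0100 never counts as a suboffsets request, and any request that does count also asks for strides. An exporter like NumPy that never needs pointer indirection can simply set `view->suboffsets = NULL` either way.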
IMO one should very much stay clear of making contiguous copies,
especially considering the existence of PyBuffer_ToContiguous, which
makes it trivial for client code to get a pointer to a contiguous buffer
anyway. The intention of the PEP seems to be to export the buffer in as
raw a form as possible.
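For a plain strided buffer without suboffsets, the copy that PyBuffer_ToContiguous makes with order 'C' amounts to visiting every index in C order and copying one item at a time. Here is a minimal pure-C sketch of that walk, written so it compiles without Python.h. `copy_to_c_contiguous` and `SKETCH_MAX_NDIM` are hypothetical names, and this is not CPython's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_NDIM 32  /* hypothetical limit for this sketch */

/* Copy a strided N-d buffer (no suboffsets) into dst in C order.
   src points at the first element, as view->buf does.
   Returns the number of bytes written, or 0 if ndim is out of range
   or the array is empty. dst must have room for all the items. */
size_t copy_to_c_contiguous(char *dst, const char *src, int ndim,
                            const ptrdiff_t *shape, const ptrdiff_t *strides,
                            size_t itemsize)
{
    ptrdiff_t index[SKETCH_MAX_NDIM] = {0};
    size_t written = 0;

    assert(itemsize > 0);
    if (ndim < 0 || ndim > SKETCH_MAX_NDIM)
        return 0;
    for (int d = 0; d < ndim; d++)
        if (shape[d] == 0)
            return 0;  /* nothing to copy */

    for (;;) {
        /* Address of the current element: src + sum(index[d] * strides[d]). */
        const char *p = src;
        for (int d = 0; d < ndim; d++)
            p += index[d] * strides[d];
        memcpy(dst + written, p, itemsize);
        written += itemsize;

        /* Advance the multi-index like an odometer, last axis fastest. */
        int d = ndim - 1;
        while (d >= 0 && ++index[d] == shape[d]) {
            index[d] = 0;
            d--;
        }
        if (d < 0)
            break;  /* wrapped past the first axis: every element copied */
    }
    return written;
}
```

Since a client can always do this itself when it needs to, the exporter gains little by copying on its behalf.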
Do keep in mind that IS_C_CONTIGUOUS and IS_F_CONTIGUOUS can be too
conservative with NumPy arrays. If a contiguous buffer is requested,
then looping through the strides and checking that they are
monotonically decreasing/increasing could save a copy in
some cases. I think that could be worth it -- I actually use my own
IS_F_CONTIGUOUS check rather than relying on the flags
because of this issue, so it does come up in practice.
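Here is a minimal sketch of such a stride-based check, assuming the buffer has no suboffsets. Monotonic strides alone would still allow gaps between items, so it also compares each stride with the item size times the extents of the faster-varying axes. It skips axes of length 1, whose stride is never used for addressing. That is one case where a flag-based test may be stricter than necessary. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stride-based contiguity test for a buffer without suboffsets.
   order is 'C' (last axis fastest) or 'F' (first axis fastest). */
bool strides_are_contiguous(int ndim, const ptrdiff_t *shape,
                            const ptrdiff_t *strides, ptrdiff_t itemsize,
                            char order)
{
    assert(order == 'C' || order == 'F');
    /* An array with a zero-length axis holds no data, so any strides do. */
    for (int i = 0; i < ndim; i++)
        if (shape[i] == 0)
            return true;

    ptrdiff_t expected = itemsize;
    for (int k = 0; k < ndim; k++) {
        /* Visit axes from fastest- to slowest-varying for the given order. */
        int d = (order == 'C') ? ndim - 1 - k : k;
        /* A length-1 axis is never stepped along, so its stride is irrelevant. */
        if (shape[d] == 1)
            continue;
        if (strides[d] != expected)
            return false;
        expected *= shape[d];
    }
    return true;
}
```

For example, an array of shape (2, 1, 3) with 4-byte items and strides (12, 999, 4) passes the 'C' test even though the middle stride is arbitrary.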