Tue Jul 27 13:34:17 CDT 2010
On Tue, Jul 27, 2010 at 11:14 AM, Brian Granger <email@example.com> wrote:
> Yes, I hadn't thought about the fact that unicode objects are buffers as
> well. But we could raise a TypeError when a user tries to send a unicode
> object (str in Python 3). IOW, don't treat unicode as buffers and force
> users to encode/decode. Does this make sense, or should we allow unicode to
> be sent as buffers?
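A rough sketch of the TypeError approach Brian describes, written for Python 3, where `str` is the unicode type. `checked_send` is a hypothetical wrapper, and `send` stands in for whatever low-level callable actually ships a buffer. It is not a real pyzmq or IPython API.

```python
def checked_send(send, msg):
    """Hypothetical wrapper: refuse text, pass anything buffer-like to send().

    `send` is any callable accepting a bytes-like object (an assumption,
    standing in for a real socket's send method).
    """
    if isinstance(msg, str):
        raise TypeError(
            "unicode (str) objects cannot be sent as buffers; "
            "encode them first, e.g. msg.encode('utf-8')"
        )
    # memoryview() itself raises TypeError for objects without the
    # buffer interface, so only genuine buffers get through.
    return send(memoryview(msg))


outbox = []
checked_send(outbox.append, "caf\u00e9".encode("utf-8"))  # accepted: bytes
try:
    checked_send(outbox.append, "caf\u00e9")  # rejected: str
except TypeError as exc:
    print(exc)
```

The caller has to choose an encoding explicitly, so the receiver never has to guess what the bytes mean.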
Well, the problem I explained about a possible mismatch in internal
unicode storage format rears its ugly head if we allow
unicode-as-buffer. I was precisely worried about sending 3.x strings
as buffers, since the two ends may not agree on what the buffer means.
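To illustrate the mismatch, the sketch below stands in for the two internal layouts with explicit codecs: UTF-16 for a narrow (UCS-2) build and UTF-32 for a wide (UCS-4) build. This approximates the interpreter's storage and does not dump it. The point is only that the same text yields different bytes, and a receiver that assumes the other width misreads them.

```python
text = "h\u00e9llo"  # 'héllo', 5 code points, all in the BMP

narrow = text.encode("utf-16-le")  # stand-in for UCS-2: 2 bytes per code point
wide = text.encode("utf-32-le")    # stand-in for UCS-4: 4 bytes per code point
print(len(narrow), len(wide))      # 10 20

# A narrow receiver reading a wide sender's buffer gets garbage:
# each 4-byte unit is split into a character plus a NUL.
misread = wide.decode("utf-16-le")
assert misread != text

# A wide receiver reading a narrow sender's buffer may not even get
# a whole number of 4-byte units, and decoding fails outright.
try:
    narrow.decode("utf-32-le")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```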
I may be worrying about a non-problem, but at some point it might be
worth verifying this. The test is a bit cumbersome to set up, because
you have to build two versions of Python, one with UCS-2 and one with
UCS-4, and see what happens if they try to send each other stuff. But
I think it's a test worth making, so we know for sure whether this is
a problem or not, as it will dictate design decisions for 3.x on all
If it is a problem, then there are some options:
- disallow communication between UCS-2 and UCS-4 Pythons.
- detect a mismatch and encode/decode all unicode strings to UTF-8 on
send/receive, but allow raw buffer sending if there's no mismatch.
- *always* encode/decode.
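A minimal sketch of the middle option, under some assumptions: each end learns the peer's `sys.maxunicode` somehow (the handshake is not shown), 0xFFFF marks a narrow build and 0x10FFFF a wide one, and "raw" storage is simulated with UTF-16/UTF-32 codecs. All function names are hypothetical. Because the choice depends only on whether the two values match, sender and receiver reach the same decision independently.

```python
import sys

NARROW, WIDE = 0xFFFF, 0x10FFFF  # sys.maxunicode on UCS-2 / UCS-4 builds


def raw_codec(maxunicode):
    # Hypothetical stand-in for "send the internal buffer as-is".
    return "utf-16-le" if maxunicode == NARROW else "utf-32-le"


def choose_codec(local_max, peer_max):
    """Raw layout when both ends agree, otherwise fall back to UTF-8."""
    if local_max == peer_max:
        return raw_codec(local_max)
    return "utf-8"


def encode_for_peer(text, local_max, peer_max):
    return text.encode(choose_codec(local_max, peer_max))


def decode_from_peer(payload, local_max, peer_max):
    # The match test is symmetric, so the receiver picks the same codec.
    return payload.decode(choose_codec(local_max, peer_max))


# On CPython 3.3+ sys.maxunicode is always 0x10FFFF, so a real mismatch can
# only be simulated there by passing the values explicitly.
print(hex(sys.maxunicode))
```

The brittleness lies in the handshake: if the advertised values are ever wrong or missing, the raw path silently produces the misreads shown earlier in the thread, while the always-UTF-8 option has no such failure mode.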
The middle option seems appealing because it avoids the overhead of
encoding/decoding on all sends, but I'm worried it may be too brittle.