Tue Jul 27 14:23:37 CDT 2010
On Tue, Jul 27, 2010 at 11:34 AM, Fernando Perez <firstname.lastname@example.org> wrote:
> On Tue, Jul 27, 2010 at 11:14 AM, Brian Granger <email@example.com>
> > Yes, I hadn't thought about the fact that unicode objects are buffers as
> > well. But, we could raise a TypeError when a user tries to send a unicode
> > object (str in python 3). IOW, don't treat unicode as buffers and force
> > them to encode/decode. Does this make sense or should we allow unicode to
> > be sent as buffers?
> Well, the problem I explained about a possible mismatch in internal
> unicode storage format rears its ugly head if we allow
> unicode-as-buffer. I was precisely worried about sending 3.x strings
> as buffers, since the two ends may not agree on what the buffer means.
> I may be worrying about a non-problem, but at some point it might be
> worth verifying this. The test is a bit cumbersome to set up, because
> you have to build two versions of Python, one with ucs-2 and one with
> ucs-4, and see what happens if they try to send each other stuff. But
> I think it's a test worth making, so we know for sure whether this is
> a problem or not, as it will dictate design decisions for 3.x on all
> string handling.
This is definitely an issue. Also, someone could set their own custom
unicode encoding by hand and that would mess this up as well.
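
To make the mismatch concrete, here is a minimal sketch. It is not code from IPython or from this thread, and it makes one loud assumption: the utf-16-le and utf-32-le codecs are used as stand-ins for the ucs-2 and ucs-4 internal layouts. The helper name raw_layout is hypothetical. On Python versions that still distinguish the two builds, sys.maxunicode is 0xFFFF on a narrow (ucs-2) build and 0x10FFFF on a wide (ucs-4) one.

```python
import sys


def raw_layout(text, width):
    """Approximate the raw bytes of `text` for a given code unit width.

    Assumption: little-endian utf-16 / utf-32 stand in for ucs-2 / ucs-4
    internal storage. `raw_layout` is a hypothetical helper.
    """
    codec = {2: "utf-16-le", 4: "utf-32-le"}[width]
    return text.encode(codec)


text = u"abcd"
narrow = raw_layout(text, 2)  # roughly what a ucs-2 sender would expose
wide = raw_layout(text, 4)    # roughly what a ucs-4 sender would expose

# The same four characters occupy a different number of bytes.
print(len(narrow), len(wide))

# A ucs-4 receiver that reinterprets the ucs-2 bytes does not recover the text.
try:
    misread = narrow.decode("utf-32-le")
except UnicodeDecodeError:
    misread = None
print(misread == text)

# Reports the build width on interpreters that have narrow and wide builds.
print(hex(sys.maxunicode))
```

If the two ends disagree on the width, the receiver either fails to decode or silently gets different text, which is why the raw buffer cannot be trusted on its own.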
> If it is a problem, then there are some options:
> - disallow communication between ucs 2/4 pythons.
But this doesn't account for other encoding/decoding setups.
> - detect a mismatch and encode/decode all unicode strings to utf-8 on
> send/receive, but allow raw buffer sending if there's no mismatch.
This will be tough though if users set their own encoding.
> - *always* encode/decode.
I think this is the option that I prefer (having users do this in their
> The middle option seems appealing because it avoids the overhead of
> encoding/decoding on all sends, but I'm worried it may be too brittle.
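
A rough sketch of what the "always encode/decode" option could look like on Python 3, combined with the TypeError idea from earlier in the thread. This is only an illustration under stated assumptions: check_sendable is a hypothetical name, not an existing API, and utf-8 is assumed as the encoding both ends agree on.

```python
def check_sendable(obj):
    """Return the bytes to put on the wire, rejecting text (str in Python 3).

    Hypothetical helper: the caller is forced to encode text explicitly.
    """
    if isinstance(obj, str):
        raise TypeError(
            "encode text to bytes before sending, e.g. text.encode('utf-8')"
        )
    # memoryview accepts any object that exposes the buffer interface.
    return memoryview(obj).tobytes()


message = u"caf\xe9 \u2603"

# Sending the text directly is refused.
try:
    check_sendable(message)
    rejected = False
except TypeError:
    rejected = True

# The sender encodes explicitly and the receiver decodes with the same codec.
payload = check_sendable(message.encode("utf-8"))
received = payload.decode("utf-8")
print(rejected, received == message)
```

The cost is one encode and one decode per text message, but the bytes on the wire no longer depend on how either interpreter stores unicode internally.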
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo