[Numpy-discussion] Empty strings not empty?
Charles R Harris
charlesr.harris@gmail....
Wed Dec 30 13:21:15 CST 2009
On Wed, Dec 30, 2009 at 12:00 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Hi.
>
> > It isn't empty:
> >
> > In [3]: array(['\x00']).dtype
> > Out[3]: dtype('|S1')
> >
> > In [4]: array(['\x00']).tostring()
> > Out[4]: '\x00'
> >
> > In [5]: array(['\x00'])[0]
> > Out[5]: ''
>
> No, but my problem was that an empty string is not empty either, and
> that you can't therefore distinguish between an empty string and a
> string with all 0 bytes:
>
> In [11]: np.array('') == '\x00\x00\x00'
> Out[11]: array(True, dtype=bool)
>
> > Looks like a printing problem to me, something in __repr__ for the string
> > array. It seems that trailing zeros are trimmed off.
> >
> > In [11]: array(['a\x00\x00'])
> > Out[11]:
> > array(['a'],
> > dtype='|S3')
> >
> > In [12]: array(['a\x00b'])
> > Out[12]:
> > array(['a\x00b'],
> > dtype='|S3')
>
> I don't think it's a printing problem, I think it's that the trailing
> zeros are pulled off in the string comparisons, and for printing, even
> though they are present in memory. I mean, that a.tostring() is
> right, and the __repr__ and comparisons are - at least to me -
> confusing.
>
> In [2]: a = np.array('a\x00\x00\x00')
>
> In [3]: a
> Out[3]:
> array('a',
> dtype='|S4')
>
> In [5]: a == 'a'
> Out[5]: array(True, dtype=bool)
>
> In [7]: a == 'a\x00\x00\x00'
> Out[7]: array(True, dtype=bool)
>
>
That is due to type promotion for the ufunc call:
In [17]: a1 = np.array('a\x00\x00\x00')
In [21]: np.array(['a'], dtype=a1.dtype)[0]
Out[21]: 'a'
In [22]: np.array(['a'], dtype=a1.dtype).tostring()
Out[22]: 'a\x00\x00\x00'
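
For anyone who wants to reproduce this on a current setup, here is a rough sketch of the same checks. It assumes Python 3 and a recent NumPy, so byte strings (b'...') stand in for the Python 2 str values used above and tobytes() replaces tostring(). The variable names are only illustrative.

```python
import numpy as np

# Assumption: Python 3 with a recent NumPy. Byte strings play the role of
# the Python 2 str values in this thread, and tobytes() replaces tostring().

padded = np.array([b'a\x00\x00'])   # fixed-width field of 3 bytes
raw = padded.tobytes()              # raw buffer: trailing NULs are still there
item = padded[0]                    # item access drops the trailing NULs

embedded = np.array([b'a\x00b'])[0] # a NUL in the middle is preserved

a1 = np.array(b'a\x00\x00\x00')     # 0-d array with a 4-byte field
# Storing b'a' with a1's dtype pads it with NULs up to the field width.
widened = np.array([b'a'], dtype=a1.dtype)
same = bool(a1 == b'a')             # comparison ignores trailing NULs

print(padded.dtype, raw, item, embedded)
print(widened.tobytes(), widened[0], same)
```

The point of the sketch is that the bytes in memory and the value handed back on item access or comparison are two different views: the buffer is NUL-padded to the field width, while trailing NULs are treated as padding everywhere else.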
Chuck