[Numpy-discussion] Incorrect removal of NULL char in buffers
Travis Oliphant
oliphant.travis at ieee.org
Fri Sep 29 13:19:27 CDT 2006
Francesc Altet wrote:
> Hi,
>
>
> However, for string values, numpy seems to work in a strange way.
> numarray has the expected behaviour, IMO:
>
> In [100]: numarray.strings.array(buffer="a\x00b"*4, itemsize=4, shape=3)
> Out[100]: CharArray(['a', '', 'ba'])
>
>
I'm not sure why you think this is "expected." You have
non-terminating NULLs in this array and yet they are not printing for you.
Just look at the tostring()...
> but numpy doesn't:
>
> In [101]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
> Out[101]:
> array([aba, ba, bab],
> dtype='|S4')
>
> i.e. it seems like numpy is stripping off NULL chars before building the object
> and I don't think this is correct.
>
Hmmm. I don't see that at all. This is what I get (version of numpy is
1.0.dev3233):
In [33]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
Out[33]:
array(['a\x00ba', '\x00ba', 'ba\x00b'],
dtype='|S4')
which to me is very much expected. I.e. only terminating NULLs are
stripped off of the strings on printing. I think you are getting
different results because string printing used to not include the quotes
(which had the side-effect of not printing NULLs in the middle of
strings). They are still there, just not showing up in your output.
In the end both numarray and numpy have the same data stored
internally. It's just a matter of how it is being printed that seems to
differ a bit. From my perspective, only NULLs at the end of strings
should be stripped off and that is the (current) behavior of NumPy.
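A minimal sketch of how to check this, assuming a current NumPy on Python 3 (where the buffer has to be a bytes literal and `tobytes()` plays the role of `tostring()`). The variable names are only illustrative:

```python
import numpy as np

# Same 12-byte buffer as in the thread; Python 3 needs a bytes literal.
buf = b"a\x00b" * 4

# Three 4-byte string items viewed directly over the buffer (no copy).
arr = np.ndarray(shape=3, dtype="S4", buffer=buf)

# Element access drops only the trailing NULs of each item;
# NULs in the middle of an item are kept.
items = arr.tolist()
print(items)

# The raw bytes are untouched, embedded and trailing NULs included.
raw = arr.tobytes()
print(raw == buf)
```

The second item starts with a NUL and ends with one; only the final one is dropped when the element is read back, while the underlying buffer still holds all twelve bytes.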
You are getting different results because the array-printing for
strings was recently updated (to insert the quotes so that it makes more
sense). Without these changes, I think the NULLs were being stripped
away on printing. In other words, something like
    print 'a\x00ba'
    aba
used to be happening.
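The difference between the quoted and unquoted forms can be seen with a plain Python string. This is only a sketch in Python 3 syntax; the `visible` line assumes a terminal that renders a NUL byte as nothing, which is typical but not guaranteed:

```python
s = "a\x00ba"  # embedded NUL between 'a' and 'b'

# The NUL is part of the string: four characters, not three.
length = len(s)

# repr() escapes the NUL, which is why quoted output makes it visible.
quoted = repr(s)

# What a terminal usually appears to show when the raw string is printed
# (assumption: the terminal displays nothing for a NUL byte).
visible = s.replace("\x00", "")

print(length, quoted, visible)
```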
-Travis