[Numpy-discussion] Incorrect removal of NULL char in buffers

Travis Oliphant oliphant.travis at ieee.org
Fri Sep 29 13:19:27 CDT 2006

Francesc Altet wrote:
> Hi,
> However, for string values, numpy seems to work in a strange way. 
> numarray has the expected behaviour, IMO:
> In [100]: numarray.strings.array(buffer="a\x00b"*4, itemsize=4, shape=3)
> Out[100]: CharArray(['a', '', 'ba'])  
I'm not sure why you think this is "expected."   You have 
non-terminating NULLs in this array and yet they are not printing for you.

Just look at the tostring()...

> but numpy doesn't:
> In [101]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
> Out[101]:
> array([aba, ba, bab],
>       dtype='|S4')
> i.e. it seems like numpy is stripping off NULL chars before building the object 
> and I don't think this is correct.

Hmmm.  I don't see that at all.  This is what I get (version of numpy is 

In [33]: numpy.ndarray(buffer="a\x00b"*4, dtype="S4", shape=3)
array(['a\x00ba', '\x00ba', 'ba\x00b'],

which to me is very much expected.   I.e. only terminating NULLs are 
stripped off of the strings on printing.   I think you are getting 
different results because string printing used to not include the quotes 
(which had the side-effect of not printing NULLs in the middle of 
strings).  They are still there, just not showing up in your output.

In the end both numarray and numpy have the same data stored 
internally.  It's just a matter of how it is being printed that seems to 
differ a bit.  From my perspective, only NULLs at the end of strings 
should be stripped off and that is the (current) behavior of NumPy.
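
A rough sketch of how this can be checked (not part of the original 
message). It assumes a current NumPy under Python 3, where the buffer has 
to be a bytes object and tobytes() stands in for the older tostring(). The 
names raw, items and same_bytes are made up for illustration.

```python
import numpy as np

# Python 3 counterpart of the buffer used in the thread:
# 12 raw bytes, read as three 4-byte string items.
raw = b"a\x00b" * 4

arr = np.ndarray(buffer=raw, dtype="S4", shape=3)

# Converting items to Python objects strips only trailing NULs;
# NULs in the middle of an item are kept.
items = arr.tolist()
print(items)

# The stored bytes are untouched, trailing NULs included.
same_bytes = arr.tobytes() == raw
print(same_bytes)
```

The second item is the interesting one: its four stored bytes end in a 
NUL, so only that last byte is dropped when the item is read back.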

You are getting different results because the array printing for 
strings was recently updated (quotes are now inserted so that the output 
makes more sense).  Without these changes, I think the NULLs were being 
stripped away on printing.  In other words, something like

print 'a\x00ba'


used to be happening. 
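
To illustrate the difference between the two display styles, here is a 
minimal sketch in plain Python 3 (an assumption, since the thread itself 
uses Python 2 syntax). The variable names are hypothetical. Printing a 
string writes the NUL character itself, which most terminals do not show, 
while the quoted form writes an escape sequence.

```python
s = "a\x00ba"

# What print(s) sends to the terminal: four characters, one of them a raw NUL.
shown_raw = str(s)

# What a quoted display sends: the NUL appears as the visible escape \x00.
shown_quoted = repr(s)

print(len(shown_raw))
print(shown_quoted)
```

Either way the string holds the same four characters; only the rendering 
of the embedded NUL changes.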

