[Numpy-discussion] Bug in rec.fromarrays ; plus one other possible bug
Dan Yamins
dyamins@gmail....
Wed Nov 25 08:48:50 CST 2009
Hi, I'm writing to report what look like two bugs in the handling of
strings of length 0. (I'm using 1.4.0.dev7746, on Mac OSX 10.5.8. The
problems below occur with both Python 2.5 compiled 32-bit and
Python 2.6 compiled 64-bit.)
Bug #1:
A problem arises when you try to create a record array passing a type of
'|S0'.
>>> Cols = [['test']*10,['']*10]
When no dtype is passed, this is converted to a recarray with no problem:
>>> np.rec.fromarrays(Cols)
rec.array([('test', ''), ('test', ''), ('test', ''), ('test', ''),
('test', ''), ('test', ''), ('test', ''), ('test', ''),
('test', ''), ('test', '')],
dtype=[('f0', '|S4'), ('f1', '|S1')])
However, trouble arises when I try to pass a length-0 dtype explicitly.
>>> d = np.dtype([('A', '|S4'), ('B', '|S')])
>>> np.rec.fromarrays(Cols,dtype=d)
rec.array([('test', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
('\x00est', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
('\x00est', ''), ('\x00est', '')],
dtype=[('A', '|S4'), ('B', '|S0')])
The same thing occurs if I cast to np arrays before passing to
np.rec.fromarrays:
>>> _arrays = [np.array(Cols[0],'|S4'),np.array(Cols[1],'|S')]
>>> _arrays
[array(['test', 'test', 'test', 'test', 'test', 'test', 'test', 'test',
'test', 'test'],
dtype='|S4'),
array(['', '', '', '', '', '', '', '', '', ''],
dtype='|S1')]
>>> np.rec.fromarrays(_arrays,dtype=d)
rec.array([('test', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
('\x00est', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
('\x00est', ''), ('\x00est', '')],
dtype=[('A', '|S4'), ('B', '|S0')])
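For anyone who wants to check whether Bug #1 reproduces on their own build, here is a minimal sketch. The helper name `first_column_survives` is made up for illustration and is not part of numpy. It uses byte-string literals so that the columns are 'S' typed on Python 3 as well, and it makes no assumption about what the width-0 case does on a given numpy version.

```python
import numpy as np

def first_column_survives(second_field_type):
    """Build the record array with an explicit dtype and report whether
    every entry of column 'A' still equals b'test'."""
    cols = [[b'test'] * 10, [b''] * 10]
    d = np.dtype([('A', '|S4'), ('B', second_field_type)])
    rec = np.rec.fromarrays(cols, dtype=d)
    return bool((rec['A'] == b'test').all())

# Width-1 second field: the layout numpy infers when no dtype is passed.
print(first_column_survives('|S1'))

# Width-0 second field: the case reported above. The outcome depends on
# the numpy version, so catch errors instead of assuming a result.
try:
    print(first_column_survives('|S'))
except Exception as exc:
    print('failed with', type(exc).__name__, exc)
```

If the second call prints False, the first column was corrupted in the way shown in the output above.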
(Btw, why does np.array(['',''],'|S') return an array with dtype '|S1'?
Why not '|S0'? Are length-0 strings being avoided explicitly? If so, why?)
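The difference in question can be inspected directly through `itemsize`. This is only a sketch of how to look at it, not an explanation of why numpy behaves this way; it uses byte strings so that it also applies on Python 3.

```python
import numpy as np

# An array built from empty strings, with an unsized 'S' dtype requested.
a = np.array([b'', b''], dtype='S')
print(a.dtype, a.dtype.itemsize)

# The bare unsized dtype object, with no data attached.
d = np.dtype('S')
print(d, d.itemsize)
```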
Bug #2: I'm not sure this is a bug, but it is annoying: np.dtype won't
accept '|S0' as a type argument.
>>> np.dtype('|S0')
TypeError: data type not understood
I have to do something like this:
>>> d = np.dtype('|S')
>>> d
dtype('|S0')
to get what I want. Is this intended? Regardless, this inconsistency also
means that things like:
>>> np.dtype(d.descr)
can fail even when d is a properly constructed dtype object with a '|S0'
type, which seems a little perverse.
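The round trip through `descr` can be tested with a small helper. This is a sketch under my own naming (`descr_roundtrips` is hypothetical, not a numpy function). It takes a field list, builds the dtype, and reports whether rebuilding it from `descr` gives back an equal dtype; any exception along the way counts as a failed round trip.

```python
import numpy as np

def descr_roundtrips(fields):
    """Return True if np.dtype(d.descr) rebuilds a dtype equal to
    d = np.dtype(fields); return False if it differs or raises."""
    try:
        d = np.dtype(fields)
        return bool(np.dtype(d.descr) == d)
    except Exception as exc:
        print('round trip failed with', type(exc).__name__, exc)
        return False

# Ordinary fixed-width string fields.
print(descr_roundtrips([('A', '|S4'), ('B', '|S1')]))

# A width-0 string field, as in the report; the result is version dependent.
print(descr_roundtrips([('A', '|S4'), ('B', '|S')]))
```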
Am I just not supposed to be working with length-0 string columns, period?
Thanks,
Dan