[Numpy-discussion] loadtxt() behavior on single-line files

Benjamin Root ben.root@ou....
Thu Jun 24 11:15:21 CDT 2010


Hi,

I was having the hardest time trying to figure out an intermittent bug in
one of my programs.  Essentially, in some situations, it was throwing an
error saying that the array object was not an array.  It took me a while,
but then I figured out that my program was assuming that the object returned
from a loadtxt() call was always a structured array (I was using dtypes).
However, if the data file being loaded has only one data record, then what
you get back is a single structured record (a 0-D array with shape ()), not
a one-element array.

import numpy as np
from StringIO import StringIO

strData = StringIO("89.23 47.2\n13.2 42.2")
a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
print "Length Two"
print a
print a.shape
print len(a)

strData = StringIO("53.2 49.2")
a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
print "\n\nLength One"
print a
print a.shape
try:
    print len(a)
except TypeError as err:
    print "ERROR:", err

Which gets me this output:

Length Two
[(89.230000000000004, 47.200000000000003)
 (13.199999999999999, 42.200000000000003)]
(2,)
2


Length One
(53.200000000000003, 49.200000000000003)
()
ERROR: len() of unsized object


Note that this isn't restricted to structured arrays.  For regular ndarrays,
loadtxt() appears to mimic the behavior of np.squeeze():

>>> a = np.ones((1, 1, 1))
>>> np.squeeze(a)[0]
IndexError: 0-d arrays can't be indexed

>>> strData = StringIO("53.2")
>>> a = np.loadtxt(strData)
>>> a[0]
IndexError: 0-d arrays can't be indexed

So, if you have multiple lines with multiple columns, you get a 2-D array,
as expected.
If you have a single line of data with multiple columns, you get a 1-D
array.
If you have a single column with many lines, you also get a 1-D array (which
is probably expected, I guess).
If you have a single column with a single line, you get a scalar (actually,
a 0-D array).
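
The four cases above can be checked with a short script. This is only a
sketch: it is written for Python 3 (io.StringIO and print() instead of the
Python 2 forms used earlier in this message), and loaded_shape() is a
made-up helper name, not part of NumPy.

```python
import io

import numpy as np


def loaded_shape(text):
    """Hypothetical helper: shape of the array np.loadtxt() builds from text."""
    return np.loadtxt(io.StringIO(text)).shape


# One entry per case described above, using default loadtxt() arguments.
shapes = {
    "many lines, many columns": loaded_shape("1 2\n3 4"),
    "one line, many columns": loaded_shape("1 2"),
    "many lines, one column": loaded_shape("1\n2"),
    "one line, one column": loaded_shape("1"),
}

for case, shape in shapes.items():
    print(case, shape)
```

Only the first case keeps both dimensions; every axis of length one is
dropped from the others.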

Is this a bug or a feature?  I can see the advantages of having loadtxt()
return the lowest number of dimensions that can hold the given data, but it
leaves calling code vulnerable to edge cases such as single-record files.
Maybe there is a different way I should be doing this, but I feel that this
behavior should at the very least be described in the loadtxt documentation.
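
One possible way to guard against the single-record case is to pass the
result through np.atleast_1d(), which leaves arrays that already have a
dimension untouched and turns a 0-D result into a one-element array. The
sketch below assumes Python 3, and load_records() is a hypothetical wrapper
name chosen for illustration, not an existing NumPy function.

```python
import io

import numpy as np


def load_records(text, dtype):
    """Hypothetical wrapper: load text with loadtxt(), always returning >= 1-D."""
    data = np.loadtxt(io.StringIO(text), dtype=dtype)
    # A single record comes back 0-D; atleast_1d() reshapes it to shape (1,).
    return np.atleast_1d(data)


point = [('x', float), ('y', float)]

one = load_records("53.2 49.2", point)
two = load_records("89.23 47.2\n13.2 42.2", point)

print(one.shape, len(one))
print(two.shape, len(two))
```

With this wrapper len() and indexing behave the same way for one record as
for many. Newer NumPy releases also accept an ndmin argument to loadtxt()
that serves a similar purpose.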

Ben Root