[Numpy-discussion] loadtxt() behavior on single-line files
Benjamin Root
ben.root@ou....
Tue Jul 27 11:58:56 CDT 2010
On Thu, Jun 24, 2010 at 1:53 PM, Benjamin Root <ben.root@ou.edu> wrote:
> On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser <
> warren.weckesser@enthought.com> wrote:
>
>> Benjamin Root wrote:
>> > Hi,
>> >
>> > I was having the hardest time trying to figure out an intermittent bug
>> > in one of my programs. Essentially, in some situations, it was
>> > throwing an error saying that the array object was not an array. It
>> > took me a while, but then I figured out that my program was assuming
>> > that the object returned from a loadtxt() call was always a structured
>> > array (I was using dtypes). However, if the data file being loaded
>> > only had one data record, then all you get back is a structured record.
>> >
>> > import numpy as np
>> > from StringIO import StringIO
>> >
>> > strData = StringIO("89.23 47.2\n13.2 42.2")
>> > a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
>> > print "Length Two"
>> > print a
>> > print a.shape
>> > print len(a)
>> >
>> > strData = StringIO("53.2 49.2")
>> > a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
>> > print "\n\nLength One"
>> > print a
>> > print a.shape
>> > try:
>> >     print len(a)
>> > except TypeError as err:
>> >     print "ERROR:", err
>> >
>> > Which gets me this output:
>> >
>> > Length Two
>> > [(89.230000000000004, 47.200000000000003)
>> > (13.199999999999999, 42.200000000000003)]
>> > (2,)
>> > 2
>> >
>> >
>> > Length One
>> > (53.200000000000003, 49.200000000000003)
>> > ()
>> > ERROR: len() of unsized object
>> >
>> >
>> > Note that this isn't restricted to structured arrays. For regular
>> > ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():
>>
>> Exactly. The last four lines of the function are:
>>
>>     X = np.squeeze(X)
>>     if unpack:
>>         return X.T
>>     else:
>>         return X
>>
>> >
>> > >>> a = np.ones((1, 1, 1))
>> > >>> np.squeeze(a)[0]
>> > IndexError: 0-d arrays can't be indexed
>> >
>> > >>> strData = StringIO("53.2")
>> > >>> a = np.loadtxt(strData)
>> > >>> a[0]
>> > IndexError: 0-d arrays can't be indexed
>> >
>> > So, if you have multiple lines with multiple columns, you get a 2-D
>> > array, as expected.
>> > If you have a single line of data with multiple columns, you get a 1-D
>> > array.
>> > If you have a single column with many lines, you also get a 1-D array
>> > (which is probably expected, I guess).
>> > If you have a single column with a single line, you get a scalar
>> > (actually, a 0-D array).
>> >
>> > Is this a bug or a feature? I can see the advantages of having
>> > loadtxt() returning the lowest # of dimensions that can hold the given
>> > data, but it leaves the code vulnerable to certain edge cases. Maybe
>> > there is a different way I should be doing this, but I feel that this
>> > behavior at the very least should be included in the loadtxt
>> > documentation.
>> >
>>
>> It would be useful to be able to tell loadtxt to not call squeeze, so a
>> program that reads column-formatted data doesn't have to treat the case
>> of a single line specially.
>>
>> Warren
>>
>
> I don't know if that is the best way to solve the problem. In that case,
> you would always get a 2-D array, right? Is that useful for those who have
> text data as a single column? Maybe a mindim keyword (with None as default)
> and apply an appropriate "atleast_Nd()" call (or maybe have available an
> .atleast_nd() function?). But, then what would this mean for structured
> arrays? One might think that they want at least 2-D, but they really want
> at least 1-D.
>
> Ben Root
>
> P.S. - Taking this a step further, the functions fail completely when given
> empty files... In MATLAB, loading an empty file returns an empty array (matrix?).
>
I am reviving this "dead" thread to note that I have filed ticket #1562 on
the numpy Trac about this issue: http://projects.scipy.org/numpy/ticket/1562
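
Until the ticket is resolved, a caller-side workaround may be enough. The following is only a minimal sketch, assuming Python 3 and a recent NumPy; load_records is a hypothetical helper name, not part of NumPy. It wraps the loadtxt() result in np.atleast_1d() so that a single-record file still yields a 1-D structured array:

```python
import io

import numpy as np


def load_records(source, dtype):
    """Hypothetical helper: load text data, never returning a 0-d array."""
    data = np.loadtxt(source, dtype=dtype)
    # loadtxt() may squeeze a single record down to a 0-d array;
    # atleast_1d() restores a leading dimension without copying the data.
    return np.atleast_1d(data)


dtype = [('x', float), ('y', float)]

# One record: would otherwise come back with shape ().
one = load_records(io.StringIO("53.2 49.2"), dtype)

# Two records: already 1-D, so atleast_1d() leaves the shape alone.
two = load_records(io.StringIO("89.23 47.2\n13.2 42.2"), dtype)

print(one.shape, len(one))
print(two.shape, len(two))
```

This covers the structured-array case only. For plain float data where a 2-D result is wanted, np.atleast_2d() would be the analogous call, but whether a single line should become a row or a column is a choice the caller has to make.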
Ben Root