[Numpy-discussion] Convert recarray to list (is this a bug?)

Travis Oliphant travis@continuum...
Tue Jul 10 02:02:56 CDT 2012


On Jul 9, 2012, at 9:24 PM, Yan Tang wrote:

> Hi,
> 
> I noticed there is an odd issue when I am trying to convert a recarray to list.  See below for the example/test case.
> 
> $ cat a.csv
> date,count
> 2011-07-25,91
> 2011-07-26,118
> $ cat b.csv
> name,count
> foo,1233
> bar,100
> 
> $ python
> 
> >>> from matplotlib import mlab
> >>> import numpy as np
> 
> >>> a = mlab.csv2rec('a.csv')
> >>> b = mlab.csv2rec('b.csv')
> >>> a
> rec.array([(datetime.date(2011, 7, 25), 91), (datetime.date(2011, 7, 26), 118)], 
>       dtype=[('date', '|O8'), ('count', '<i8')])
> >>> b
> rec.array([('foo', 1233), ('bar', 100)], 
>       dtype=[('name', '|S3'), ('count', '<i8')])
> 
> 
> >>> np.array(a.tolist()).tolist()
> [[datetime.date(2011, 7, 25), 91], [datetime.date(2011, 7, 26), 118]]
> >>> np.array(b.tolist()).tolist()
> [['foo', '1233'], ['bar', '100']]
> 
> 
> The odd case is, 1233 becomes a string '1233' in the second command.  But 91 is still a number 91.
> 
> Why would this happen?  What's the correct way to do this conversion?

You are trying to convert the record array into a list of lists, I presume? The tolist() method on a rec.array produces a list of tuples. First check whether a list of tuples already satisfies your requirements; it might.

Passing this list back to np.array makes NumPy look for a single data-type that can hold every element in the list of tuples. You are relying here on np.array's "intelligence" to figure out what kind of array you have. It tries to do its best, but it is limited to determining a "primitive" data-type (float, int, string, object). It can't always predict what you expect, especially when the original data source was a record array like this. In the first case, because of the datetime.date objects, it decides the data is an "object" array, which leaves every element as it was. In the second, it decides that the data can all be represented as a "string" and so chooses that, which is why 1233 comes back as '1233'. The second .tolist() just produces a list of lists out of the resulting 2-d array.
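A small sketch of the inference described above, using plain lists of tuples in place of the csv2rec output (the variable names are just for illustration, and the exact string dtype reported depends on the Python and NumPy versions in use):

```python
import datetime
import numpy as np

# The same rows that a.tolist() and b.tolist() would return.
rows_a = [(datetime.date(2011, 7, 25), 91), (datetime.date(2011, 7, 26), 118)]
rows_b = [("foo", 1233), ("bar", 100)]

arr_a = np.array(rows_a)  # date objects force the generic "object" dtype
arr_b = np.array(rows_b)  # everything fits in a string, so a string dtype is chosen

print(arr_a.dtype)       # object
print(arr_b.dtype.kind)  # 'U' (unicode) on Python 3, 'S' (bytes) on Python 2
print(arr_b.tolist())    # the counts are now strings
```

Because arr_b has one string dtype for the whole array, the integers are converted when the array is built, before the second .tolist() ever runs.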

Most likely what you want is to build the list of lists directly from the output of .tolist(), like this:

[list(x) for x in a.tolist()]
[list(x) for x in b.tolist()]

This will be faster as well, since it skips building the intermediate array.
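For anyone who wants to try the list comprehension without matplotlib or the CSV files, here is a minimal sketch. It assumes a record array built by hand with np.rec.array, and a "U3" name field standing in for the '|S3' field shown in the question so that it behaves the same on Python 3:

```python
import numpy as np

# Hand-built stand-in for b = mlab.csv2rec('b.csv').
b = np.rec.array(
    [("foo", 1233), ("bar", 100)],
    dtype=[("name", "U3"), ("count", "i8")],
)

tuples = b.tolist()               # list of tuples, one per record
rows = [list(x) for x in tuples]  # list of lists, no second array involved

print(tuples)
print(rows)
print(type(rows[0][1]))           # the count is still an integer
```

Each field is converted on its own when tolist() runs, so no common data-type ever has to be chosen and the counts stay numeric.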

Best, 

-Travis

> 
> Thanks.
> 
> -uris-
