array from list of lists

Tim Hochberg tim.hochberg at ieee.org
Mon Nov 13 09:03:33 CST 2006


Francesc Altet wrote:
> On Mon, 13/11/2006 at 02:07 -0500, Erin Sheldon wrote:
>   
>> On 11/13/06, Charles R Harris <charlesr.harris at gmail.com> wrote:
>>     
>>> On 11/12/06, Erin Sheldon <erin.sheldon at gmail.com> wrote:
>>>       
>>>> Hi all -
>>>>
>>>> Thanks to everyone for the suggestions.
>>>> I think map(tuple, list) is probably the most compact,
>>>> but the list comprehension also works well.
>>>>
>>>> Because map() is probably going to disappear someday, I'll
>>>> stick with the list comprehension.
>>>>   array( [tuple(row) for row in result], dtype=dtype)
>>>>
>>>> That said, is there some compelling reason that the array
>>>> function doesn't support this operation?
>>>>         
>>> My understanding is that the array needs to be allocated up front. Since the
>>> list comprehension is iterative it is impossible to know how big the result
>>> is going to be.
>>>       
>> Isn't it the same with a list of tuples?  But you can send that directly to the
>> array constructor.  I don't see the fundamental difference, except that the
>> code might be simpler to write.
>>     
>
> I think that the correct explanation is that Travis has chosen a tuple
> as the way to refer to an inhomogeneous list of values (a record) and a
> list as the way to refer to a homogeneous list of values. 
Just for the record, this is the officially blessed usage of tuples and 
lists for all of Python (by Guido himself). On the other hand, it's 
honored more in the breach than in the observance, since other factors 
often drive the choice instead: mutability/immutability, or the mistaken 
belief that using tuples everywhere will make code noticeably faster or 
more memory-frugal.
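
A minimal sketch of the tuple/list convention, assuming Python 3 and a recent NumPy (both newer than this thread); the two-row `rows` list is made-up sample data in the thread's gender/age/weight layout. A tuple becomes one record, while a list is taken as another array dimension:

```python
import numpy as np

# Record layout used in the thread: a 1-byte string and two 4-byte floats.
dt = np.dtype([('gender', 'S1'), ('age', 'f4'), ('weight', 'f4')])
rows = [['M', 64.0, 75.0], ['F', 58.0, 61.5]]  # hypothetical sample data

# Tuples are read as records: two rows give a 1-D array of two records.
records = np.array([tuple(r) for r in rows], dtype=dt)
print(records.shape)  # (2,)

# Lists are read as an array dimension, so each scalar ('M', 64.0, ...)
# would have to become a whole record on its own; that fails for 'M',
# which cannot fill the float fields.
rejected = False
try:
    np.array(rows, dtype=dt)
except (TypeError, ValueError) as exc:
    rejected = True
    print("list of lists rejected:", exc)
```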

> I'm not
> completely sure why he did this, but I guess the reason was to be able
> to distinguish the records in scenarios where nested records do appear.
>   
I suspect that this could be made a little more forgiving without 
losing rigor, as long as none of the fields are objects, of course, in 
which case nearly all bets are off. Then again, the rule that tuples 
designate records is a lot simpler than something like: tuples designate 
records, but you can use lists too, unless of course you have an object 
field in your array, in which case you really need to use tuples, except 
that sometimes lists will work anyway, depending on where the object 
field is. So maybe it's best just to keep it strict.
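
The object-field complication can be sketched like this, again assuming a recent NumPy; the `name`/`payload` fields are hypothetical. Because tuples mark records, the list inside the first record is unambiguously the value stored in the object field. If lists could also mark records, that same `[1, 2, 3]` would be open to other readings.

```python
import numpy as np

# Hypothetical record type with an object field holding arbitrary Python values.
dt = np.dtype([('name', 'S8'), ('payload', object)])

# Tuples delimit records; the list inside a record is just a stored object.
a = np.array([(b'first', [1, 2, 3]), (b'second', 'text')], dtype=dt)
print(a.shape)          # (2,)
print(a[0]['payload'])  # [1, 2, 3]
```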

> In any case, you can also use rec.fromrecords to build recarrays from
> lists of lists. This breaks the aforementioned rule, but Travis allowed
> this because rec.* had to mimic numarray behaviour as much as possible.
> Here is an example of use:
> [SNIP]
>   

Just for completeness, I benchmarked the fromiter and map(tuple, 
results) solutions as well. Map is fastest, followed by fromiter, the 
list comprehension, and then fromrecords. The differences are minor, 
however (roughly 1.2 s versus 1.4 s for 10 conversions of 100,000 
rows), so I'd stick with whatever seems clearest.

-tim


from timeit import Timer

# Shared setup: 100,000 identical rows and a three-field record descriptor.
setup = """import numpy
results = [['M', 64.0, 75.0]] * 100000
mydescriptor = {'names': ('gender', 'age', 'weight'),
                'formats': ('S1', 'f4', 'f4')}"""

# numpy.rec.fromrecords, fed the list of lists directly
print(Timer("numpy.rec.fromrecords(results, dtype=mydescriptor)",
            setup).repeat(3, 10))

# numpy.array over a list comprehension of tuples
print(Timer("numpy.array([tuple(row) for row in results], dtype=mydescriptor)",
            setup).repeat(3, 10))

# numpy.fromiter over a generator of tuples; count sizes the result up front
print(Timer("numpy.fromiter((tuple(x) for x in results), "
            "dtype=mydescriptor, count=len(results))",
            setup).repeat(3, 10))

# numpy.array over map(tuple, ...); list() is needed on Python 3, where map()
# returns an iterator (the timings below used Python 2, where it returned a list)
print(Timer("numpy.array(list(map(tuple, results)), dtype=mydescriptor)",
            setup).repeat(3, 10))

===> (seconds for 10 calls, repeated 3 times; same order as the code above)

[1.3928521641717035, 1.3892659541925021, 1.3949996438094785]
[1.344854164425926, 1.3157404083479882, 1.3207066819944986]
[1.2768430065832401, 1.2742884919731416, 1.2736657871321633]
[1.2081393026208644, 1.2025276955590734, 1.205871416618594]
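
Since the recommendation is to pick whichever variant reads most clearly, it may be worth confirming that the tuple-based variants build identical arrays. This is a sketch assuming Python 3 (where map() returns an iterator and so needs list()) and a recent NumPy; it uses 1,000 rows rather than 100,000 and leaves out fromrecords. The count argument is what lets fromiter allocate the result up front.

```python
import numpy as np

# Same row and record descriptor as the benchmark, with a smaller sample.
results = [['M', 64.0, 75.0]] * 1000
mydescriptor = {'names': ('gender', 'age', 'weight'),
                'formats': ('S1', 'f4', 'f4')}

# List comprehension: one tuple per row.
via_listcomp = np.array([tuple(row) for row in results], dtype=mydescriptor)

# fromiter: count lets NumPy allocate the whole result before iterating.
via_fromiter = np.fromiter((tuple(x) for x in results),
                           dtype=mydescriptor, count=len(results))

# map: materialize the lazy Python 3 iterator with list() first.
via_map = np.array(list(map(tuple, results)), dtype=mydescriptor)

# All three should agree on dtype and contents.
assert via_listcomp.dtype == via_fromiter.dtype == via_map.dtype
assert np.array_equal(via_listcomp, via_fromiter)
assert np.array_equal(via_listcomp, via_map)
```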






More information about the Numpy-discussion mailing list