array from list of lists

Francesc Altet faltet at carabos.com
Mon Nov 13 02:19:17 CST 2006


On Mon, 13 Nov 2006 at 02:07 -0500, Erin Sheldon wrote:
> On 11/13/06, Charles R Harris <charlesr.harris at gmail.com> wrote:
> >
> >
> > On 11/12/06, Erin Sheldon <erin.sheldon at gmail.com> wrote:
> > > Hi all -
> > >
> > > Thanks to everyone for the suggestions.
> > > I think map(tuple, list) is probably the most compact,
> > > but the list comprehension also works well.
> > >
> > > Because map() is probably going to disappear someday, I'll
> > > stick with the list comprehension.
> > >   array( [tuple(row) for row in result], dtype=dtype)
> > >
> > > That said, is there some compelling reason that the array
> > > function doesn't support this operation?
> >
> > My understanding is that the array needs to be allocated up front. Since the
> > list comprehension is iterative it is impossible to know how big the result
> > is going to be.
> 
> Isn't it the same with a list of tuples?  But you can send that directly to the
> array constructor.  I don't see the fundamental difference, except that the
> code might be simpler to write.

I think the correct explanation is that Travis chose tuples as the way
to represent an inhomogeneous sequence of values (a record) and lists as
the way to represent a homogeneous sequence of values. I'm not
completely sure why he did this, but I guess the reason was to be able
to distinguish records in scenarios where nested records appear.
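
To make the distinction concrete, here is a minimal sketch, assuming a
reasonably recent NumPy. The variable names (rows, records, grid) are
just for illustration; the field names and formats are the same ones
used in the example further down. Each tuple is read as one record, while
a plain list of lists with no structured dtype is read as a homogeneous
2-D array:

```python
import numpy as np

# Field names and formats follow the descriptor used later in this message.
dtype = np.dtype({'names': ('gender', 'age', 'weight'),
                  'formats': ('S1', 'f4', 'f4')})

rows = [['M', 64.0, 75.0], ['F', 25.0, 60.0]]

# Tuples mark record boundaries: one tuple becomes one structured element.
records = np.array([tuple(row) for row in rows], dtype=dtype)
print(records.shape)    # (2,)
print(records['age'])   # the 'age' field of every record

# Without a structured dtype, nested lists are just extra dimensions.
grid = np.array([[64.0, 75.0], [25.0, 60.0]])
print(grid.shape)       # (2, 2)
```

So the tuple conversion is what tells array() where a record starts and
ends, rather than treating the inner list as another axis.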

In any case, you can also use rec.fromrecords to build recarrays from
lists of lists. This breaks the aforementioned rule, but Travis allowed
it because rec.* had to mimic numarray behaviour as closely as possible.
Here is an example:

In [46]:mydescriptor = {'names': ('gender','age','weight'),
'formats':('S1','f4', 'f4')}
In [47]:results=[['M',64.0,75.0],['F',25.0,60.0]]
In [48]:a = numpy.rec.fromrecords(results, dtype=mydescriptor)
In [49]:b = numpy.array([tuple(row) for row in results],
dtype=mydescriptor)
In [50]:a==b
Out[50]:recarray([True, True], dtype=bool)

OTOH, the docs say that fromrecords is discouraged because it is
somewhat slow, but apparently it performs about as well as a list
comprehension:

In [51]:Timer("numpy.rec.fromrecords(results, dtype=mydescriptor)",
    "import numpy; results = [['M',64.0,75.0]]*10000; "
    "mydescriptor = {'names': ('gender','age','weight'), "
    "'formats':('S1','f4','f4')}").repeat(3,10)
Out[51]:[0.44204592704772949, 0.43584394454956055, 0.50145101547241211]

In [52]:Timer("numpy.array([tuple(row) for row in results], "
    "dtype=mydescriptor)",
    "import numpy; results = [['M',64.0,75.0]]*10000; "
    "mydescriptor = {'names': ('gender','age','weight'), "
    "'formats':('S1','f4','f4')}").repeat(3,10)
Out[52]:[0.49885106086730957, 0.4325258731842041, 0.43297886848449707]
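
If you want to repeat the comparison as a standalone script, something
like the following sketch should do. It is not the exact measurement
above: the helper names (via_fromrecords, via_array) are made up for
illustration, and I am assuming here that rows are converted to tuples
before calling fromrecords as well, so both paths pay the conversion
cost. Timings will of course depend on your machine and NumPy version.

```python
import timeit

import numpy as np

dt = np.dtype({'names': ('gender', 'age', 'weight'),
               'formats': ('S1', 'f4', 'f4')})
results = [['M', 64.0, 75.0]] * 10000


def via_fromrecords():
    # Assumption: tuples are passed here too, unlike the session above.
    return np.rec.fromrecords([tuple(row) for row in results], dtype=dt)


def via_array():
    return np.array([tuple(row) for row in results], dtype=dt)


# Same repeat/number settings as the Timer(...).repeat(3, 10) calls above.
times_fromrecords = timeit.repeat(via_fromrecords, repeat=3, number=10)
times_array = timeit.repeat(via_array, repeat=3, number=10)

print("fromrecords:", min(times_fromrecords))
print("array:      ", min(times_array))
```

Taking the minimum of the repeats is the usual way to read timeit
results, since the larger values mostly reflect other activity on the
machine.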


HTH,

-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth


More information about the Numpy-discussion mailing list