Reading records from file and sorting

George Sakkis george.sakkis at gmail.com
Wed Nov 1 09:18:40 CST 2006


Francesc Altet wrote:

> On Tue, 31 Oct 2006 at 23:38 +0000, George Sakkis wrote:
> > Is there a more elegant and/or faster way to read some records from a
> > file and then sort them by different fields ? What I have now is too
> > specific and error-prone in general:
> >
> > import numpy as N
> > records = N.fromfile(a_file, dtype=N.dtype('i2,i4'))
> > records_by_f0 = records.take(records.getfield('i2').argsort())
> > records_by_f1 = records.take(records.getfield('i4',2).argsort())
> >
> > If there's a better way, I'd like to see it; bonus points for in-place
> > sorting.
>
> Why is this too specific or error-prone?

Because it
1. repeats the field types
2. requires adding up the lengths of all previous fields to get the offset.

If you're not convinced yet, try writing this in less than 3 seconds
;-):
records = N.fromfile(a_file, dtype=N.dtype('i2,i4,f4,S5,B,Q'))
records_by_f5 = ??
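
For reference, a minimal sketch of that case using field names instead of
hand-computed offsets. It assumes the default names f0..f5 that NumPy assigns
to a comma-separated dtype string, and it uses made-up in-memory data in place
of a_file:

```python
import numpy as np

dt = np.dtype('i2,i4,f4,S5,B,Q')

# Made-up sample data standing in for N.fromfile(a_file, dtype=dt).
records = np.zeros(4, dtype=dt)
records['f0'] = [0, 1, 2, 3]
records['f5'] = [30, 10, 40, 20]

# The dtype already knows each field's type and byte offset, so neither
# has to be repeated or summed by hand (2 + 4 + 4 + 5 + 1 bytes here).
field_dtype, offset = dt.fields['f5'][:2]

# Sort by the last field, selecting it by name.
records_by_f5 = records[records['f5'].argsort()]
```

This still builds a sorted copy rather than sorting in place.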

> I think your solution is quite good. If what you want is a more compact
> way to write the above, you can try:
>
> In [56]:records=numpy.array([(1,1),(0,2)], dtype="i2,i4")
> In [57]:records[records['f0'].argsort()]
> Out[57]:
> array([(0, 2), (1, 1)],
>       dtype=[('f0', '<i2'), ('f1', '<i4')])
> In [58]:records[records['f1'].argsort()]
> Out[58]:
> array([(1, 1), (0, 2)],
>       dtype=[('f0', '<i2'), ('f1', '<i4')])

Ah, much better; I didn't know you can index a normal array (not
recarray) by label. Now, if there's a way to do the sorting in place
(records.sort('f1') doesn't work unfortunately), that would be perfect.
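
For what it's worth, a sketch of in-place sorting by field, assuming a NumPy
release in which ndarray.sort accepts the order keyword (the field name has to
be passed as order=..., not as the first positional argument, which is the
axis):

```python
import numpy as np

records = np.array([(1, 1), (0, 2)], dtype='i2,i4')

# Sort in place by the first field; no sorted copy is returned.
records.sort(order='f0')
by_f0 = records.tolist()

# Re-sort the same array in place by the second field.
records.sort(order='f1')
by_f1 = records.tolist()
```

If that keyword is not available in the installed version, the argsort
indexing above remains the fallback, at the cost of a copy.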

Thanks,
George


More information about the Numpy-discussion mailing list