[Numpy-discussion] fromfile() for reading text (one more time!)

Pierre GM pgmdevlist@gmail....
Tue Jan 5 12:51:18 CST 2010


On Jan 5, 2010, at 12:32 PM, Christopher Barker wrote:
> josef.pktd@gmail.com wrote:
>> On Mon, Jan 4, 2010 at 10:39 PM,  <alan@ajackson.org> wrote:
>>> I rather like the R command(s) for reading text files
> 
>> Aren't the newly improved
>> 
>> numpy.genfromtxt()
> 
> ...
> 
>> and friends intended to handle all this?
> 
> Yes, they are, and they are great, but not really all that fast. If 
> you've got big complicated tables of data to read, then genfromtxt is 
> the way to go -- it's a great tool. However, for the simple stuff, it's 
> not really optimized.

genfromtxt is essentially loadtxt extended to deal with an undefined dtype and missing entries. That extra work makes it inherently slower, so it shouldn't be used if you know your data is well-defined and well-behaved. Stick to loadtxt in that case.
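
To make the distinction concrete, here is a minimal sketch (the sample data and variable names are made up for illustration): loadtxt is enough for a clean numeric table, and genfromtxt is only needed once entries go missing.

```python
import io
import numpy as np

# Well-defined, fully numeric table: loadtxt is all that's needed.
clean = io.StringIO("1.0 2.0 3.0\n4.0 5.0 6.0\n")
a = np.loadtxt(clean)

# Same kind of table with one empty field: genfromtxt accepts it and,
# by default, stores NaN for the missing float entry.
holes = io.StringIO("1.0,2.0,3.0\n4.0,,6.0\n")
b = np.genfromtxt(holes, delimiter=",")

print(a)
print(b)
```

With the default settings the missing float ends up as NaN in b; the filling_values argument of genfromtxt can be used to substitute something else.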

> I also find I have to read a lot of text files 
> that aren't tables of data, but rather an odd mix of stuff, but still a 
> lot of reading lots of numbers from a file.

Well, everything depends on what kind of stuff you have in your mix, I guess...

> so fromfile() is 3.5 times as fast as loadtxt and 4.5 times as fast as 
> genfromtxt. That does make a difference for me -- the user waiting 4 
> seconds, rather than one second to load a file matters.

Remember that fromfile is implemented in C, whereas loadtxt and genfromtxt are written in Python...
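
For reference, a rough sketch of reading the same simple text file both ways (the file contents and the three-column layout are assumptions made for this example). fromfile in text mode only returns a flat array, so the shape has to be supplied by hand:

```python
import os
import tempfile
import numpy as np

# Write a small whitespace-separated table to a temporary file
# (hypothetical data, two rows of three numbers).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("1 2 3\n4 5 6")
    path = f.name

try:
    # Text mode is selected by a non-empty sep; a space in sep matches
    # any run of whitespace, newlines included.
    flat = np.fromfile(path, dtype=float, sep=" ")
    table = flat.reshape(-1, 3)  # the column count must be known in advance

    # loadtxt recovers the 2-D shape by itself, at the cost of Python-level parsing.
    same = np.loadtxt(path)
finally:
    os.remove(path)

print(table)
```

The trade-off is visible here: fromfile knows nothing about rows, comments or headers, which is part of why it can be fast.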

> I suppose another option might be to see if I can optimize the inner 
> scanning function of genfromtxt with Cython or C, but I'm not sure 
> that's possible, as it's really very flexible, and re-writing all of 
> that without Python would be really painful!


Well, there's room for some optimization for particular cases (dtype!=None), but the generic case will be tricky...
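
As an illustration of the two cases (sample data and field names are made up here): with dtype=None, genfromtxt has to infer a type for each column from the text itself, whereas an explicit dtype tells it up front what to convert each field to.

```python
import io
import numpy as np

text = "1 2.5\n3 4.5\n"

# Generic case: the column types are guessed from the data.
inferred = np.genfromtxt(io.StringIO(text), dtype=None)

# Particular case: the dtype is given, so no guessing is needed.
explicit = np.genfromtxt(io.StringIO(text), dtype=[("n", int), ("x", float)])

print(inferred.dtype)
print(explicit.dtype)
```

Both calls return a structured array; only the second one knows the field types before reading the first line.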



