[Numpy-discussion] np.loadtxt : yet a new implementation...

Manuel Metz mmetz@astro.uni-bonn...
Wed Dec 3 13:12:16 CST 2008


Manuel Metz wrote:
> Alan G Isaac wrote:
>> If I know my data is already clean
>> and is handled nicely by the
>> old loadtxt, will I be able to turn
>> off the special handling in
>> order to retain the old load speed?
>>
>> Alan Isaac
>>
> 
> Hi all,
>   that's going in the same direction I was thinking about.
> When I thought about an improved version of loadtxt, I wished it was
> fault tolerant without losing too much performance.
>   So my solution was much simpler than the very nice genloadtxt function
> -- and it works for me.
> 
> My ansatz is to leave the existing loadtxt function unchanged. I only
> replaced the default converter calls by a fault tolerant converter
> class. I attached a patch against io.py in numpy 1.2.1.
> 
> The nice thing is that it not only handles missing values, but for
> example also columns/fields with non-number characters. It just returns
> nan in these cases. This is of practical importance for many datafiles
> of astronomical catalogues, for example the Hipparcos catalogue data.
> 
> Regarding performance, it is a little slower than the original
> loadtxt, but not by much: on my machine, reading a clean test file with
> 3 columns and 20000 rows 10 times gives the following results:
> 
> original loadtxt:  ~1.3s
> modified loadtxt:  ~1.7s
> new genloadtxt  :  ~2.7s
> 
> So you see, there is some loss of performance, but not as much as with
> the new genloadtxt.
> 
> I hope this solution is of interest ...
> 
> Manuel
>

Oops, wrong version of the diff file. Wanted to name the class
"_faulttolerantconv" ...

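For readers without the attachment, here is a minimal sketch of what a
fault tolerant converter could look like. It is not the attached patch:
the names FaultTolerantConv and parse_line are hypothetical, and the
sketch is pure Python so that it runs without numpy. It only illustrates
the idea described above, i.e. wrapping a converter so that missing or
non-numeric fields give nan instead of raising.

```python
import math


class FaultTolerantConv:
    """Wrap a converter so that bad fields become a fill value (nan).

    Hypothetical illustration only, not the class from the patch.
    """

    def __init__(self, converter=float, fill=float("nan")):
        self.converter = converter
        self.fill = fill

    def __call__(self, field):
        try:
            return self.converter(field)
        except (ValueError, TypeError):
            # Empty fields and non-numeric text both end up here.
            return self.fill


def parse_line(line, conv, delimiter=","):
    """Split one line of text and convert every field with conv."""
    return [conv(field.strip()) for field in line.split(delimiter)]


conv = FaultTolerantConv()
row = parse_line("1.5, , n/a, 4", conv)

# The second field is missing and the third is not a number,
# so both come back as nan; the other fields convert normally.
print(row)
print([math.isnan(value) for value in row])
```

A callable like this can also be handed to the unmodified loadtxt through
its existing converters argument (a dict mapping column index to
converter), but then it only applies to the columns listed there, whereas
the patch replaces the default converters.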


-------------- next part --------------
A non-text attachment was scrubbed...
Name: io.diff
Type: text/x-patch
Size: 628 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20081203/2172b409/attachment.bin 

