[Numpy-discussion] More loadtxt() changes

Pierre GM pgmdevlist@gmail....
Tue Nov 25 13:25:17 CST 2008

On Nov 25, 2008, at 2:06 PM, Ryan May wrote:
> 1) It looks like the function returns a structured array rather than a
> rec array, so that fields are obtained by doing a dictionary access.
> Since it's a dictionary access, is there any reason that the header
> needs to be munged to replace characters and reserved names?  IIUC,
> csv2rec changes names b/c it returns a rec array, which uses attribute
> lookup and hence all names need to be valid python identifiers.   
> This is
> not the case for a structured array.

Personally, I prefer flexible ndarrays to recarrays, hence the output.  
However, I still think that names should be as clean as possible to  
avoid bad surprises down the road.
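
To make the distinction concrete, here is a minimal sketch. The helper
clean_name and the column names are hypothetical, chosen only for
illustration; this is not the munging that csv2rec or the function under
discussion actually performs, and it does not handle names that start with
a digit.

```python
import keyword
import re

import numpy as np


def clean_name(name):
    """Hypothetical helper: turn a raw header into a valid identifier."""
    # Collapse every run of non-word characters into one underscore.
    cleaned = re.sub(r"\W+", "_", name).strip("_")
    # Reserved words cannot be used as attributes, so append an underscore.
    if keyword.iskeyword(cleaned):
        cleaned += "_"
    return cleaned


raw_names = ["temp (C)", "class"]

# Structured array: fields are reached by key, so raw headers are legal.
data = np.zeros(2, dtype=[(name, float) for name in raw_names])
data["temp (C)"] = [20.5, 21.0]

# Record array: attribute lookup only works with cleaned names.
rec = np.zeros(2, dtype=[(clean_name(n), float) for n in raw_names])
rec = rec.view(np.recarray)
rec["temp_C"] = data["temp (C)"]
first_temp = rec.temp_C[0]
```

With raw names, data["temp (C)"] works but a later view as a recarray
cannot offer attribute access to that field, which is the kind of surprise
that cleaned names avoid.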

> 2) Can we avoid the use of seek() in here?  I just posted a patch to
> change the check to readline, which was the only file function used
> previously.  This allowed the direct use of a file-like object returned
> by urllib2.urlopen().

I coded that a couple of weeks ago, before you posted your patch, and I
didn't have time to check it. Yes, we could try getting rid of seek.
However, we need to find a way to rewind to the beginning of the file
if the dtypes are not given in input (as we parse the whole file to
find the best converter in that case).
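
One possible way around the rewind, sketched under assumptions: read the
lines once through readline, keep the split rows in memory, and run both
the type detection and the conversion on that buffer. The names
load_without_seek and _narrowest are hypothetical, the input is assumed to
be a text-mode file-like object, and the int/float/str ladder is a
simplification of whatever converters the real function would use.

```python
import io

# Candidate converters, from narrowest to widest.
_ORDER = (int, float, str)


def _narrowest(token):
    """Return the first converter in _ORDER that accepts the token."""
    for caster in _ORDER:
        try:
            caster(token)
        except ValueError:
            continue
        return caster


def load_without_seek(fh, delimiter=","):
    """Parse delimited text using only fh.readline(), never fh.seek()."""
    rows = []
    # iter(callable, sentinel) stops when readline returns "" at EOF.
    for line in iter(fh.readline, ""):
        line = line.strip()
        if line:
            rows.append(line.split(delimiter))
    if not rows:
        return []
    # First pass over the buffer: widest type needed by each column.
    casters = []
    for col in range(len(rows[0])):
        rank = max(_ORDER.index(_narrowest(row[col])) for row in rows)
        casters.append(_ORDER[rank])
    # Second pass over the same buffer: convert, with no rewind needed.
    return [tuple(c(v) for c, v in zip(casters, row)) for row in rows]


records = load_without_seek(io.StringIO("1,2.5,a\n3,4,b\n"))
```

The trade-off is memory: the whole file is held as lists of strings until
the conversion is done, which seek-based rewinding avoids.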

> 3) In order to avoid breaking backwards compatibility, can we change the
> default for dtype to be float32, and instead use some kind of special
> value ('auto' ?) to use the automatic dtype determination?

I'm not especially concerned w/ backwards compatibility, because we're
supporting masked values (something that np.loadtxt shouldn't have to
worry about). Initially, I needed a replacement for the fromfile
function in the scikits.timeseries.trecords package. I figured it'd be
easier and more portable to write a function for generic masked arrays
that could be adapted afterwards to timeseries. In any case, I was
thinking of the functions I sent you as part of some numpy.ma.io
module rather than as a replacement for np.loadtxt. I tried to get
the syntax as close as possible to np.loadtxt and mlab.csv2rec, but
there'll always be some differences.

So, yes, we could try to use a default dtype=float and yes, we could
have an extra parameter 'auto'. But is it really that useful? I'm not
sure (well, no, I'm sure it's not...)
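
For reference, the proposal could look roughly like the sketch below. The
function resolve_dtype, its signature, and the generated field names f0,
f1, ... are all assumptions made for illustration; nothing here is the
actual interface of np.loadtxt or of the function under discussion.

```python
import numpy as np


def _column_dtype(tokens):
    """Guess the narrowest dtype that can hold every token in a column."""
    for kind in (int, float):
        try:
            for token in tokens:
                kind(token)
        except ValueError:
            continue
        return np.dtype(kind)
    # Fall back to a unicode string wide enough for the longest token.
    return np.dtype("U%d" % max(len(token) for token in tokens))


def resolve_dtype(columns, dtype=float):
    """Hypothetical helper: float by default, detection only on 'auto'."""
    if isinstance(dtype, str) and dtype == "auto":
        fields = [("f%d" % i, _column_dtype(col))
                  for i, col in enumerate(columns)]
        return np.dtype(fields)
    # Any other value is passed through as an ordinary dtype.
    return np.dtype(dtype)


columns = [["1", "3"], ["2.5", "4"], ["a", "bc"]]
default = resolve_dtype(columns)
detected = resolve_dtype(columns, dtype="auto")
```

Under this scheme existing callers keep the float behaviour, and the
automatic determination is strictly opt-in.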

> I'm currently cooking up some of these changes myself, but thought I
> would see what you thought first.
