[SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers?

Stefan stefan.czesla@hs.uni-hamburg...
Wed Jun 2 12:14:04 CDT 2010


> Not that I am complaining; rather, I am trying to understand what is
> expected to happen.
> Under the patch, it is very much user beware.  The header argument can
> be anything or nothing. There is no check for the contents or if the
> delimiter used is the same as the rest of the output. Further with the
> newline option there is no guarantee that the lines in the header will
> have the same line endings throughout the file.
> So what should a user be allowed to use as a header? 
> You could write a whole program there or an explanation of the
> following output - which is very appealing. You could force a list of
> strings so that you print out newline.join(header) - okay not quite
> because it should include the comment argument.
> Should savetxt be restricted to something that loadtxt can read? 
> This is potentially problematic if you want a header line. Although it
> could return the number of header lines.
> [savetxt should also be updated to allow bz2 as loadtxt handles those
> now - not that I have used it]
>
> Also note that since that patch was written, savetxt takes a user
> supplied newline keyword, so you can just append that to the header
> string.
>
> True, we were not aware of this, but it does not help much with the
> comment/header.
>
> Entered ~3 months ago: http://projects.scipy.org/numpy/changeset/8180
> Should this be forced to check for valid options for new lines?
> Otherwise, from "np.savetxt('junk.text', [1,2,3,4,5], newline='what')"
> you get (a single line, wrapped here by the mail client):
>
> 1.000000000000000000e+00what2.000000000000000000e+00what
> 3.000000000000000000e+00what4.000000000000000000e+00
> what5.000000000000000000e+00what
>
> Which is not going to be read back by loadtxt.
>
> As numpy.loadtxt has a default comment character ('#'), the same may be
> implemented for numpy.savetxt. In this case, numpy.savetxt would get two
> additional keywords (e.g. header, comment(character)), which bloats the
> interface, but potentially provides more safety.
>
> FWIW, I ended up rolling my own using the most recent pre-Python 3
> changes for savetxt that accepts a list of names instead of one string
> or if the provided array has the attribute dtype.names (non-nested rec
> or structured arrays) it uses those.  Whatever is done I think the
> support for structured arrays is nice, and I think having this
> functionality is a no-brainer.  I need it quite often.
>
> Although we have not been using record arrays very often, we see their
> advantages and agree that it should be possible to use them as you
> described.
> We also thought about a solution using the __str__ method of the 'header
> object'. In this vein, an arbitrary header object (including a plain
> string) providing a __str__ method could be handed to numpy.savetxt,
> which would use it to write the header.
> 


So let us briefly summarize what's on the table. It appears to us that
there are basically three open issues:
(1) a CSV-like header for files written by savetxt (first line contains
    column names)
(2) comments (introduced by a comment character, e.g. '#') at the
    beginning of the file (preceding the data)
(3) the role of the 'newline' option

As was noted, the patch (ticket 1079) makes it possible to write both a
CSV-like header (1) and comment line(s) introduced by a comment character
such as '#' (2). Nonetheless, we find this solution quite unsatisfactory:
it is error prone, because the user is in charge of the entire formatting.
We do think that it should be up to the user how much information is put
at the top of the file, but the format should be checked as far as possible.

Using either a string or a list/tuple of strings, as proposed by Bruce,
seems to be a reasonable way to implement the desired functionality.
Maybe two individual keywords ('header' and 'comment') should exist to
distinguish whether the user requests case (1) or (2). As for loadtxt,
the default comment character should be '#', but the user may change it.
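
To make the two-keyword idea concrete, here is a minimal sketch in plain
Python. The helper name build_preamble and all of its parameters are
hypothetical; none of this is part of numpy or of the patch in ticket 1079.
It only shows how a string or a list/tuple of strings could be checked and
turned into the lines written ahead of the data.

```python
def build_preamble(header=None, comment=None, delimiter=" ",
                   comment_char="#", newline="\n"):
    """Return the text to write ahead of the data (hypothetical helper)."""
    lines = []

    if comment is not None:
        # Accept one string (possibly multi-line) or a list/tuple of strings.
        if isinstance(comment, str):
            parts = comment.splitlines()
        else:
            parts = list(comment)
        for part in parts:
            if "\n" in part or "\r" in part:
                raise ValueError("each comment entry must be a single line")
            # Case (2): every comment line starts with the comment character.
            lines.append(comment_char + " " + part)

    if header is not None:
        if isinstance(header, str):
            # A plain string is taken as an already formatted header line.
            names = [header]
        else:
            names = list(header)
            for name in names:
                if delimiter in name:
                    raise ValueError(
                        "column name contains the delimiter: %r" % name)
        for name in names:
            if "\n" in name or "\r" in name:
                raise ValueError("column names must not contain line breaks")
        # Case (1): one CSV-like line of column names, no comment character.
        lines.append(delimiter.join(names))

    # Every preamble line gets the same line ending as the data block.
    return "".join(line + newline for line in lines)


print(build_preamble(header=["x", "y"], comment="written by savetxt",
                     delimiter=","), end="")
```

The returned string would be written to the file before the data. Whether
the comment lines should come before or after the column names is a design
choice this sketch does not settle.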

We think that savetxt should not be restricted to output that can be read
by loadtxt. It should, however, be possible to add comments to the output
file such that it remains readable by loadtxt without extra tweaks (e.g.
the skiprows keyword).
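
The difference between the two cases can be illustrated with a small
stand-in reader. The function read_rows below is hypothetical and only
imitates, very roughly, how a reader might drop everything after the
comment character; it is not loadtxt's implementation.

```python
def read_rows(text, comment_char="#", delimiter=None):
    """Parse numeric rows, ignoring comments (hypothetical stand-in)."""
    rows = []
    for line in text.splitlines():
        # Drop everything from the comment character to the end of the line.
        line = line.split(comment_char, 1)[0].strip()
        if not line:
            continue  # skip blank lines and pure comment lines
        rows.append([float(field) for field in line.split(delimiter)])
    return rows


# Case (2): commented preamble, readable without any extra options.
commented = "# columns: x y\n1.0 2.0\n3.0 4.0\n"
print(read_rows(commented))

# Case (1): a bare column-name line is not numeric, so parsing fails
# unless the reader is told to skip it.
bare = "x y\n1.0 2.0\n3.0 4.0\n"
try:
    read_rows(bare)
except ValueError as exc:
    print("cannot parse header line:", exc)
```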

We agree that the newline keyword may cause inconsistencies in the file
(if ticket 1079 were applied), and possibly strange behavior, such as when
newline='what' is specified. Yet this question concerns more than just the
header/comments.
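
One conceivable check, sketched in plain Python under the assumption that
only the three common line endings should be accepted. The names
VALID_NEWLINES, check_newline and write_column are hypothetical and not
part of numpy; write_column merely mimics writing each value followed by
the newline string.

```python
VALID_NEWLINES = ("\n", "\r\n", "\r")  # assumed set of acceptable endings


def check_newline(newline):
    """Raise ValueError unless newline is a common line ending."""
    if newline not in VALID_NEWLINES:
        raise ValueError("unsupported newline: %r" % (newline,))
    return newline


def write_column(values, newline="\n", fmt="%.18e", validate=True):
    """Format each value followed by newline (hypothetical stand-in)."""
    if validate:
        check_newline(newline)
    return "".join(fmt % value + newline for value in values)


# Without validation, an arbitrary string ends up between the values.
print(write_column([1.0, 2.0], newline="what", validate=False))

# With validation, the same call is rejected up front.
try:
    write_column([1.0, 2.0], newline="what")
except ValueError as exc:
    print(exc)
```

The same check could be applied to the header/comment lines, so that the
whole file uses one line ending.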

Stefan & Christian





