[SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers?

Bruce Southey bsouthey@gmail....
Wed Jun 2 09:41:39 CDT 2010


On 06/02/2010 06:21 AM, Stefan wrote:
>    
>>> If the header is given as a plain string (such as envisaged in
>>> ticket 1079), the user has to take care of the correct formatting; in
>>> particular, the user has to supply the comment character(s) and the
>>> newline formatting. This might be counterintuitive, because many users
>>> will at first try to supply their header(s) without specifying those
>>> formatting characters. The result will be a file not readable with
>>> numpy.loadtxt, and the error might not be detected right away.
>>>        
>> I'm not sure I understand why I would want to specify a comment
>> character for writing a csv file (unless of course I had some comments
>> to add).
>>      
> We are possibly talking about different things. In our approach to using
> numpy.savetxt, comments (preceding the actual data) and a header are
> essentially the same, as in the following example. Basically, we want
> to add some lines of additional information at the top of the file
> written with numpy.savetxt, and be able to recover the data with
> numpy.loadtxt (for which the 'header' would then be irrelevant, which
> may not be your intention, or is it?).
>
> #Now comes the data
> #column1 [kg] column2 [apple]
> 1  2
> 3  5
>
>    
Not that I am complaining; rather, I am trying to understand what is
expected to happen.

Under the patch, it is very much user beware.  The header argument can 
be anything or nothing. There is no check of the contents, or of whether 
the delimiter used is the same as in the rest of the output. Further, 
with the newline option there is no guarantee that the lines in the 
header will have the same line endings as the rest of the file.

So what should a user be allowed to use as a header?
You could write a whole program there or an explanation of the following 
output - which is very appealing. You could force a list of strings so 
that you print out newline.join(header) - okay not quite because it 
should include the comment argument.
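
As a rough sketch of the list-of-strings idea, assuming a hypothetical 
helper named format_header (it is not part of the patch or of numpy), 
the comment prefix and the line ending could both be applied for the 
user:

```python
def format_header(lines, comments="# ", newline="\n"):
    """Hypothetical helper: turn a list of header strings into one block.

    Each line gets the comment prefix and the same line ending that the
    data rows would use, so the caller supplies neither by hand.
    """
    return "".join(comments + line + newline for line in lines)


header = format_header(["Now comes the data", "column1 [kg] column2 [apple]"])
```

With that, the header lines cannot end up with a different line ending 
or without the comment marker, at the cost of two extra arguments.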

Should savetxt be restricted to something that loadtxt can read?
This is potentially problematic if you want a header line, although it 
could return the number of header lines.
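
One possible reading of that, as a sketch only: a hypothetical writer 
(write_with_header is an invented name, not savetxt itself) that emits 
uncommented header lines and hands back how many it wrote.

```python
import io


def write_with_header(fh, rows, header_lines=(), delimiter=" ", newline="\n"):
    """Hypothetical writer: emit header lines, then rows; return header count."""
    n_header = 0
    for line in header_lines:
        fh.write(line + newline)
        n_header += 1
    for row in rows:
        fh.write(delimiter.join(str(v) for v in row) + newline)
    return n_header


buf = io.StringIO()
skip = write_with_header(buf, [(1, 2), (3, 5)], ["column1 [kg] column2 [apple]"])
```

The returned count could then be passed on as the skiprows argument of 
numpy.loadtxt when reading the file back.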

[savetxt should also be updated to allow bz2 as loadtxt handles those 
now - not that I have used it]

>    
>> Also note that since that patch was written, savetxt takes a user
>> supplied newline keyword, so you can just append that to the header
>> string.
>>
>>      
> True, we were not aware of this, but this does not help much for the
> comment/header.
>    
Entered about 3 months ago:
http://projects.scipy.org/numpy/changeset/8180

Should this be forced to check for valid options for new lines?
Otherwise, from np.savetxt('junk.text', [1,2,3,4,5], newline='what') 
you get:
1.000000000000000000e+00what2.000000000000000000e+00what3.000000000000000000e+00what4.000000000000000000e+00what5.000000000000000000e+00what
That is not going to be read back by loadtxt.
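
A minimal sketch of what such a check might look like, assuming a 
hypothetical check_newline helper and an assumed whitelist of the three 
common line terminators (neither exists in numpy):

```python
VALID_NEWLINES = ("\n", "\r\n", "\r")  # assumed whitelist, for illustration only


def check_newline(newline):
    """Hypothetical check: reject anything that is not a line terminator."""
    if newline not in VALID_NEWLINES:
        raise ValueError(
            "newline must be one of %r, got %r" % (VALID_NEWLINES, newline)
        )
    return newline
```

Whether being this strict is desirable is a separate question, since it 
would also forbid deliberate choices of record separator.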

>>> As numpy.loadtxt has a default comment character ('#'), the same may be
>>> implemented for numpy.savetxt. In this case, numpy.savetxt would get two
>>> additional keywords (e.g. header, comment(character)), which bloats the
>>> interface, but potentially provides more safety.
>>>
>>>        
>> FWIW, I ended up rolling my own, using the most recent pre-Python 3
>> changes for savetxt. It accepts a list of names instead of one string,
>> or, if the provided array has the attribute dtype.names (non-nested rec
>> or structured arrays), it uses those.  Whatever is done, I think the
>> support for structured arrays is nice, and I think having this
>> functionality is a no-brainer.  I need it quite often.
>>
>>      
> Although we have not been using record arrays too often, we see their
> advantages and agree that it should be possible to use them as you
> described.
> We also thought about a solution using the __str__ method for the 'header
> object'. In this vein, an arbitrary header class (including a plain string)
> providing a __str__ method may be handed to numpy.savetxt,
> which can use it to write the header.
>
>    
>> Skipper
>>
>>      
>
It would be nice if savetxt used the dtype of the input to get a header 
and format by default unless overridden by the user.
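
Roughly, and only as a sketch of the header half of that idea: 
default_header below is a hypothetical helper, and the stand-in object 
is there just so the example runs without numpy installed.

```python
from types import SimpleNamespace


def default_header(data, names=None, delimiter=" "):
    """Hypothetical helper: pick column names for a header line.

    An explicit list of names wins; otherwise fall back to data.dtype.names,
    which is None for plain (non-structured) arrays.
    """
    if names is None:
        names = getattr(getattr(data, "dtype", None), "names", None)
    if not names:
        return ""
    return delimiter.join(names)


# Stand-in for a structured array so the sketch runs without NumPy; a real
# array with dtype [("mass", float), ("count", int)] exposes the same names.
fake = SimpleNamespace(dtype=SimpleNamespace(names=("mass", "count")))
line = default_header(fake)
```

Names passed by the user take precedence, and input without dtype.names 
simply gets no header.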

Bruce