[Numpy-discussion] loadtxt/savetxt tickets

Charles R Harris charlesr.harris@gmail....
Wed Mar 30 21:53:14 CDT 2011


On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes <
paul.anton.letnes@gmail.com> wrote:

>
> On 26. mars 2011, at 21.44, Derek Homeier wrote:
>
> > Hi Paul,
> >
> > having had a look at the other tickets you dug up,
> >
> >> My opinions are my own, and in detail, they are:
> >> 1752:
> >>   I attach a possible patch. FWIW, I agree with the request. The
> >> patch is written to be compatible with the fix in ticket #1562, but
> >> I did not test that yet.
> >
> > Tested, see also my comments on Trac.
>
> Great!
>
> >> 1731:
> >>   This seems like a rather trivial feature enhancement. I attach a
> >> possible patch.
> >
> > Agreed. Haven't tested it though.
>
> Great!
>
> >> 1616:
> >>   The suggested patch seems reasonable to me, but I do not have a
> >> full list of what objects loadtxt supports today as opposed to what
> >> this patch will support.
>
> Looks like you got this one. Just remember to make it compatible with
> #1752. Should be easy.
>
> >> 1562:
> >>   I attach a possible patch. This could also be the default
> >> behavior to my mind, since the function caller can simply call
> >> numpy.squeeze if needed. Changing default behavior would probably
> >> break old code, however.
> >
> > See comments on Trac as well.
>
> Your patch is better, but there is one thing I disagree with.
> 808    if X.ndim < ndmin:
> 809        if ndmin == 1:
> 810            X.shape = (X.size, )
> 811        elif ndmin == 2:
> 812            X.shape = (X.size, 1)
> The last line should be:
> 812            X.shape = (1, X.size)
> If someone wants a 2D array out, they would most likely expect a one-row
> file to come out as a one-row array, not the other way around. IMHO.
>
> >> 1458:
> >>   The fix suggested in the ticket seems reasonable, but I have
> >> never used record arrays, so I am not sure  of this.
> >
> > There were some issues with Python3, and I also had some general
> > reservations
> > as noted on Trac - basically, it makes 'unpack' equivalent to
> > transposing for 2D-arrays,
> > but to splitting into fields for 1D-recarrays. My question was, what's
> > going to happen
> > when you get to 2D-recarrays? Currently this is not an issue since
> > loadtxt can only
> > read 2D regular or 1D structured arrays. But this might change if the
> > data block
> > functionality (see below) were to be implemented - data could then be
> > returned as
> > 3D arrays or 2D structured arrays... Still, it would probably make
> > most sense (or at
> > least give the widest functionality) to have 'unpack=True' always
> > return a list or iterator
> > over columns.
>
> OK, I don't know recarrays, as I said.
>
> >> 1445:
> >>   Adding this functionality could break old code, as some old
> >> datafiles may have empty lines which are now simply ignored. I do
> >> not think the feature is a good idea. It could rather be implemented
> >> as a separate function.
> >> 1107:
> >>   I do not see the need for this enhancement. In my eyes, the
> >> usecols kwarg does this and more. Perhaps I am misunderstanding
> >> something here.
> >
> > Agree about #1445, and the bit about 'usecols' - 'numcols' would just
> > provide a
> > shorter call to e.g. read the first 20 columns of a file (well, not
> > even that much
> > over 'usecols=range(20)'...), don't think that justifies an extra
> > argument.
> > But the 'datablocks' provides something new, that a number of people
> > seem
> > to miss from e.g. gnuplot (including me, actually ;-). And it would
> > also satisfy the
> > request from #1445 without breaking backwards compatibility.
> > I've been wondering if could instead specify the separator lines
> > through the
> > parameter, e.g. "blocksep=['None', 'blank','invalid']", not sure if
> > that would make
> > it more useful...
>
> What about writing a separate function, e.g. loadblocktxt, and have it
> separate the chunks and call loadtxt for each chunk? Just a thought. Another
> possibility would be to write a function that would let you load a set of
> text files in a directory, and return a dict of datasets, one per file. One
> could write a similar save-function, too. They would just need to call
> loadtxt/savetxt on a per-file basis.
>
> >> 1071:
> >>      It is not clear to me whether loadtxt is supposed to support
> >> missing values in the fashion indicated in the ticket.
> >
> > In principle it should at least allow you to, by the use of converters
> > as described there.
> > The problem is, the default delimiter is described as 'any
> > whitespace', which in the
> > present implementation obviously includes any number of blanks or
> > tabs. These
> > are therefore treated differently from delimiters like ',' or '&'. I'd
> > reckon there are
> > too many people actually relying on this behaviour to silently change it
> > (e.g. I know plenty of tables with columns separated by either one or
> > several
> > tabs depending on the length of the previous entry). But the tab is
> > apparently also
> > treated differently if explicitly specified with "delimiter='\t'" -
> > and in that case using
> > a converter à la {2: lambda s: float(s or 'Nan')} is working for
> > fields in the middle of
> > the line, but not at the end - clearly warrants improvement. I've
> > prepared a patch
> > working for Python3 as well.
>
> Great!
>
> >> 1163:
> >> 1565:
> >>   These tickets seem to have the same origin of the problem. I
> >> attach one possible patch. The previously suggested patches that
> >> I've seen will not correctly convert floats to ints, which I believe
> >> my patch will.
> >
> > +1, though I am a bit concerned that prompting to raise a ValueError
> > for every
> > element could impede performance. I'd probably still enclose it into an
> > if issubclass(typ, np.uint64) or issubclass(typ, np.int64):
> > just like in npio.patch. I also thought one might switch to
> > int(float128(x)) in that
> > case, but at least for the given examples float128 cannot convert with
> > more
> > accuracy than float64 (even on PowerPC ;-).
> > There were some dissenting opinions that trying to read a float into
> > an int should
> > generally throw an exception though...
> >
> > And Chuck just beat me...
>
> I am sure someone has been using this functionality to convert floats to
> ints. Changing will break their code. I am not sure how big a deal that
> would be. Also, I am of the opinion that one should _first_ write a program
> that works _correctly_, and only afterwards worry about performance. Even
> so, using numpy.int64 would be better, because it would give a sensible
> error message.
>
> > On 26 Mar 2011, at 21:25, Charles R Harris wrote:
> >
> >> I put all these patches together at
> https://github.com/charris/numpy/tree/loadtxt-savetxt
> >> . Please pull from there to continue work on loadtxt/savetxt so as
> >> to avoid conflicts in the patches. One of the numpy tests is
> >> failing, I assume from patch conflicts, and more tests for the
> >> tickets are needed in any case. Also, new keywords should be added
> >> to the end, not put in the middle of existing keywords.
> >>
> >> I haven't reviewed the patches, just tried to get them organized.
> >> Also, I have Derek as the author on all of them, that can be changed
> >> if it is decided the credit should go elsewhere ;) Thanks for the
> >> work you all have been doing on these tickets.
> >
> > Thanks, I'll have a look at the new ticket and try to get that
> > organized!
>
> With some luck, all the loadtxt tickets should be closed in short time :-)
>
>
Can some one bring me up to date on which tickets have been dealt with?
Also, how do they divide into bug fixes and enhancements?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110330/2f9b82ea/attachment-0001.html 


More information about the NumPy-Discussion mailing list