[Numpy-discussion] formatting issues, locale and co

Charles R Harris charlesr.harris@gmail....
Sun Dec 28 01:31:22 CST 2008


On Sat, Dec 27, 2008 at 11:55 PM, David Cournapeau <
david@ar.media.kyoto-u.ac.jp> wrote:

> Charles R Harris wrote:
> >
> >
> > On Sat, Dec 27, 2008 at 11:46 PM, Robert Kern <robert.kern@gmail.com
> > <mailto:robert.kern@gmail.com>> wrote:
> >
> >     On Sun, Dec 28, 2008 at 01:38, Charles R Harris
> >     <charlesr.harris@gmail.com <mailto:charlesr.harris@gmail.com>>
> wrote:
> >     >
> >     > On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau
> >     > <david@ar.media.kyoto-u.ac.jp
> >     <mailto:david@ar.media.kyoto-u.ac.jp>> wrote:
> >     >>
> >     >> Hi,
> >     >>
> >     >>    While looking at the last failures of numpy trunk on windows
> for
> >     >> python 2.5 and 2.6, I got into floating point number formatting
> >     issues;
> >     >> I got deeper and deeper, and now I am lost. We have several
> >     problems:
> >     >>    - we are not consistent between platforms, nor are we
> consistent
> >     >> with python
> >     >>    - str(np.float32(a)) is locale dependent, but python str
> >     method is
> >     >> not (locale.str is)
> >     >>    - formatting of long double does not work on windows because
> >     of the
> >     >> broken long double support in mingw.
> >     >>
> >     >> 1 consistency problem:
> >     >> ----------------------
> >     >>
> >     >> python -c "a = 1e20; print a" -> 1e+020
> >     >> python26 -c "a = 1e20; print a" -> 1e+20
> >     >>
> >     >> In numpy, we use PyOS_snprintf for formatting, but python
> >     itself uses
> >     >> PyOS_ascii_formatd - which has different behavior on different
> >     versions
> >     >> of python. The above behavior can be simply reproduced in C:
> >     >>
> >     >> #include <Python.h>
> >     >>
> >     >> int main()
> >     >> {
> >     >>    double x = 1e20;
> >     >>    char c[200];
> >     >>
> >     >>    PyOS_ascii_formatd(c, sizeof(c), "%.12g", x);
> >     >>    printf("%s\n", c);
> >     >>    printf("%g\n", x);
> >     >>
> >     >>    return 0;
> >     >> }
> >     >>
> >     >> On 2.5, this will print:
> >     >>
> >     >> 1e+020
> >     >> 1e+020
> >     >>
> >     >> But on 2.6, this will print:
> >     >>
> >     >> 1e+20
> >     >> 1e+020
> >     >>
> >     >> 2 locale dependency:
> >     >> --------------------
> >     >>
> >     >> Another issue is that our own formatting is locale dependent,
> >     whereas
> >     >> python isn't:
> >     >>
> >     >> import numpy as np
> >     >> import locale
> >     >> locale.setlocale(locale.LC_NUMERIC, 'fr_FR')
> >     >> a = 1.2
> >     >>
> >     >> print "str(a)", str(a)
> >     >> print "locale.str(a)", locale.str(a)
> >     >> print "str(np.float32(a))", str(np.float32(a))
> >     >> print "locale.str(np.float32(a))", locale.str(np.float32(a))
> >     >>
> >     >> Returns:
> >     >>
> >     >> str(a) 1.2
> >     >> locale.str(a) 1,2
> >     >> str(np.float32(a)) 1,2
> >     >> locale.str(np.float32(a)) 1,20000004768
> >     >>
> >     >> I thought about copying the way python does the formatting in
> >     the trunk
> >     >> (where discrepancies between platforms have been fixed), but
> >     this is not
> >     >> so easy, because it uses a lot of code from different places -
> >     and the
> >     >> code needs to be adapted to float and long double. The other
> >     solution
> >     >> would be to do our own formatting, but this does not sound easy:
> >     >> formatting in C is hard. I am not sure about what we should do, if
> >     >> anyone else has any idea ?
> >     >
> >     > I think the first thing to do is make a decision on locale. If
> >     we choose to
> >     > support locales I don't see much choice but to depend on Python
> >     because it's
> >     > too much work otherwise, and work not directly related to Numpy
> >     at that. If
> >     > we decide not to support locales then we can do our own
> >     formatting if we
> >     > need to using a fixed choice of locale. There is a list of snprintf
> >     > implementations here. Trio looks like a mature project and has
> >     an MIT
> >     > license, which I think is a license compatible with Numpy.
> >
> >     We should not support locales. The string representations of these
> >     elements should be Python-parseable.
> >
> >     > I'm inclined to just fix the locale and ignore the rest until
> >     Python gets
> >     > things sorted out. But I'm lazy...
> >
> >     What do you think Python doesn't have sorted out?
> >
> >
> > Consistency between versions and platforms. David's note with the
> > ticket points to a Python 3.0 bug on this reported about, oh, two
> > years ago.
>
> As an example: in python 2.6, they solved some issues like inf/nan  by
> interpreting the strings in python before outputting them, but we do not
> use their fix. So we have:
>
> python -c "import numpy as np; print np.log(0)" ->  -inf (python 2.6) /
> -1.#INF (2.5, which is the format from the MS runtime).
>
> But:
>
> python -c "import numpy as np; print np.log(0).astype(np.float32)" ->
> -1.#INF (both 2.6 and 2.5)
>
> Etc... We can't be consistent with ourselves and with python at the same
> time, I think. I don't know which one is best: numpy being consistent
> through platforms and python versions, or being consistent with python.
>
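
A rough sketch of what "interpreting the strings before outputting them"
could look like, written in Python for brevity. It assumes the only
spellings that need fixing are the three-digit exponent and the MS runtime
"#INF" form quoted above. The name normalize_float_repr, and the rule that
any other "#" spelling is treated as nan, are assumptions made for
illustration; this is not what numpy or Python actually does.

```python
import re

# An exponent sign followed by at least one padding zero and two or more
# further digits, e.g. the "+020" in "1e+020".
_EXPONENT_RE = re.compile(r"([eE][+-])0+(\d{2,})")


def normalize_float_repr(text):
    """Map platform-specific float output onto one spelling (hypothetical helper)."""
    stripped = text.strip()
    sign = "-" if stripped.startswith("-") else ""
    if "#INF" in stripped:
        # MS runtime spelling of infinity, e.g. "-1.#INF".
        return sign + "inf"
    if "#" in stripped:
        # Assumption: every other "#" spelling stands for a NaN.
        return "nan"
    # Drop padding zeros but keep at least two exponent digits.
    return _EXPONENT_RE.sub(r"\1\2", stripped)


print(normalize_float_repr("1e+020"))
print(normalize_float_repr("-1.#INF"))
```

Post-processing the string like this sidesteps the C runtime differences,
but it would have to be applied to every float type, not just double.
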
> > There is also the problem of long doubles on the windows platform,
> > which isn't Python specific since Python doesn't use long doubles. As
> > I understand long doubles on windows, mingw32 supports them, VS
> > doesn't, so there is a compiler inconsistency to deal with also.
>
> To be exact, both mingw and VS support long double sensu stricto: the
> long double type is available. But sizeof(long double) == sizeof(double)
> with VS toolchain, and sizeof(long double) is 12 with mingw. The latter
> is a pain, because mingw uses both the MS runtime (printf) and its own
> functions (some math funcs), so we can't easily be consistent (either 8
> or 12 bytes long double) with mingw. One solution would be to use the
> mingwex printf (a printf reimplementation available  on recent mingwrt)
> instead of MSVC runtime - I would hope that this one is fixed wrt long
> double. This problem is even worse on 64 bits (long doubles are 16 bytes
> by default there with mingw).
>
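
One quick way to see which case a given build falls into is to ask ctypes
for the sizes. This is only a sketch: it reports what the compiler used for
the Python build thinks, which is not necessarily what a numpy extension
built with another toolchain sees. The helper name describe_long_double is
made up for the example.

```python
import ctypes

# Sizes as seen by the C compiler that built this Python/ctypes.
double_size = ctypes.sizeof(ctypes.c_double)
longdouble_size = ctypes.sizeof(ctypes.c_longdouble)


def describe_long_double():
    """Say whether long double is wider than double on this build."""
    if longdouble_size == double_size:
        return "long double has the same size as double (%d bytes)" % double_size
    return "long double is wider than double (%d vs %d bytes)" % (
        longdouble_size, double_size)


print(describe_long_double())
```
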

I think there are also less visible problems with string to number
conversions, so that might be a reason to consider third party software.
Python doesn't directly support conversion of complex numbers presented as
strings, for instance, although that may have been fixed in 3.0. So
extending some third party sscanf might be useful.
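
To make the idea concrete, here is a minimal sketch of a complex parser
built from two float conversions, in Python rather than C. It assumes the
input always has the form a+bj or a-bj with no inner whitespace, and the
name parse_complex is hypothetical. It relies on float(), which does not
look at the locale, so "1,5" is rejected rather than read as 1.5.

```python
import re

# An unsigned decimal number with an optional exponent, e.g. "1.5" or "2e3".
_NUM = r"(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"
# Real part with optional sign, imaginary part with mandatory sign, then "j".
# Surrounding parentheses are optional and not checked for balance.
_COMPLEX_RE = re.compile(r"^\(?([+-]?%s)([+-]%s)[jJ]\)?$" % (_NUM, _NUM))


def parse_complex(text):
    """Parse 'a+bj' into a complex number (hypothetical helper)."""
    match = _COMPLEX_RE.match(text.strip())
    if match is None:
        raise ValueError("not of the form a+bj: %r" % (text,))
    real_part, imag_part = match.groups()
    return complex(float(real_part), float(imag_part))


print(parse_complex("1.5-2e3j"))
```

A C version would need the same split, plus a strtod that ignores the
locale for each half.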

The question becomes how much time you want to spend on this. I know
working on a dissertation is a great excuse to do something else; I spent
some weeks writing my own latex dissertation class, for instance. But I
don't know if that is recommended practice.

Chuck