A reimplementation of MaskedArray

A. M. Archibald peridot.faceted at gmail.com
Fri Nov 10 00:42:38 CST 2006


On 09/11/06, Paul Dubois <pfdubois at gmail.com> wrote:

> Since the function of old retired persons is to tell youngsters stories
> around the campfile:

I'll pull up a log. But since I'm uppity, and not especially young, I
hope you won't mind if I heckle.

> A supercomputer hardware designer told me that when forced to add IEEE
> arithmetic to his designs that it decreased performance substantially, maybe
> 25-30%; it wasn't that doing the operations took so much longer, it was that
> the increased physical space needed for that circuitry pushed the memory
> farther away. Doubtless this inspired doing some of it in software instead.

The goal of IEEE floats is not to be faster but to make doing correct
numerical programming easier. (Numerical analysts can write robust
numerical code for almost any kind of float, but the vast majority of
numerical code is written by scientists who know the bare minimum or
less about numerical analysis, so a good system makes such code work
as well as possible.)

The other reason IEEE floats are good is, no disrespect to your
hardware designer friend intended, that they were designed with some
care. Reimplementations are liable to get some of the finicky details
wrong (round-to-even, denormalized numbers, what have you...). I found
Kahan's "How Java's Floating-point Hurts Everyone Everywhere" (
http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf ) very educational
when it comes to "what can go wrong with platform-dependent
floating-point".

> No standard for controlling the behaviors exists, either, so you can find
> out the hard way that underflow-to-zero is being done in software by
> default, and that you are doing a lot of it. Or that your code doesn't have
> the same behavior on different platforms.

Well, if the platform is IEEE-compliant, you can trust it to behave
the same way logicially, even if some applications are slower on some
systems than others. Or did you mean that there's no standard way to
set the various IEEE flags? That seems like a language/compiler issue,
to me (which is not to minimize the pain it causes!)

> To my mind, all that was really accomplished was to convince said youngsters
> that somehow this NaN stuff was the solution to some problem. In reality,
> computing for 3 days and having it print out 1000 NaNs is not exactly
> satisfying. I think this stuff was essentially a mistake in the sense that
> it is a nice ivory-tower idea that costs more in practice than it is worth.
> I do not think a properly thought-out and coded algorithm ought to do
> anything that this stuff is supposed to 'help' with, and if it does do it,
> the code should stop executing.

Well, then turn on floating-point exceptions. I find a few NaNs in my
calculations relatively benign. If I'm doing a calculation where
they'll propagate and destroy everything, I trap them. But it happens
just as often that I launch a job, a NaN appears early on, but I come
back in the morning to find that instead of aborting, the job has
completed except for a few bad data points.

If you want an algorithm that takes advantage of NaNs, I have one. I
was simulating light rays around a black hole, generating a map of the
bending for various view angles. So I naturally did a lot of exploring
of grazing incidence, where the calculations would often yield NaNs.
So I used the NaNs to fill in the gap between the internal bending of
light and the external bending of the light; no extra programming
effort required, since that was what came out of the ray tracer
anyway. The rendering step just returned NaNs for the image pixels
that came from interpolating in the NaN-filled regions, which came out
black; just what I wanted. I just wrote the program ignoring what
would happen if a NaN appeared, and it came out right. Which is, I
think, the goal - to save the programmer having to think too much
about numerical-analysis headaches.

> Anyway, if I thought it would do the job I wouldn't have written MA in the
> first place.

It's certainly not always a good solution; in particular it's hard to
control the behaviour of things like where(). There's also no NaN for
integer types, which makes MaskedArray more useful.

Anyway, I am pleased that numpy has both controllable IEEE floats and
MaskedArray, and I apologize for a basically off-topic post.

A. M. Archibald

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642




More information about the Numpy-discussion mailing list