[Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

Chris Barker - NOAA Federal chris.barker@noaa....
Wed Jan 9 11:22:19 CST 2013

On Wed, Jan 9, 2013 at 7:09 AM, Nathaniel Smith <njs@pobox.com> wrote:
>> This is a general issue applying to data which is read from real-world
>> external sources.  For example, digitizers routinely represent their
>> samples as int8's or int16's, and you apply a scale and offset to get
>> a reading in volts.
> This particular case is actually handled fine by 1.5, because int
> array + float scalar *does* upcast to float. It's width that's ignored
> (int8 versus int32), not the basic "kind" of data (int versus float).
> But overall this does sound like a problem -- but it's not a problem
> with the scalar/array rules, it's a problem with working with narrow
> width data in general.

Exactly -- this is key. Details aside, we essentially have a choice
between an approach that makes it easy to preserve your values --
upcasting liberally -- and one that makes it easy to preserve your
dtype -- requiring users to upcast explicitly where needed.
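
To make the trade-off concrete, here is a small sketch using two int8
arrays (so no scalar casting rules are involved at all). The sample
values are made up for illustration:

```python
import numpy as np

# Two int8 arrays with values near the top of the int8 range (-128..127).
a = np.array([100, 120], dtype=np.int8)
b = np.array([100, 10], dtype=np.int8)

# Preserving the dtype: the result stays int8, so sums that do not fit
# wrap around silently.
kept = a + b

# Preserving the values: upcast explicitly before doing the arithmetic.
widened = a.astype(np.int16) + b.astype(np.int16)

print(kept.dtype, kept.tolist())        # int8 [-56, -126]
print(widened.dtype, widened.tolist())  # int16 [200, 130]
```

The first result keeps the small dtype at the cost of the values; the
second keeps the values at the cost of the user having to ask for it.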

IIRC, our experience with earlier versions of numpy (and Numeric
before that) is that all too often folks would choose a small dtype
quite deliberately, then have it accidentally upcast for them -- this
was determined to be not-so-good behavior.

I think the HDF (and also netcdf...) case is a special case -- the
small dtype+scaling has been chosen deliberately by whoever created
the data file (to save space), but we would want it generally opaque
to the consumer of the file -- to me, that means the issue should be
addressed by the file reading tools, not numpy. If your HDF5 reader
chooses the resulting dtype explicitly, it doesn't matter what
numpy's defaults are. If the user wants to work with the raw, unscaled
arrays, then they should know what they are doing.
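
Roughly what I have in mind for the reader side, as a sketch only:
`apply_scale_offset` is a hypothetical helper (not part of any HDF5 or
netCDF library), and the scale and offset values are invented for the
example. The point is just that the output dtype is chosen explicitly,
so numpy's promotion defaults never come into play:

```python
import numpy as np

def apply_scale_offset(raw, scale, offset, dtype=np.float64):
    """Hypothetical reader helper: unpack scaled integer samples.

    The raw array is converted to the requested dtype *before* any
    arithmetic, so the result does not depend on promotion defaults.
    """
    values = np.asarray(raw).astype(dtype)
    values = values * scale + offset
    # Make the output dtype explicit, whatever the arithmetic produced.
    return values.astype(dtype, copy=False)

# Raw int8 samples as they might be stored in the file (made-up data).
raw = np.array([-128, 0, 127], dtype=np.int8)
volts = apply_scale_offset(raw, scale=0.01, offset=1.0)

print(volts.dtype)  # float64
print(volts)
```

The consumer of the file only ever sees `volts`; the int8 storage is an
implementation detail of the file format.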



Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

