[Numpy-discussion] 2D binning
Tue Jun 1 15:51:32 CDT 2010
On Tue, Jun 1, 2010 at 4:49 PM, Zachary Pincus <email@example.com> wrote:
>> Can anyone think of a clever (non-looping) solution to the following?
>> I have a list of latitudes, a list of longitudes, and a list of data
>> values. All lists are the same length.
>> I want to compute an average of data values for each lat/lon pair.
>> e.g. if lat[i], lon[i] == lat[j], lon[j] then
>> data[i] = (data[i] + data[j])/2
>> Looping is going to take way too long.
> As a start, are the "equal" lat/lon pairs exactly equal (i.e. either
> not floating-point, or floats that will always compare equal, that is,
> the floating-point bit-patterns will be guaranteed to be identical) or
> approximately equal to float tolerance?
> If you're in the approx-equal case, then look at the KD-tree in scipy
> for doing near-neighbors queries.
> If you're in the exact-equal case, you could consider hashing the
> lat/lon pairs or something. At least then the looping is O(N) and not
> O(N^2):
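> import collections
> import numpy
> # Collect the data values seen at each exact (lat, lon) pair.
> grouped = collections.defaultdict(list)
> for lt, ln, da in zip(lat, lon, data):
>     grouped[(lt, ln)].append(da)
> # Map each (lat, lon) pair to the mean of its values.
> averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())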
> Is that fast enough?
This is a pretty good example of the "group-by" problem that will
hopefully work its way into a future edition of NumPy. Given that, a
good approach would be to produce a unique key from the lat and lon
vectors, and pass that off to the groupby routine (when it exists).
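Until such a routine exists, the unique-key idea can probably be approximated with functions NumPy already has. The following is only a sketch under a few assumptions: the pairs are exactly equal (no float tolerance), the function name average_by_latlon is made up for illustration, and np.unique plus np.bincount stand in for the groupby routine mentioned above.

```python
import numpy as np

def average_by_latlon(lat, lon, data):
    """Average data values over exactly-equal (lat, lon) pairs.

    Hypothetical helper for illustration; not an existing NumPy routine.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    data = np.asarray(data, dtype=float)

    # One integer code per distinct latitude and per distinct longitude.
    ulat, lat_code = np.unique(lat, return_inverse=True)
    ulon, lon_code = np.unique(lon, return_inverse=True)

    # Combine the two codes into a single integer key per (lat, lon) pair.
    pair_key = lat_code * len(ulon) + lon_code

    # Relabel the keys as 0..K-1 so they can index the output arrays.
    ukey, group = np.unique(pair_key, return_inverse=True)

    # Sum and count per group, then divide to get the mean.
    sums = np.bincount(group, weights=data, minlength=len(ukey))
    counts = np.bincount(group, minlength=len(ukey))
    means = sums / counts

    # Recover the coordinates of each group from its key.
    group_lat = ulat[ukey // len(ulon)]
    group_lon = ulon[ukey % len(ulon)]
    return group_lat, group_lon, means
```

Calling average_by_latlon(lat, lon, data) returns three arrays of equal length: the distinct latitudes, the matching longitudes, and the mean of the data values at each pair. The work is dominated by the sorts inside np.unique, so it should scale as O(N log N) with no Python-level loop.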