[Numpy-discussion] 2D binning

Zachary Pincus zachary.pincus@yale....
Tue Jun 1 20:57:35 CDT 2010


> I guess it's as fast as I'm going to get. I don't really see any
> other way. (BTW, the lat/lons are integers.)

You could (in C or Cython) try a brain-dead "hashtable" with no
collision detection:

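for lat, lon, value in dataset:
    # XOR the integer coordinates, then wrap into the table size
    b = (lat ^ lon) % num_bins
    # fold the new value into the running mean stored for that bin
    hashtable[b] = update_incremental_mean(hashtable[b], value)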

You'll of course want to run some experiments to see whether your data
are sufficiently sparse, and/or whether you can afford a large enough
hashtable array, that you won't get spurious hash collisions. Adding
error checking to ensure that there are no collisions would be pretty
trivial: just keep a table of the lat/long for each hash value (which
you'll need anyway) and check that different lat/long pairs don't get
assigned the same bin.
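
For what it's worth, here is a minimal pure-Python sketch of that idea with the collision check included. The function name binned_means, its arguments, and the choice to raise ValueError on a collision are my own assumptions for illustration; a real version would live in C or Cython and use preallocated arrays.

```python
def binned_means(lats, lons, data, num_bins):
    """Running mean per (lat, lon) pair in a fixed-size table, no probing.

    Illustrative sketch only: raises ValueError if two different
    (lat, lon) pairs hash to the same bin.
    """
    keys = [None] * num_bins    # the (lat, lon) pair that owns each bin
    counts = [0] * num_bins     # number of samples seen per bin
    means = [0.0] * num_bins    # running mean per bin

    for lat, lon, value in zip(lats, lons, data):
        b = (lat ^ lon) % num_bins
        if keys[b] is None:
            keys[b] = (lat, lon)
        elif keys[b] != (lat, lon):
            raise ValueError("hash collision in bin %d" % b)
        counts[b] += 1
        # incremental mean update: m_new = m_old + (x - m_old) / n
        means[b] += (value - means[b]) / counts[b]

    # return only the bins that were actually used
    return dict((keys[b], means[b]) for b in range(num_bins)
                if keys[b] is not None)
```

One thing this makes obvious: XOR is symmetric, so (a, b) and (b, a) always land in the same bin. If your data can contain both, you'd want a different way of mixing the two coordinates.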

Zach



> -Mathew
>
> On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
> > Hi
> > Can anyone think of a clever (non-looping) solution to the
> > following?
> >
> > I have a list of latitudes, a list of longitudes, and a list of data
> > values. All lists are the same length.
> >
> > I want to compute an average of data values for each lat/lon pair.
> > e.g. if (lat[1001], lon[1001]) == (lat[2001], lon[2001]) then
> > data[1001] = (data[1001] + data[2001])/2
> >
> > Looping is going to take way too long.
>
> As a start, are the "equal" lat/lon pairs exactly equal (i.e. either
> not floating-point, or floats that will always compare equal, that is,
> the floating-point bit-patterns will be guaranteed to be identical) or
> approximately equal to float tolerance?
>
> If you're in the approx-equal case, then look at the KD-tree in scipy
> for doing nearest-neighbor queries.
>
> If you're in the exact-equal case, you could consider hashing the
> lat/lon pairs or something. At least then the looping is O(N) and not
> O(N^2):
>
> import collections
> import numpy
>
> # Group the data values by their exact (lat, lon) key
> grouped = collections.defaultdict(list)
> for lt, ln, da in zip(lat, lon, data):
>     grouped[(lt, ln)].append(da)
>
> # Average each group
> averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())
>
> Is that fast enough?
>
> Zach
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
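
On the exact-equal case quoted above: since the lat/lons are integers, the grouping can also be done without a Python-level loop. This is only a sketch, and it assumes a NumPy recent enough that numpy.unique accepts the axis argument; the function name group_means is made up for illustration.

```python
import numpy as np

def group_means(lat, lon, data):
    """Mean of data for each distinct (lat, lon) pair, without a Python loop."""
    pairs = np.column_stack((lat, lon))
    # unique rows, plus for every input row the index of its unique row
    unique_pairs, inverse = np.unique(pairs, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard: some NumPy versions return a 2-D inverse
    sums = np.bincount(inverse, weights=data)   # per-group sum of data
    counts = np.bincount(inverse)               # per-group sample count
    return unique_pairs, sums / counts
```

The unique pairs come back sorted by row, and the means are in the same order. Unlike the fixed-size hashtable, this cannot produce spurious collisions, at the cost of a sort inside numpy.unique.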


