[Numpy-discussion] 2D binning

Benjamin Root ben.root@ou....
Wed Jun 2 09:42:19 CDT 2010


Why not simply use a set?

uniquePoints = set(zip(lats, lons))
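
For illustration, a minimal sketch of what the set gives you, using made-up
sample values (the numbers below are hypothetical, not from the original
data). Note that the set only yields the distinct lat/lon pairs; the averages
still have to be computed by grouping the data values per pair.

```python
# Hypothetical sample data; integer lat/lon values, as stated in the thread.
lats = [10, 10, 20, 10]
lons = [30, 30, 40, 31]

# zip() pairs each latitude with its longitude; set() drops duplicate pairs.
unique_points = set(zip(lats, lons))

# Sets are unordered, so sort before printing for a stable display.
print(sorted(unique_points))
```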

Ben Root

On Wed, Jun 2, 2010 at 2:41 AM, Vincent Schut <schut@sarvision.nl> wrote:

> On 06/02/2010 04:52 AM, josef.pktd@gmail.com wrote:
> > On Tue, Jun 1, 2010 at 9:57 PM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
> >>> I guess it's as fast as I'm going to get. I don't really see any
> >>> other way. (BTW, the lat/lons are integers.)
> >>
> >> You could (in c or cython) try a brain-dead "hashtable" with no
> >> collision detection:
> >>
> >> for lat, long, data in dataset:
> >>    bin = (lat ^ long) % num_bins
> >>    hashtable[bin] = update_incremental_mean(hashtable[bin], data)
> >>
> >> you'll of course want to do some experiments to see if your data are
> >> sufficiently sparse and/or you can afford a large enough hashtable
> >> array that you won't get spurious hash collisions. Adding error-
> >> checking to ensure that there are no collisions would be pretty
> >> trivial (just keep a table of the lat/long for each hash value, which
> >> you'll need anyway, and check that different lat/long pairs don't get
> >> assigned the same bin).
> >>
> >> Zach
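
A rough pure-Python sketch of the hashtable idea quoted above, with the
collision check included. The function name binned_means and the sample
values are hypothetical, and the running-mean update is written inline
instead of through the update_incremental_mean helper named in the
pseudocode. In C or Cython the three lists would be preallocated arrays.

```python
def binned_means(lats, lons, values, num_bins):
    """Average values per (lat, lon) in a fixed-size table; raise on collision."""
    keys = [None] * num_bins   # the (lat, lon) pair that owns each slot
    means = [0.0] * num_bins   # running mean per slot
    counts = [0] * num_bins    # number of samples seen per slot
    for lat, lon, value in zip(lats, lons, values):
        slot = (lat ^ lon) % num_bins
        if keys[slot] is None:
            keys[slot] = (lat, lon)
        elif keys[slot] != (lat, lon):
            # Two different pairs mapped to the same slot.
            raise ValueError("hash collision in slot %d" % slot)
        counts[slot] += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        means[slot] += (value - means[slot]) / counts[slot]
    return {keys[i]: means[i] for i in range(num_bins) if keys[i] is not None}


result = binned_means([10, 10, 20], [30, 30, 40], [1.0, 3.0, 5.0], num_bins=64)
```

The XOR hash is symmetric, so (1, 2) and (2, 1) always land in the same
slot; with data like that the check raises and a different hash is needed.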
> >>
> >>
> >>
> >>> -Mathew
> >>>
> >>> On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
> >>>> Hi
> >>>> Can anyone think of a clever (non-looping) solution to the
> >>>> following?
> >>>>
> >>>> I have a list of latitudes, a list of longitudes, and a list of data
> >>>> values. All lists are the same length.
> >>>>
> >>>> I want to compute an average of data values for each lat/lon pair.
> >>>> e.g. if (lat[1001], lon[1001]) == (lat[2001], lon[2001]) then
> >>>> data[1001] = (data[1001] + data[2001])/2
> >>>>
> >>>> Looping is going to take way too long.
> >>>
> >>> As a start, are the "equal" lat/lon pairs exactly equal (i.e. either
> >>> not floating-point, or floats that will always compare equal, that is,
> >>> the floating-point bit-patterns will be guaranteed to be identical) or
> >>> approximately equal to float tolerance?
> >>>
> >>> If you're in the approx-equal case, then look at the KD-tree in scipy
> >>> for doing near-neighbors queries.
> >>>
> >>> If you're in the exact-equal case, you could consider hashing the lat/
> >>> lon pairs or something. At least then the looping is O(N) and not
> >>> O(N^2):
> >>>
> >>> import collections
> >>> import numpy
> >>> grouped = collections.defaultdict(list)
> >>> for lt, ln, da in zip(lat, lon, data):
> >>>    grouped[(lt, ln)].append(da)
> >>>
> >>> averaged = dict((ltln, numpy.mean(da)) for ltln, da in
> >>> grouped.items())
> >>>
> >>> Is that fast enough?
> >
> > If the lat lon can be converted to a 1d label as Wes suggested, then
> > in a similar timing exercise ndimage was the fastest.
> > http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html
>
> And as you said your lats and lons are integers, you could simply do
>
> ll = lat*1000 + lon
>
> to get unique 'hashes' or '1d labels' for your lat/lon pairs, as a lat or
> lon will never exceed 360 (degrees).
>
> After that, either use the ndimage approach, or you could use
> histogramming with weighting by data values and divide by the histogram
> without weighting, or just loop.
>
> Vincent
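
The weighted-histogram variant described above might look like the sketch
below. The sample arrays are hypothetical. np.unique with return_inverse is
an added step (not part of the quoted suggestion) that maps the lat*1000 + lon
labels to compact indices 0..k-1, so that np.bincount also works with
negative or widely spaced labels.

```python
import numpy as np

# Hypothetical sample data; integer lat/lon values, as stated in the thread.
lat = np.array([10, 10, -20, 10])
lon = np.array([30, 30, 40, 31])
data = np.array([1.0, 3.0, 5.0, 7.0])

# One integer label per pair; unique because |lon| stays well below 500.
ll = lat * 1000 + lon

# inverse[i] is the index of ll[i] in the sorted array of unique labels.
unique_ll, inverse = np.unique(ll, return_inverse=True)

sums = np.bincount(inverse, weights=data)  # histogram weighted by data values
counts = np.bincount(inverse)              # histogram without weighting
means = sums / counts                      # means[i] belongs to unique_ll[i]
```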
>
> >
> > (this was for python 2.4, also later I found np.bincount which
> > requires that the labels are consecutive integers, but is as fast as
> > ndimage)
> >
> > I don't know how it would compare to the new suggestions.
> >
> > Josef
> >
> >
> >
> >>>
> >>> Zach
> >>> _______________________________________________
> >>> NumPy-Discussion mailing list
> >>> NumPy-Discussion@scipy.org
> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>