[Numpy-discussion] 2d binning and linear regression

josef.pktd@gmai... josef.pktd@gmai...
Mon Jun 21 06:25:11 CDT 2010


On Sun, Jun 20, 2010 at 10:57 PM, Tom Durrant <thdurrant@gmail.com> wrote:
>
>>
>> are you doing something like np.polyfit(model, obs, 1) ?
>>
>> If you are using polyfit with deg=1, i.e. fitting a straight line,
>> then this could be also calculated using the weights in histogram2d.
>>
>> histogram2d (histogramdd) uses np.digitize and np.bincount, so I'm
>> surprised if the histogram2d version is much faster. If a quick
>> reading of histogramdd is correct, the main improvement would be to
>> get the labels "xy" out of it, so it can be used repeatedly with
>> np.bincount.
>>
>> Josef
>>
> Thanks Josef,
>
> >From my limited understanding, you are right the histogram is much faster due to
> the fact that it doesn't have to keep reading in the array over and over....
>
> I am using np.polyfit(model, obs, 1).  I couldn't work out a way to do these
> regression using histogram2d and weights, but you think it can be done?  This
> would be great!

the basic idea is in "polyfit  on multiple data points" on
numpy-disscusion mailing list April 2009

In this case, calculations have to be done by groups

subtract mean (this needs to be replaced by group demeaning)
modeldm = model - model.mean()
obsdm = obs - obs.mean()

xx = np.histogram2d(
xx, xedges, yedges = np.histogram2d(lat, lon, weights=modeldm*modeldm,
      bins=(latedges,lonedges))
xy, xedges, yedges = np.histogram2d(lat, lon, weights=obsdm*obsdm,
      bins=(latedges,lonedges))


slopes = xy/xx  # slopes by group

expand slopes to length of original array
predicted = model - obs * slopes_expanded
...

the main point is to get the group functions, for demeaning, ... for
the 2d labels (and get the labels out of histogramdd)

I'm out of time (off to the airport soon), but I can look into it next weekend.

Josef



> Cheers
> Tom
>
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


More information about the NumPy-Discussion mailing list