[Numpy-discussion] efficient 3d histogram creation
josef.pktd@gmai...
josef.pktd@gmai...
Mon May 4 15:18:11 CDT 2009
On Mon, May 4, 2009 at 4:00 PM, Chris Colbert <sccolbert@gmail.com> wrote:
> i'll take a look at them over the next few days and see what i can hack out.
>
> Chris
>
> On Mon, May 4, 2009 at 3:18 PM, David Huard <david.huard@gmail.com> wrote:
>>
>>
>> On Mon, May 4, 2009 at 7:00 AM, <josef.pktd@gmail.com> wrote:
>>>
>>> On Mon, May 4, 2009 at 12:31 AM, Chris Colbert <sccolbert@gmail.com>
>>> wrote:
>>> > this actually sort of worked. Thanks for putting me on the right track.
>>> >
>>> > Here is what I ended up with.
>>> >
>>> > this is what I ended up with:
>>> >
>>> > def hist3d(imgarray):
>>> > histarray = N.zeros((16, 16, 16))
>>> > temp = imgarray.copy()
>>> > bins = N.arange(0, 257, 16)
>>> > histarray = N.histogramdd((temp[:,:,0].ravel(),
>>> > temp[:,:,1].ravel(),
>>> > temp[:,:,2].ravel()), bins=(bins, bins, bins))[0]
>>> > return histarray
>>> >
>>> > this creates a 3d histogram of rgb image values in the range 0,255
>>> > using 16
>>> > bins per component color.
>>> >
>>> > on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a for
>>> > loop.
>>> >
>>> > not quite framerate, but good enough for prototyping.
>>> >
>>>
>>> I don't think your copy to temp is necessary, and use reshape(-1,3) as
>>> in the example of Stefan, which will avoid copying the array 3 times.
>>>
>>> If you need to gain some more speed, then rewriting histogramdd and
>>> removing some of the unnecessary checks and calculations looks
>>> possible.
>>
>> Indeed, the strategy used in the histogram function is faster than the one
>> used in the histogramdd case, so porting one to the other should speed
>> things up.
>>
>> David
is searchsorted faster than digitize and bincount ?
Using the idea of histogramdd, I get a bit below a tenth of a second,
my best for this problem is below.
I was trying for a while what the fastest way is to convert a two
dimensional array into a one dimensional index for bincount. I found
that using the return index of unique1d is very slow compared to
numeric index calculation.
Josef
example timed for:
nobs = 307200
nbins = 16
factors = np.random.randint(256,size=(nobs,3)).copy()
factors2 = factors.reshape(-1,480,3).copy()
def hist3(factorsin, nbins):
if factorsin.ndim != 2:
factors = factorsin.reshape(-1,factorsin.shape[-1])
else:
factors = factorsin
N, D = factors.shape
darr = np.empty(factors.T.shape, dtype=int)
nele = np.max(factors)+1
bins = np.arange(0, nele, nele/nbins)
bins[-1] += 1
for i in range(D):
darr[i] = np.digitize(factors[:,i],bins) - 1
#add weighted rows
darrind = darr[D-1]
for i in range(D-1):
darrind += darr[i]*nbins**(D-i-1)
return np.bincount(darrind) # return flat not reshaped
More information about the Numpy-discussion
mailing list