[Numpy-discussion] vectorizing

Keith Goodman kwgoodman@gmail....
Fri Jun 5 15:05:12 CDT 2009


On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Fri, Jun 5, 2009 at 12:53 PM,  <josef.pktd@gmail.com> wrote:
>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bblais@bryant.edu> wrote:
>>> Hello,
>>> I have a vectorizing problem that I don't see an obvious way to solve.  What
>>> I have is a vector like:
>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>> and a matrix
>>> T=zeros((6,6))
>>> and what I want in T is a count of all of the transitions in obs, e.g.
>>> T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
>>> sequence 3-4 only happens once, etc...  I can do it unvectorized like:
>>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>>     T[o1,o2]+=1
>>>
>>> which gives the correct answer from above, which is:
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>        [ 0.,  0.,  3.,  0.,  0.,  1.],
>>>        [ 0.,  3.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  2.,  0.,  1.,  0.],
>>>        [ 0.,  0.,  0.,  2.,  0.,  0.],
>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>>>
>>> but I thought there would be a better way.  I tried:
>>> o1=obs[:-1]
>>> o2=obs[1:]
>>> T[o1,o2]+=1
>>> but this doesn't give a count, it just yields 1's at the transition points,
>>> like:
>>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>>        [ 0.,  0.,  1.,  0.,  0.,  1.],
>>>        [ 0.,  1.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  1.,  0.,  1.,  0.],
>>>        [ 0.,  0.,  0.,  1.,  0.,  0.],
>>>        [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>
>>> Is there a clever way to do this?  I could write a quick Cython solution,
>>> but I wanted to keep this as an all-numpy implementation if I can.
>>>
>>
>> histogram2d or its imitation, there was a discussion on histogram2d a
>> short time ago
>>
>>>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>>>> obs2 = obs - 1
>>>>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
>>>>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>> ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
>> ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
>> ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
>> ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
>> ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>> np.all(re == trans)
>> True
>>
>>>>> trans
>> array([[0, 0, 0, 0, 0, 0],
>>       [0, 0, 3, 0, 0, 1],
>>       [0, 3, 0, 1, 0, 0],
>>       [0, 0, 2, 0, 1, 0],
>>       [0, 0, 0, 2, 0, 0],
>>       [0, 0, 0, 0, 1, 0]])
>>
>>
>> or
>>
>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>>>> re
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>>>> h
>> array([[ 0.,  0.,  0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  3.,  0.,  0.,  1.],
>>       [ 0.,  3.,  0.,  1.,  0.,  0.],
>>       [ 0.,  0.,  2.,  0.,  1.,  0.],
>>       [ 0.,  0.,  0.,  2.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.,  1.,  0.]])
>>
>>>>> np.all(re == h)
>> True
>
> There's no way my list method can beat that. But by adding
>
> import psyco
> psyco.full()
>
> I get a total speed up of a factor of 15 when obs is length 10000.

Actually, it is faster:

histogram:

>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
100 loops, best of 3: 4.14 ms per loop

lists:

>> timeit test(obs3, T3)
1000 loops, best of 3: 1.32 ms per loop


More information about the Numpy-discussion mailing list