[Numpy-discussion] Numpy array performance issue
Bruno Santos
bacmsantos@gmail....
Wed Feb 24 11:59:28 CST 2010
2010/2/24 Chris Colbert <sccolbert@gmail.com>
> In [4]: %timeit a = np.random.randint(0, 20, 100)
> 100000 loops, best of 3: 4.32 us per loop
>
> In [5]: %timeit (a>=10).sum()
> 100000 loops, best of 3: 7.32 us per loop
>
> In [8]: %timeit np.where(a>=10)
> 100000 loops, best of 3: 5.36 us per loop
>
>
> am i missing something?
>
I guess you are.
In [23]: a = np.random.randint(0, 20, 1000)
In [24]: %timeit np.where(a>=10)
10000 loops, best of 3: 22.4 us per loop
In [25]: %timeit (a>=10).sum()
100000 loops, best of 3: 11.7 us per loop
np.where doesn't scale very well: at 1000 elements it takes about twice as long as the boolean sum.
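
For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. The function names, the fixed seed, and the array sizes are my own choices for illustration, and np.count_nonzero is an addition that was not part of the timings above. Absolute numbers will depend on the machine and the NumPy version, so I make no claim about which variant wins on your setup.

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)  # fixed seed, an arbitrary choice for repeatability


def count_with_where(a, threshold):
    # np.where builds an index array first; its length is the number of matches
    return len(np.where(a >= threshold)[0])


def count_with_sum(a, threshold):
    # Summing the boolean mask counts the True entries directly
    return int((a >= threshold).sum())


def count_with_count_nonzero(a, threshold):
    # Not in the original thread: counts the True entries of the mask
    return int(np.count_nonzero(a >= threshold))


for size in (100, 1000, 10000):
    a = rng.randint(0, 20, size)
    # All three approaches must agree on the count before timing them
    assert count_with_where(a, 10) == count_with_sum(a, 10)
    assert count_with_sum(a, 10) == count_with_count_nonzero(a, 10)
    for fn in (count_with_where, count_with_sum, count_with_count_nonzero):
        seconds = timeit.timeit(lambda: fn(a, 10), number=2000)
        print("%6d %-26s %.2f us per call" % (size, fn.__name__, seconds / 2000 * 1e6))
```

Note that the three functions only agree on the count; np.where is still the one to use when the matching positions themselves are needed.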
>
> On Wed, Feb 24, 2010 at 12:50 PM, Bruno Santos <bacmsantos@gmail.com> wrote:
>
>> In both versions your lsPhasedValues contains the number of positions in
>> the array that match a certain criterion. What I need in that step is the
>> unique values and not a count of their positions.
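
To make that concrete, here is a rough sketch of what I mean, assuming aLoci is a 1-D integer array and nSize is the phasing step. The function name and the example array are hypothetical, and np.unique is my suggestion rather than something proposed earlier in the thread. It collects the distinct positive values at the phased positions and then computes q and k for each of them with boolean masks.

```python
import numpy as np


def phased_values_and_counts(aLoci, nSize):
    """Return (r, q, k) for every distinct positive value r at a phased position.

    Hypothetical helper for illustration only.
    """
    aLoci = np.asarray(aLoci)
    # Elements at indices 0, nSize, 2*nSize, ... (same as i % nSize == 0)
    phased = aLoci[::nSize]
    # Sorted distinct positive values, the array equivalent of the original set()
    values = np.unique(phased[phased > 0])
    results = []
    for r in values:
        q = int((phased >= r).sum())  # matches at phased positions only
        k = int((aLoci >= r).sum())   # matches over the whole array
        results.append((int(r), q, k))
    return results


example = np.array([3, 0, 1, 1, 5, 2, 3, 4, 0])
print(phased_values_and_counts(example, 3))
```

The loop over the distinct values is still a Python loop, but each iteration is now two vectorized comparisons instead of two list comprehensions over the whole array.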
>>
>> 2010/2/24 Robert Kern <robert.kern@gmail.com>
>>
>>> On Wed, Feb 24, 2010 at 11:19, Bruno Santos <bacmsantos@gmail.com>
>>> wrote:
>>>
>>> > It seems that Python 2.6.4 has a more efficient implementation of
>>> > lists. It runs faster on this version and slower on 2.5.4 on the same
>>> > machine with Debian. A lot faster, in fact.
>>> > I have been trying to fix this headache for the last couple of weeks. But you
>>> > might give me a lot more optimizations that I can pick. I am trying to
>>> > optimize the following function:
>>> > def hypergeometric(self,lindex,rindex):
>>> >     """
>>> >     loc.hypergeometric(lindex,rindex)
>>> >     Performs the hypergeometric test for the loci between lindex and
>>> >     rindex.
>>> >     Returns the minimum p-Value
>>> >     """
>>> >     aASense = self.aASCounts[lindex*nSize:(rindex+1)*nSize]
>>> >     #Create the subarray to test
>>> >     aLoci = numpy.hstack([self.aSCounts[lindex*nSize:(rindex+1)*nSize],
>>> >                           aASense[::-1]])
>>> >     #Get the values to test
>>> >     length = len(aLoci)
>>> >     lsPhasedValues = set([aLoci[i] for i in xrange(length)
>>> >                           if i%nSize==0 and aLoci[i]>0])
>>> >     m = length/nSize
>>> >     n = (length-1)-(length/nSize-1)
>>> >     #Create an array to store the Pvalues
>>> >     lsPvalues = []
>>> >     append = lsPvalues.append
>>> >     #Calculate matches in Phased and non Phased position
>>> >     for r in lsPhasedValues:
>>> >         #Initiate number of matches to 0
>>> >         q = sum([1 for j in xrange(length)
>>> >                  if j%nSize==0 and aLoci[j]>=r])
>>> >         k = sum([1 for j in xrange(length) if aLoci[j]>=r])
>>> >         key = '%i,%i,%i,%i'%(q-1,m,n,k)
>>> >         try:
>>> >             append(dtPhyper[key])
>>> >         except KeyError:
>>> >             value = self.lphyper(q-1, m, n, k)
>>> >             append(value)
>>> >             dtPhyper[key]=value
>>> >     return min(lsPvalues)
>>> > Is there any efficient way to test the array simultaneously for two
>>> > different conditions?
>>>
>>> j = np.arange(length)
>>> j_nSize_mask = ((j % nSize) == 0)
>>> lsPhasedValues = (j_nSize_mask & (aLoci >= 0)).sum()
>>> ...
>>> bigALoci = (aLoci >= r)
>>> q = (j_nSize_mask & bigALoci).sum()
>>> k = bigALoci.sum()
>>>
>>>
>>> Another way to do it:
>>>
>>> j_nSize = np.arange(0, length, nSize)
>>> lsPhasedValues = (aLoci[j_nSize] >= 0).sum()
>>> ...
>>> q = (aLoci[j_nSize] >= r).sum()
>>> k = (aLoci >= r).sum()
>>>
>>>
>>> --
>>> Robert Kern
>>>
>>> "I have come to believe that the whole world is an enigma, a harmless
>>> enigma that is made terrible by our own mad attempt to interpret it as
>>> though it had an underlying truth."
>>> -- Umberto Eco
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>