[Numpy-discussion] Numpy array performance issue

Robert Kern robert.kern@gmail....
Wed Feb 24 11:26:39 CST 2010


On Wed, Feb 24, 2010 at 11:19, Bruno Santos <bacmsantos@gmail.com> wrote:
> It seems that Python 2.6.4 has a more efficient implementation of
> lists. The code runs a lot faster on this version than on 2.5.4 on the
> same Debian machine.
> This has been my headache for the last couple of weeks, but you
> might be able to suggest further optimizations that I can pick up. I am
> trying to optimize the following function:
> def hypergeometric(self,lindex,rindex):
>         """
>         loc.hypergeometric(lindex,rindex)
>         Performs the hypergeometric test for the loci between lindex and rindex.
>         Returns the minimum p-Value
>         """
>         aASense = self.aASCounts[lindex*nSize:(rindex+1)*nSize]
>         #Create the subarray to test
>         aLoci = numpy.hstack([self.aSCounts[lindex*nSize:(rindex+1)*nSize],aASense[::-1]])
>         #Get the values to test
>         length = len(aLoci)
>         lsPhasedValues = set([aLoci[i] for i in xrange(length) if i%nSize==0 and aLoci[i]>0])
>         m = length/nSize
>         n = (length-1)-(length/nSize-1)
>         #Create an array to store the Pvalues
>         lsPvalues = []
>         append = lsPvalues.append
>         #Calculate matches in Phased and non Phased position
>         for r in lsPhasedValues:
>             #Initiate number of matches to 0
>             q = sum([1 for j in xrange(length) if j%nSize==0 and aLoci[j]>=r])
>             k = sum([1 for j in xrange(length) if aLoci[j]>=r])
>             key = '%i,%i,%i,%i'%(q-1,m,n,k)
>             try:append(dtPhyper[key])
>             except KeyError:
>                 value = self.lphyper(q-1, m, n, k)
>                 append(value)
>                 dtPhyper[key]=value
>         return min(lsPvalues)
> Is there an efficient way to test the array for two different conditions
> simultaneously?

j = np.arange(length)
j_nSize_mask = ((j % nSize) == 0)
# Distinct positive values at the phased positions (every nSize-th element)
lsPhasedValues = np.unique(aLoci[j_nSize_mask & (aLoci > 0)])
...
    bigALoci = (aLoci >= r)
    q = (j_nSize_mask & bigALoci).sum()
    k = bigALoci.sum()


Another way to do it:

j_nSize = np.arange(0, length, nSize)
# Distinct positive values at the phased positions (every nSize-th element)
lsPhasedValues = np.unique(aLoci[j_nSize][aLoci[j_nSize] > 0])
...
    q = (aLoci[j_nSize] >= r).sum()
    k = (aLoci >= r).sum()
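
For anyone who wants to try this out, here is a minimal self-contained sketch that puts both variants next to the original list-comprehension counts so they can be checked against each other. The function names and the sample data are made up for illustration and are not part of the original code; it is written for Python 3, so range replaces xrange.

```python
import numpy as np

def counts_with_mask(aLoci, nSize, r):
    """q and k using a boolean mask over all positions."""
    j = np.arange(len(aLoci))
    j_nSize_mask = (j % nSize) == 0      # True at every nSize-th position
    bigALoci = aLoci >= r                # True where the value reaches r
    q = int((j_nSize_mask & bigALoci).sum())
    k = int(bigALoci.sum())
    return q, k

def counts_with_index(aLoci, nSize, r):
    """q and k using an index array of the phased positions only."""
    j_nSize = np.arange(0, len(aLoci), nSize)
    q = int((aLoci[j_nSize] >= r).sum())
    k = int((aLoci >= r).sum())
    return q, k

def counts_with_loops(aLoci, nSize, r):
    """Reference version mirroring the original list comprehensions."""
    length = len(aLoci)
    q = sum(1 for j in range(length) if j % nSize == 0 and aLoci[j] >= r)
    k = sum(1 for j in range(length) if aLoci[j] >= r)
    return q, k

# Hypothetical sample data, chosen only to exercise the code.
aLoci = np.array([3, 0, 1, 5, 2, 2, 0, 4, 1, 3, 3, 0])
nSize = 3

# Distinct positive values at the phased positions (every nSize-th element).
phased = aLoci[::nSize]
lsPhasedValues = np.unique(phased[phased > 0])

# Map each candidate r to its (q, k) pair.
results = {int(r): counts_with_mask(aLoci, nSize, r) for r in lsPhasedValues}
print(results)
```

The index-array variant touches only length/nSize elements when computing q, so it is likely the cheaper of the two when nSize is large, but it is worth timing both on real data.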


-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

