[Numpy-discussion] Numpy array performance issue

Bruno Santos bacmsantos@gmail....
Wed Feb 24 11:50:56 CST 2010


In both versions, your lsPhasedValues contains the number of positions in the
array that match a certain criterion. What I need in that step is the set of
unique values found at those positions, not a count of the positions.
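
A minimal sketch of what I mean, assuming aLoci is a 1-D integer array and
nSize is the phasing step. The helper names phased_unique_values and
phased_counts are hypothetical and only for illustration; the slicing and
numpy.unique calls are standard NumPy. The first helper keeps the distinct
positive values at every nSize-th position, and the second computes q and k
for one threshold r in the same way as the suggested masks.

```python
import numpy as np

def phased_unique_values(aLoci, nSize):
    """Distinct positive values at positions 0, nSize, 2*nSize, ... (hypothetical helper)."""
    phased = np.asarray(aLoci)[::nSize]   # values at the phased positions
    return np.unique(phased[phased > 0])  # sorted distinct values, zeros dropped

def phased_counts(aLoci, nSize, r):
    """Return (q, k) for threshold r (hypothetical helper)."""
    big = np.asarray(aLoci) >= r          # boolean mask: value >= r
    q = int(big[::nSize].sum())           # matches at phased positions only
    k = int(big.sum())                    # matches at any position
    return q, k

# Made-up example data, not taken from the real loci.
aLoci = np.array([3, 0, 1, 5, 5, 2, 3, 1, 0, 0, 4, 4])
nSize = 3

values = phased_unique_values(aLoci, nSize)
counts = [phased_counts(aLoci, nSize, r) for r in values]
```

Looping over `values` instead of a Python set keeps the per-threshold work
inside NumPy, so only the outer loop over distinct values stays in Python.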

2010/2/24 Robert Kern <robert.kern@gmail.com>

> On Wed, Feb 24, 2010 at 11:19, Bruno Santos <bacmsantos@gmail.com> wrote:
> > It seems that Python 2.6.4 has a more efficient implementation of
> > lists. The code runs faster on this version than on 2.5.4 on the same
> > Debian machine. A lot faster, in fact.
> > This has been my headache for the last couple of weeks. But you
> > might give me a lot more optimizations that I can pick from. I am trying to
> > optimize the following function:
> > def hypergeometric(self,lindex,rindex):
> >         """
> >         loc.hypergeometric(lindex,rindex)
> >         Performs the hypergeometric test for the loci between lindex and rindex.
> >         Returns the minimum p-Value
> >         """
> >         aASense = self.aASCounts[lindex*nSize:(rindex+1)*nSize]
> >         #Create the subarray to test
> >         aLoci = numpy.hstack([self.aSCounts[lindex*nSize:(rindex+1)*nSize],aASense[::-1]])
> >         #Get the values to test
> >         length = len(aLoci)
> >         lsPhasedValues = set([aLoci[i] for i in xrange(length) if i%nSize==0 and aLoci[i]>0])
> >         m = length/nSize
> >         n = (length-1)-(length/nSize-1)
> >         #Create an array to store the Pvalues
> >         lsPvalues = []
> >         append = lsPvalues.append
> >         #Calculate matches in Phased and non Phased position
> >         for r in lsPhasedValues:
> >             #Initiate number of matches to 0
> >             q = sum([1 for j in xrange(length) if j%nSize==0 and aLoci[j]>=r])
> >             k = sum([1 for j in xrange(length) if aLoci[j]>=r])
> >             key = '%i,%i,%i,%i'%(q-1,m,n,k)
> >             try:append(dtPhyper[key])
> >             except KeyError:
> >                 value = self.lphyper(q-1, m, n, k)
> >                 append(value)
> >                 dtPhyper[key]=value
> >         return min(lsPvalues)
> > Is there an efficient way to test the array for two different conditions
> > simultaneously?
>
> j = np.arange(length)
> j_nSize_mask = ((j % nSize) == 0)
> lsPhasedValues = (j_nSize_mask & (aLoci >= 0)).sum()
> ...
>    bigALoci = (aLoci >= r)
>    q = (j_nSize_mask & bigALoci).sum()
>    k = bigALoci.sum()
>
>
> Another way to do it:
>
> j_nSize = np.arange(0, length, nSize)
> lsPhasedValues = (aLoci[j_nSize] >= 0).sum()
> ...
>    q = (aLoci[j_nSize] >= r).sum()
>    k = (aLoci >= r).sum()
>
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco