[Numpy-discussion] Numpy array performance issue
Bruno Santos
bacmsantos@gmail....
Wed Feb 24 11:50:56 CST 2010
In both versions your lsPhasedValues contains the number of positions in the
array that match a certain criterion. What I need in that step is the unique
values themselves, not a count of their positions.
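
To make the difference concrete, here is a minimal sketch of how the unique values could be collected without a Python loop. The values of nSize and aLoci are made up for illustration, and it assumes aLoci is a 1-D NumPy array as in the quoted function.

```python
import numpy as np

# Hypothetical stand-ins for the names used in the quoted function.
nSize = 3
aLoci = np.array([5, 0, 2, 0, 7, 1, 5, 3, 0, 2, 2, 9])

# Every nSize-th element is a phased position (same as i % nSize == 0).
phased = aLoci[::nSize]

# Unique positive values at the phased positions, as a sorted array.
lsPhasedValues = np.unique(phased[phased > 0])

# Reference: the original list comprehension, written as a set.
expected = set(int(aLoci[i]) for i in range(len(aLoci))
               if i % nSize == 0 and aLoci[i] > 0)
```

np.unique returns a sorted array rather than a set, which should not matter for the loop that follows since it only iterates over the values.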
2010/2/24 Robert Kern <robert.kern@gmail.com>
> On Wed, Feb 24, 2010 at 11:19, Bruno Santos <bacmsantos@gmail.com> wrote:
> > It seems that Python 2.6.4 has a more efficient implementation of
> > lists. My code runs a lot faster on that version than on 2.5.4 on the
> > same Debian machine.
> > I have been trying to fix this headache for the last couple of weeks, but
> > you might be able to suggest more optimizations that I can pick up. I am
> > trying to optimize the following function:
> > def hypergeometric(self,lindex,rindex):
> >     """
> >     loc.hypergeometric(lindex,rindex)
> >     Performs the hypergeometric test for the loci between lindex and
> >     rindex.
> >     Returns the minimum p-Value
> >     """
> >     aASense = self.aASCounts[lindex*nSize:(rindex+1)*nSize]
> >     #Create the subarray to test
> >     aLoci = numpy.hstack([self.aSCounts[lindex*nSize:(rindex+1)*nSize],
> >                           aASense[::-1]])
> >     #Get the values to test
> >     length = len(aLoci)
> >     lsPhasedValues = set([aLoci[i] for i in xrange(length)
> >                           if i%nSize==0 and aLoci[i]>0])
> >     m = length/nSize
> >     n = (length-1)-(length/nSize-1)
> >     #Create an array to store the Pvalues
> >     lsPvalues = []
> >     append = lsPvalues.append
> >     #Calculate matches in Phased and non Phased position
> >     for r in lsPhasedValues:
> >         #Initiate number of matches to 0
> >         q = sum([1 for j in xrange(length)
> >                  if j%nSize==0 and aLoci[j]>=r])
> >         k = sum([1 for j in xrange(length) if aLoci[j]>=r])
> >         key = '%i,%i,%i,%i'%(q-1,m,n,k)
> >         try:
> >             append(dtPhyper[key])
> >         except KeyError:
> >             value = self.lphyper(q-1, m, n, k)
> >             append(value)
> >             dtPhyper[key]=value
> >     return min(lsPvalues)
> > Is there any efficient way to test the array simultaneously for two
> > different conditions?
>
> j = np.arange(length)
> j_nSize_mask = ((j % nSize) == 0)
> lsPhasedValues = (j_nSize_mask & (aLoci >= 0)).sum()
> ...
> bigALoci = (aLoci >= r)
> q = (j_nSize_mask & bigALoci).sum()
> k = bigALoci.sum()
>
>
> Another way to do it:
>
> j_nSize = np.arange(0, length, nSize)
> lsPhasedValues = (aLoci[j_nSize] >= 0).sum()
> ...
> q = (aLoci[j_nSize] >= r).sum()
> k = (aLoci >= r).sum()
>
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
> -- Umberto Eco
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
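
For the counting step inside the loop, one possible approach is to compute q and k for all values of r at once with broadcasting. This is only a sketch under the same assumptions as before (made-up nSize and aLoci, 1-D integer array); the name thresholds is introduced here for illustration and does not appear in the original function.

```python
import numpy as np

# Hypothetical stand-ins for the names used in the quoted function.
nSize = 3
aLoci = np.array([5, 0, 2, 0, 7, 1, 5, 3, 0, 2, 2, 9])

# Phased positions and the unique positive values found there.
phased = aLoci[::nSize]
thresholds = np.unique(phased[phased > 0])

# Compare every element with every threshold: one row per threshold.
# q[i] counts phased positions >= thresholds[i],
# k[i] counts all positions >= thresholds[i].
q = (phased[np.newaxis, :] >= thresholds[:, np.newaxis]).sum(axis=1)
k = (aLoci[np.newaxis, :] >= thresholds[:, np.newaxis]).sum(axis=1)

# Reference: the per-threshold counts computed the way the loop does it.
q_loop = [sum(1 for j in range(len(aLoci))
              if j % nSize == 0 and aLoci[j] >= r) for r in thresholds]
k_loop = [sum(1 for j in range(len(aLoci)) if aLoci[j] >= r)
          for r in thresholds]
```

The comparison builds a temporary boolean array with one row per threshold and one column per element, so for very long arrays with many distinct values the per-threshold form from the quoted reply may use less memory.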