[SciPy-User] reversing or avoiding the need to reverse argsort()

Vincent Davis vincent@vincentdavis....
Sat Mar 20 11:41:03 CDT 2010


@Anne Archibald
I get an error and not sure way, or I should say first I need to better
understand the indexing then I might be able to fix it.

In [3]: def quantile_normalization(anarray):
   ...:             """ anarray with samples in the columns and probes
across the rows"""
   ...:         A=anarray
   ...:         AA = np.zeros_like(A)
   ...:         I = np.argsort(A,axis=0)
   ...:         AA[I,np.arange(A.shape[1])] =
np.mean(A[I,np.arange(A.shape[1])],axis=1)
   ...:         return AA
   ...:

In [4]: x = np.array([[1,2,3,4],[4,3,2,1],[1,1,1,1]])

In [5]: quantile_normalization(x)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/Users/vmd/<ipython console> in <module>()

/Users/vmd/<ipython console> in quantile_normalization(anarray)

ValueError: array is not broadcastable to correct shape

In [6]:

  *Vincent Davis
720-301-3003 *
vincent@vincentdavis.net
 my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>


On Sat, Mar 20, 2010 at 10:09 AM, Anne Archibald
<peridot.faceted@gmail.com>wrote:

>
>
> On 20 March 2010 11:37, Vincent Davis <vincent@vincentdavis.net> wrote:
>
>> What I want to do is simple but not sure the best way to go about it.
>>
>> I have an array a, shape(a) = rXc I need to sort each column
>> aargsort = a.argsort(axis=0)  # May use this later
>> aSort = a.sort(axis=0)
>>
>> now average each row
>> aSortRM = asort.mean(axis=1)
>>
>> now replace each col in a row with the row mean.
>> is there a better way than this
>>
>> aWithMeans = ones_like(a)
>> for ind in range(r)  # r = number of rows
>>     aWithMeans[ind]* aSortRM[ind]
>>
>> Now I need to undo the sort I did in the first step.
>> I think this is where I need the aargsort (argsort of a before any
>> modification) I am kinda stumped on this step. Everything I think of is
>> rather convoluted.
>> Is it possible to do this whole process without actually rearranging the
>> original array and just using the index info from the argsort() and would I
>> want to do this?
>>
>
> First of all, there's no need to use both argsort and sort:
>
> In [2]: A = np.random.randn(1000)
>
> In [3]: I = A.argsort()
>
> In [4]: B = A[I]
>
> In [5]: np.all(np.diff(B)>=0)
> Out[5]: True
>
> Now, to invert the permutation I, the easiest way is just to argsort() it;
> but this is N log N instead of order N, so there's another, slightly more
> cumbersome, way to do it:
>
> In [12]: Ij = np.zeros_like(I)
>
> In [13]: Ij[I] = np.arange(len(I))
>
> In [23]: AA = B[Ij]
>
> In [24]: np.all(AA==A)
> Out[24]: True
>
> Alternatively, you can avoid ever creating the inverse permutation:
>
> In [25]: AA[I] = B
>
> In [26]: np.all(AA==A)
> Out[26]: True
>
> All of this is somewhat more painful with two-dimensional arrays, since
> fancy and normal indexing do not mix well, and argsort returns something
> peuliar, but it can be made to work:
>
> In [51]: A = np.random.randn(100,100)
>
> In [52]: I = np.argsort(A, axis=0)
>
> In [54]: B=A[I,np.arange(A.shape[1])]
>
> In [55]: AA = np.zeros_like(A)
>
> In [56]: AA[I,np.arange(A.shape[1])] = B
>
> In [57]: np.all(AA==A)
> Out[57]: True
>
> Now for processing B in place, that's a simpler problem:
>
> In [58]: BB = np.zeros_like(B)
>
> In [59]: BB[...] = np.mean(B,axis=1)
>
> Here numpy's broadcasting rules take care of duplicating the entries of B
> as needed to fill out BB. In fact we can dispense with BB entirely:
>
> In [60]: AA[I,np.arange(A.shape[1])] = np.mean(B,axis=1)
>
> So your code can be rewritten as the impenetrable three lines:
>
> In [61]: AA = np.zeros_like(A)
>
> In [62]: I = np.argsort(A,axis=0)
>
> In [63]: AA[I,np.arange(A.shape[1])] =
> np.mean(A[I,np.arange(A.shape[1])],axis=1)
>
> Or, even less clearly, making I into what I think argsort should probably
> return:
>
> In [64]: I = np.argsort(A,axis=0), np.arange(A.shape[1])
>
> In [65]: AA[I] = np.mean(A[I],axis=1)
>
>
> Anne
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20100320/6a565d17/attachment-0001.html 


More information about the SciPy-User mailing list