[SciPy-User] reversing or avoiding the need to reverse argsort()

Vincent Davis vincent@vincentdavis....
Sat Mar 20 12:08:08 CDT 2010


@Anne Archibald
Thanks for that long reply, The code worked and now I have a better
understanding of fancy indexing although I need to ready your reply several
 more times. Is fancy indexing documented with lots of examples somewhere.

Thanks Again

 *Vincent Davis
720-301-3003 *
vincent@vincentdavis.net
 my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>


On Sat, Mar 20, 2010 at 10:59 AM, Anne Archibald
<peridot.faceted@gmail.com>wrote:

>
>
> On 20 March 2010 12:41, Vincent Davis <vincent@vincentdavis.net> wrote:
>
>> @Anne Archibald
>> I get an error and not sure way, or I should say first I need to better
>> understand the indexing then I might be able to fix it.
>>
> Ah, oops. I got the indexing wrong. numpy's broadcasting needs a little
> help:
>
> AA[I,np.arange(A.shape[1])] =
> np.mean(A[I,np.arange(A.shape[1])],axis=1)[:,np.newaxis]
>
> What's going on in this broadcasting is several things.
>
> First, when you argsort a high-dimensional array along one axis, you get an
> array the same shape as what you started with, containing the indices along
> that axis. But to use fancy indexing on a two-dimensional array, the way it
> works is:
>
> A[B, C]
>
> That is, B and C should be arrays the same shape as each other (but not
> necessarily the same shape as A). The result will have the same shape as B
> and C; what you get is (say B and C are three-dimensional):
>
> (A[B, C]) [i,j,k] = A[B[i,j,k], C[i,j,k]]
>
> So in your case, the argsorted array I is exactly what should go in the
> first place. What you really want is:
>
> B[i,j] = A[np.argsort(A,axis=0)[i,j], j]
>
> So you want to make up an array for the second index whose (i,j)th entry is
> just j. But I supplied np.arange(A.shape[1]); this is only a one-dimensional
> array. Rather than complain that the index arrays don't match up with each
> other, numpy tries adding a "fake" axis to the beginning, so that it looks
> like a two-dimensional array M that just ignores its first index:  M[i,j] =
> np.arange(A.shape[1])[j] =
> Which is what you want.
>
> All this was correct in the original email. What I got wrong was the mean:
> taking the mean across each row leaves you with a one-dimensional array,
> which we try to stuff into a two-dimensional array. So numpy tries its trick
> of adding an axis at the beginning, only to find out that that doesn't work.
> We needed to add an axis at the end, which numpy won't automatically do for
> us. So we do it by using np.newaxis.
>
> Anne
>
> In [3]: def quantile_normalization(anarray):
>>    ...:             """ anarray with samples in the columns and probes
>> across the rows"""
>>    ...:         A=anarray
>>    ...:         AA = np.zeros_like(A)
>>    ...:         I = np.argsort(A,axis=0)
>>    ...:         AA[I,np.arange(A.shape[1])] =
>> np.mean(A[I,np.arange(A.shape[1])],axis=1)
>>    ...:         return AA
>>    ...:
>>
>> In [4]: x = np.array([[1,2,3,4],[4,3,2,1],[1,1,1,1]])
>>
>> In [5]: quantile_normalization(x)
>>
>> ---------------------------------------------------------------------------
>> ValueError                                Traceback (most recent call
>> last)
>>
>> /Users/vmd/<ipython console> in <module>()
>>
>> /Users/vmd/<ipython console> in quantile_normalization(anarray)
>>
>> ValueError: array is not broadcastable to correct shape
>>
>> In [6]:
>>
>>   *Vincent Davis
>> 720-301-3003 *
>> vincent@vincentdavis.net
>>  my blog <http://vincentdavis.net> | LinkedIn<http://www.linkedin.com/in/vincentdavis>
>>
>>
>> On Sat, Mar 20, 2010 at 10:09 AM, Anne Archibald <
>> peridot.faceted@gmail.com> wrote:
>>
>>>
>>>
>>> On 20 March 2010 11:37, Vincent Davis <vincent@vincentdavis.net> wrote:
>>>
>>>> What I want to do is simple but not sure the best way to go about it.
>>>>
>>>> I have an array a, shape(a) = rXc I need to sort each column
>>>> aargsort = a.argsort(axis=0)  # May use this later
>>>> aSort = a.sort(axis=0)
>>>>
>>>> now average each row
>>>> aSortRM = asort.mean(axis=1)
>>>>
>>>> now replace each col in a row with the row mean.
>>>> is there a better way than this
>>>>
>>>> aWithMeans = ones_like(a)
>>>> for ind in range(r)  # r = number of rows
>>>>     aWithMeans[ind]* aSortRM[ind]
>>>>
>>>> Now I need to undo the sort I did in the first step.
>>>> I think this is where I need the aargsort (argsort of a before any
>>>> modification) I am kinda stumped on this step. Everything I think of is
>>>> rather convoluted.
>>>> Is it possible to do this whole process without actually rearranging the
>>>> original array and just using the index info from the argsort() and would I
>>>> want to do this?
>>>>
>>>
>>> First of all, there's no need to use both argsort and sort:
>>>
>>> In [2]: A = np.random.randn(1000)
>>>
>>> In [3]: I = A.argsort()
>>>
>>> In [4]: B = A[I]
>>>
>>> In [5]: np.all(np.diff(B)>=0)
>>> Out[5]: True
>>>
>>> Now, to invert the permutation I, the easiest way is just to argsort()
>>> it; but this is N log N instead of order N, so there's another, slightly
>>> more cumbersome, way to do it:
>>>
>>> In [12]: Ij = np.zeros_like(I)
>>>
>>> In [13]: Ij[I] = np.arange(len(I))
>>>
>>> In [23]: AA = B[Ij]
>>>
>>> In [24]: np.all(AA==A)
>>> Out[24]: True
>>>
>>> Alternatively, you can avoid ever creating the inverse permutation:
>>>
>>> In [25]: AA[I] = B
>>>
>>> In [26]: np.all(AA==A)
>>> Out[26]: True
>>>
>>> All of this is somewhat more painful with two-dimensional arrays, since
>>> fancy and normal indexing do not mix well, and argsort returns something
>>> peuliar, but it can be made to work:
>>>
>>> In [51]: A = np.random.randn(100,100)
>>>
>>> In [52]: I = np.argsort(A, axis=0)
>>>
>>> In [54]: B=A[I,np.arange(A.shape[1])]
>>>
>>> In [55]: AA = np.zeros_like(A)
>>>
>>> In [56]: AA[I,np.arange(A.shape[1])] = B
>>>
>>> In [57]: np.all(AA==A)
>>> Out[57]: True
>>>
>>> Now for processing B in place, that's a simpler problem:
>>>
>>> In [58]: BB = np.zeros_like(B)
>>>
>>> In [59]: BB[...] = np.mean(B,axis=1)
>>>
>>> Here numpy's broadcasting rules take care of duplicating the entries of B
>>> as needed to fill out BB. In fact we can dispense with BB entirely:
>>>
>>> In [60]: AA[I,np.arange(A.shape[1])] = np.mean(B,axis=1)
>>>
>>> So your code can be rewritten as the impenetrable three lines:
>>>
>>> In [61]: AA = np.zeros_like(A)
>>>
>>> In [62]: I = np.argsort(A,axis=0)
>>>
>>> In [63]: AA[I,np.arange(A.shape[1])] =
>>> np.mean(A[I,np.arange(A.shape[1])],axis=1)
>>>
>>> Or, even less clearly, making I into what I think argsort should probably
>>> return:
>>>
>>> In [64]: I = np.argsort(A,axis=0), np.arange(A.shape[1])
>>>
>>> In [65]: AA[I] = np.mean(A[I],axis=1)
>>>
>>>
>>> Anne
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20100320/91346a41/attachment.html 


More information about the SciPy-User mailing list