[Numpy-discussion] In-place fancy selection

Francesc Altet faltet@carabos....
Thu Mar 1 14:44:19 CST 2007


On Thu, 01 Mar 2007 at 13:26 -0700, Charles R Harris wrote:
> 
> 
> On 3/1/07, Francesc Altet <faltet@carabos.com> wrote:
>         Hi,
>         
>         I don't think there is a solution for this, but perhaps
>         somebody can offer some ideas. Given:
>         
>         In [79]:a=numpy.arange(9,-1,-1)
>         In [80]:b=numpy.arange(10)
>         In [81]:numpy.random.shuffle(b)
>         In [82]:b 
>         Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1])
>         In [83]:a=a[b]
>         In [84]:a
>         Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8])
>         
>         Is there a way to do step 83 without having to keep 3 arrays
>         in memory at the same time? That is, some way of doing fancy
>         indexing, but changing the elements *in place*. The idea is to
>         keep memory requirements as low as possible when a and b are
>         large arrays.
>         
>         Thanks!
> 
> I think that would be tough because of overlap between the two sides.
> The permutation could be factored into cycles which would mostly avoid
> that, but that is more theoretical than practical here. What is it you
> are trying to do? 
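
For reference, the cycle idea could be sketched roughly as below. This is
only an illustration, not something NumPy provides: permute_inplace is a
made-up name, b is assumed to be a true permutation of range(len(a)), and
the pure-Python loop would be far too slow for large arrays. It trades the
full-size temporary for a one-byte-per-element boolean mask.

```python
import numpy as np

def permute_inplace(a, b):
    """Rearrange a so that it equals the old a[b], following cycles of b.

    Hypothetical helper: assumes b is a permutation of range(len(a)).
    """
    done = np.zeros(len(a), dtype=bool)   # marks slots already finalized
    for start in range(len(a)):
        if done[start]:
            continue
        first = a[start]                  # value overwritten first in this cycle
        i = start
        while True:
            done[i] = True
            j = b[i]
            if j == start:                # cycle closes: place the saved value
                a[i] = first
                break
            a[i] = a[j]                   # pull the element from its source slot
            i = j
    return a

a = np.arange(9, -1, -1)
b = np.array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1])
permute_inplace(a, b)   # a now holds what a[b] would have returned
```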

Yeah, the problem is the overlap. Well, what I'm trying to do is, given
two arrays on disk (say, block and block_idx), sort one of them and
then reorder the other in the same order as the first one. My best
approach so far is:

            block = tmp_sorted[nslice]   # read block from disk
            sblock_idx = block.argsort()
            block.sort()
            # do things with block...
            del block  # get rid of block
            block_idx = tmp_indices[nslice]  # read block_idx from disk
            indices[nslice] = block_idx[sblock_idx]

but the last line will take 3 times the memory that block_idx takes
alone (block_idx, sblock_idx, and the temporary created by the fancy
indexing). My goal would be for the algorithm above to take only 2 times
the memory of block_idx, but I don't think this is going to be possible.
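
That said, one thing that might get close is doing the final gather in
pieces, so that the temporary is only one chunk long. The sketch below is
untested against the real on-disk containers: gather_in_chunks and the
chunk size are made up for illustration, and it assumes the destination
(indices[nslice] above) accepts slice assignment without buffering the
whole slice itself.

```python
import numpy as np

def gather_in_chunks(src, order, dest, chunk=65536):
    """Write src[order] into dest one chunk at a time.

    Hypothetical helper: the fancy-indexing temporary never holds more
    than `chunk` elements, so peak memory is about src + order + one chunk.
    """
    for start in range(0, len(order), chunk):
        stop = start + chunk
        # Only this slice of the result is materialized before being stored.
        dest[start:stop] = src[order[start:stop]]

# Small in-memory stand-ins for the on-disk arrays
block = np.array([5, 2, 9, 1, 7])
block_idx = np.array([10, 11, 12, 13, 14])
sblock_idx = block.argsort()
out = np.empty_like(block_idx)
gather_in_chunks(block_idx, sblock_idx, out, chunk=2)
```

With an on-disk destination the result would not need to live in memory
at all, only block_idx, sblock_idx and the current chunk.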

Thanks!


-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth


