[Numpy-discussion] Complex slicing and take
Keith Goodman
kwgoodman@gmail....
Wed Dec 30 11:19:48 CST 2009
On Wed, Dec 30, 2009 at 12:08 AM, Eric Emsellem <eemselle@eso.org> wrote:
> Hi
>
> thanks for the tips. Unfortunately this is not what I am after.
>
>>> >   import numpy as num
>>> >   startarray = random((1000,100))
>>> >   take_sample = [1,2,5,6,1,2]
>>> >   temp = num.take(startarray,take_sample,axis=1)
>>
>> Would it help to make temp a 1000x4 array instead of 1000x6? Could you
>> do that by changing take_sample to [1,2,5,6] and multiplying columns 1
>> and 2 by a factor of 2? That would slow down the construction of temp
>> but speed up the addition (and slicing?) in the loop below.
>
> No, it wouldn't help unfortunately, because the second instance of "1,2"
> would have different shifts. So I cannot just count the number of
> occurrences of each line.
>
> From the initial 2D array, 1D lines could be extracted several times, with
> each time a different shift.
>
>>> >   shift = [10,20,34,-10,22,-20]
>>> >   result = num.zeros(900)  # shorter than initial because of the shift
>>> >   for i in range(len(shift)) :
>>> >       result += temp[100+shift[i]:-100+shift[1]]
>>
>> This looks fast to me. The slicing doesn't make a copy nor does the
>> addition. I've read that cython does fast indexing but I don't know if
>> that applies to slicing as well. I assume that shift[1] is a typo and
>> should be shift[i].
>
> (yes of course the shift[1] should be shift[i])
> Well this may be fast, but not fast enough. And also, starting from my 2D
> startarray again, it looks odd that I cannot do something like:
>
> startarray = random((1000,100))
> take_sample = [1,2,5,6,1,2]
> shift = [10,20,34,-10,22,-20]
> result = num.sum(num.take(startarray,take_sample,axis=1)[100+shift:100-shift])
>
> but of course this is nonsense because I cannot address the data this way
> (with "shift").
>
> In fact I realise now that my question is simpler: how do I extract and sum
> 1d lines from a 2D array if I want first each line to be "shifted". So
> starting again now, I want a quick way to write:
>
> startarray = random((1000,6))
> shift = [10,20,34,-10,22,-20]
> result = num.zeros(1000, dtype=float)
> for i in range(len(shift)) :
>     result += startarray[100+shift[i]:900+shift[i]]
>
>
> Can I write this directly with some numpy indexing without the loop in
> python?
>
> thanks for any tip.
>
> Eric
Where's the bottleneck? There's the loop, there's constructing the
indices (which could be done outside the loop), slicing, adding. The
location of the bottleneck probably depends on the relative sizes of
the arrays. If the bottleneck is the loop, i.e. shift has a LOT of
elements, then it might speed things up to break shift into chunks and
use Python's multiprocessing module to process the chunks in parallel.
Something like Cython would also speed up the loop.
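
To make the chunking idea concrete, here is a rough sketch. It assumes the 1000x6 startarray and the shift list from the question, and it assumes each shifted slice is 800 rows long. The helper name partial_sum and the choice of 3 chunks are made up for illustration. The chunks are processed serially here; the list comprehension marks the spot where a process pool's map could hand the same calls to worker processes. Whether that pays off depends on how much data has to be shipped to each worker.

```python
import numpy as np

def partial_sum(startarray, shift, columns):
    """Sum the shifted 800-row windows for one chunk of column indices."""
    out = np.zeros(800)
    for i in columns:
        out += startarray[100 + shift[i]:900 + shift[i], i]
    return out

rng = np.random.default_rng(0)
startarray = rng.random((1000, 6))
shift = [10, 20, 34, -10, 22, -20]

# Split the column indices into chunks; each chunk is an independent task.
chunks = np.array_split(np.arange(len(shift)), 3)

# Serial stand-in for a parallel map over the chunks.
partials = [partial_sum(startarray, shift, chunk) for chunk in chunks]

# The partial sums combine by plain addition.
result = np.sum(partials, axis=0)
```

Because the partial sums are independent and combine by addition, the order in which the chunks finish does not matter.
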

I haven't tried running your code, but if anyone does, I think

    result += startarray[100+shift[i]:900+shift[i]]

should be

    result += startarray[100+shift[i]:900+shift[i], i]
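
As for doing this without the Python loop, one possibility is integer array indexing. The sketch below is untested against the real data and assumes the 1000x6 startarray from the question, filled with random numbers here. Note that each slice has 800 rows, so the accumulator is sized 800 rather than 1000. The names rows, cols, result_loop and result_fancy are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
startarray = rng.random((1000, 6))
shift = np.array([10, 20, 34, -10, 22, -20])

# Loop version with the column index added.
result_loop = np.zeros(800)
for i in range(len(shift)):
    result_loop += startarray[100 + shift[i]:900 + shift[i], i]

# Loop-free version: build an (800, 6) array of row indices, one column
# per shift, and pair it with the column indices via broadcasting.
rows = np.arange(100, 900)[:, None] + shift[None, :]
cols = np.arange(len(shift))
result_fancy = startarray[rows, cols].sum(axis=1)
```

Unlike the slices in the loop, the indexed expression copies the selected 800x6 block before summing, so it is worth timing both versions on realistic sizes before assuming the loop-free one is faster.
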