[Numpy-discussion] Complex slicing and take
josef.pktd@gmai...
Wed Dec 30 11:50:25 CST 2009
On Wed, Dec 30, 2009 at 12:19 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Wed, Dec 30, 2009 at 12:08 AM, Eric Emsellem <eemselle@eso.org> wrote:
>> Hi
>>
>> thanks for the tips. Unfortunately this is not what I am after.
>>
>>>> > import numpy as num
>>>> > startarray = random((1000,100))
>>>> > take_sample = [1,2,5,6,1,2]
>>>> > temp = num.take(startarray,take_sample,axis=1)
>>>
>>> Would it help to make temp a 1000x4 array instead of 1000x6? Could you
>>> do that by changing take_sample to [1,2,5,6] and multiplying columns 1
>>> and 2 by a factor of 2? That would slow down the construction of temp
>>> but speed up the addition (and slicing?) in the loop below.
>>
>> No it wouldn't help unfortunately, because the second instance of "1,2"
>> would have different shifts. So I cannot just count the number of occurrence
>> of each line.
>>
>> From the initial 2D array, 1D lines could be extracted several times, with
>> each time a different shift.
>>
>>>> > shift = [10,20,34,-10,22,-20]
>>>> > result = num.zeros(900)  # shorter than initial because of the shift
>>>> > for i in range(len(shift)) :
>>>> >     result += temp[100+shift[i]:-100+shift[1]]
>>>
>>> This looks fast to me. The slicing doesn't make a copy nor does the
>>> addition. I've read that cython does fast indexing but I don't know if
>>> that applies to slicing as well. I assume that shift[1] is a typo and
>>> should be shift[i].
>>
>> (yes of course the shift[1] should be shift[i])
>> Well this may be fast, but not fast enough. And also, starting from my 2D
>> startarray again, it looks odd that I cannot do something like:
>>
>> startarray = random((1000,100))
>> take_sample = [1,2,5,6,1,2]
>> shift = [10,20,34,-10,22,-20]
>> result = num.sum(num.take(startarray,take_sample,axis=1)[100+shift:100-shift])
>>
>> but of course this is nonsense because I cannot address the data this way
>> (with "shift").
>>
>> In fact I realise now that my question is simpler: how do I extract and sum
>> 1d lines from a 2D array if I want first each line to be "shifted". So
>> starting again now, I want a quick way to write:
>>
>> startarray = random((1000,6))
>> shift = [10,20,34,-10,22,-20]
>> result = num.zeros(1000, dtype=float)
>> for i in range(len(shift)) :
>>     result += startarray[100+shift[i]:900+shift[i]]
>>
>>
>> Can I write this directly with some numpy indexing without the loop in
>> python?
>>
>> thanks for any tip.
>>
>> Eric
>
> Where's the bottleneck? There's the loop, there's constructing the
> indices (which could be done outside the loop), slicing, adding. The
> location of the bottleneck probably depends on the relative sizes of
> the arrays. If the bottleneck is the loop, i.e. shift has a LOT of
> elements, then it might speed things up to break shift into chunks and
> use python's multiprocessing module to solve this in parallel.
> Something like cython would also speed up the loop.
>
> I haven't tried running your code, but if anyone does, I think
>
> result += startarray[100+shift[i]:900+shift[i]]
>
> should be
>
> result += startarray[100+shift[i]:900+shift[i], i]
Something like this? I'm just trying it out and haven't checked
carefully whether it actually replicates your snippets.
Constructing big intermediate arrays might not improve performance
compared to a loop.
>>> import numpy as np
>>> np.arange(30).reshape(6,5)
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])
>>> np.arange(30).reshape(6,5)[np.array([[1,2,2,1]]).T,np.arange(0,3)+np.array([[0,1,2,1]]).T]
array([[ 5,  6,  7],
       [11, 12, 13],
       [12, 13, 14],
       [ 6,  7,  8]])
>>> np.arange(30).reshape(6,5)[np.array([[1,2,2,1]]).T,np.arange(0,3)+np.array([[0,1,2,1]]).T].sum(0)
array([34, 38, 42])
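
Applied to the shapes in Eric's last snippet, the same broadcasting trick
might look like the sketch below. It assumes the corrected loop body with
the extra column index i that Keith suggested, an 800-element result (the
length of each 100:900 slice), and a seeded np.random.default_rng
generator in place of random. The names rows, cols, loop_result and
fancy_result are only for illustration.

```python
import numpy as np

# Assumption: seeded generator so the example is reproducible.
rng = np.random.default_rng(0)
startarray = rng.random((1000, 6))
shift = np.array([10, 20, 34, -10, 22, -20])

# Reference: explicit Python loop, one shifted slice per column.
loop_result = np.zeros(800)
for i in range(len(shift)):
    loop_result += startarray[100 + shift[i]:900 + shift[i], i]

# Loop-free version: build one row index per (output element, column) pair.
# rows[k, i] = 100 + k + shift[i], shape (800, 6)
rows = np.arange(100, 900)[:, None] + shift
# cols has shape (6,) and broadcasts against rows to pick column i.
cols = np.arange(len(shift))

# Fancy indexing gathers the shifted columns; summing over axis 1 adds them up.
fancy_result = startarray[rows, cols].sum(axis=1)
```

The intermediate startarray[rows, cols] is a full (800, 6) copy, so whether
this beats the plain loop depends on the array sizes; timing both versions
(for example with timeit) on the real data is the only way to tell.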
Josef