Hi,
On Mon, Jun 4, 2012 at 12:44 AM, srean <srean.list@gmail.com> wrote:
> Hi Wolfgang,
> I think you are looking for reduceat(), in particular add.reduceat().
Indeed, the OP could use add.reduceat(...), like this:
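# tst.py
import numpy as np

def reduce(data, lengths):
    # Interleave segment starts and ends: [s0, e0, s1, e1, ...]
    ind, ends = np.r_[lengths, lengths], lengths.cumsum()
    ind[::2], ind[1::2] = ends - lengths, ends
    # The appended 0 makes the final end index (== data.size) valid;
    # every second reduceat result is the sum over data[s_i:e_i].
    return np.add.reduceat(np.r_[data, 0], ind)[::2]

def normalize(data, lengths):
    # Divide each element by the sum of the segment it belongs to.
    return data / np.repeat(reduce(data, lengths), lengths)

def gen(par):
    # par = (low, high, n): n random segment lengths in [low, high)
    low, high, n = par
    lengths = np.random.randint(low, high, int(n))
    return np.random.randn(lengths.sum()), lengths

if __name__ == '__main__':
    data = np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype=float)
    lengths = np.array([2, 4, 3])
    print(reduce(data, lengths))
    print(normalize(data, lengths).round(2))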
Result:
In []: %run tst
[ 3. 10. 6.]
[ 0.33 0.67 0.1 0.2 0.3 0.4 0.17 0.33 0.5 ]
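Since the data sets in the question are stored back to back, a simpler variant may also work: pass only the start offsets (the OP's start_pointer) to reduceat, because each sum then runs up to the next start. This is a minimal sketch under that contiguity assumption. Like the interleaved version above, it also assumes every length is positive: for an empty segment reduceat returns data[start] rather than 0.

```python
import numpy as np

# Example from the thread: three data sets stored one after another.
data = np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype=float)
lengths = np.array([2, 4, 3])

# Start offsets, i.e. the OP's start_pointer: [0, 2, 6]
starts = np.cumsum(lengths) - lengths

# With contiguous segments, reduceat sums data[starts[i]:starts[i+1]],
# and data[starts[-1]:] for the last one. Assumes all lengths > 0.
sums = np.add.reduceat(data, starts)  # segment sums 3, 10 and 6

# Broadcast each sum back over its segment and divide.
normalized = data / np.repeat(sums, lengths)
print(normalized.round(2))
```

The interleaved start/end trick is only needed when segments may have gaps between them.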
And normalize() is fast enough on larger input:
In []: data, lengths= gen([5, 15, 5e4])
In []: data.size
Out[]: 476028
In []: %timeit normalize(data, lengths)
10 loops, best of 3: 29.4 ms per loop
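Another vectorized option, not from the thread, is to label every element with its segment index and let np.bincount compute the per-segment sums. The sketch below assumes the same sum-to-one normalization; normalize_bincount is a hypothetical name. It has one advantage over reduceat: empty segments get a sum of 0 and simply contribute no elements, so zero lengths need no special care.

```python
import numpy as np

def normalize_bincount(data, lengths):
    """Divide each segment of `data` by its own sum (hypothetical helper)."""
    # Segment label per element, e.g. lengths [2, 4, 3] -> [0, 0, 1, 1, 1, 1, 2, 2, 2]
    labels = np.repeat(np.arange(len(lengths)), lengths)
    # Weighted bincount gives per-segment sums; minlength keeps trailing empty segments.
    sums = np.bincount(labels, weights=data, minlength=len(lengths))
    # Indexing by labels broadcasts each sum back to its segment's elements.
    return data / sums[labels]

data = np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype=float)
lengths = np.array([2, 4, 3])
print(normalize_bincount(data, lengths).round(2))
```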
My 2 cents,
-eat
> -- srean
> On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf <wkerzendorf@gmail.com> wrote:
> > Dear all,
> >
> > I have an ndarray which consists of many arrays stacked behind each other (only conceptually, in truth it's a normal 1d float64 array).
> > I have a second array which tells me the start of the individual data sets in the 1d float64 array and another one which tells me the length.
> > Example:
> >
> > data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality [1,2,1,2,3,4,1,2,3, dtype=float64]
> > start_pointer = [0, 2, 6]
> > length_data = [2, 4, 3]
> > I now want to normalize each of the individual data sets. I wrote a simple for loop over start_pointer and length_data that grabbed each data set, normalized it and wrote it back to the big array. That's slow. Is there an elegant numpy way to do that? Do I have to go the Cython way?
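For comparison, here is a minimal sketch of the loop described in the question. It assumes "normalize" means dividing each data set by its sum, the same choice eat's normalize() makes; normalize_loop is a hypothetical name. It is mainly useful as a reference for checking a vectorized version.

```python
import numpy as np

def normalize_loop(data, start_pointer, length_data):
    """Plain Python loop over the data sets (hypothetical reference version)."""
    out = data.copy()  # leave the input array untouched
    for start, length in zip(start_pointer, length_data):
        segment = out[start:start + length]  # a view into `out`
        segment /= segment.sum()             # normalize in place (sum-to-one assumed)
    return out

data = np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype=float)
print(normalize_loop(data, [0, 2, 6], [2, 4, 3]).round(2))
```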

