# [Numpy-discussion] sampling based on running sums

John Hunter jdh2358@gmail....
Fri Jun 27 15:06:24 CDT 2008

```
I would like to find the sample points where the running sum of some
vector exceeds some threshold -- at those points I want to collect all
the data in the vector since the last time the criterion was met
and compute some stats on it.  For example, in Python

tot = 0.
xs = []
ys = []

samples1 = []
for thisx, thisy in zip(x, y):
    tot += thisx
    xs.append(thisx)
    ys.append(thisy)
    if tot >= threshold:
        samples1.append(func(xs, ys))
        tot = 0.
        xs = []
        ys = []

The following comes close in NumPy

sx = np.cumsum(x)
n = (sx/threshold).astype(int)
ind = np.nonzero(np.diff(n)>0)[0]+1

lasti = 0
samples2 = []
for i in ind:
    xs = x[lasti:i+1]
    ys = y[lasti:i+1]
    samples2.append(func(xs, ys))
    lasti = i + 1  # start after element i, which the slice above already includes

But the sample points in ind do not guarantee that the x values between
consecutive sample points sum to at least threshold, due to truncation
error.

What is a good numpy way to do this?

Thanks,
JDH
```
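
One possible way to reproduce the reset-on-threshold behaviour of the pure-Python loop is to search the cumulative sum repeatedly with `np.searchsorted`, each time raising the target by `threshold` from the running sum at the previous sample point. The sketch below is not from the original post: the names `reset_sum_boundaries` and `collect_samples` are hypothetical, and it assumes `x` is non-negative so that the cumulative sum is non-decreasing. It still loops, but once per sample point rather than once per element.

```python
import numpy as np

def reset_sum_boundaries(x, threshold):
    """Return the inclusive end index of each segment whose sum reaches threshold.

    Assumption: x is non-negative, so np.cumsum(x) is non-decreasing,
    which np.searchsorted needs in order to give a meaningful answer.
    """
    sx = np.cumsum(x)
    ends = []
    base = 0.0   # cumulative sum at the end of the previous segment
    start = 0    # first index of the current segment
    while start < len(sx):
        # First index i >= start with sx[i] >= base + threshold.
        i = start + np.searchsorted(sx[start:], base + threshold, side="left")
        if i >= len(sx):
            break  # the remaining tail never reaches the threshold
        ends.append(i)
        base = sx[i]
        start = i + 1
    return np.array(ends, dtype=int)

def collect_samples(x, y, threshold, func):
    """Apply func to the x and y slices of each completed segment."""
    x = np.asarray(x)
    y = np.asarray(y)
    ends = reset_sum_boundaries(x, threshold)
    starts = np.concatenate(([0], ends[:-1] + 1))
    return [func(x[s:e + 1], y[s:e + 1]) for s, e in zip(starts, ends)]
```

With floating-point data this compares differences of cumulative sums rather than a freshly accumulated total, so a segment whose sum lands almost exactly on the threshold may be split differently than in the explicit loop.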
