[SciPy-user] Conditionally adding items to a list

Anne Archibald peridot.faceted@gmail....
Tue Aug 26 11:37:48 CDT 2008


2008/8/26 Roger Herikstad <roger.herikstad@gmail.com>:

>  I have a prolem that I was wondering if anyone could come up with a
> good solution for. I basically have to lists of number and I want to
> add elements from one list to the other as long as the difference
> between the added element and all elements already in that list
> exceeds a certain threshold. The code I came up with is
>
> map(times1.append,ifilter(lambda(x):
> numpy.abs(x-numpy.array(times1)).min()>1000, times2))
>
> but it quickly slows down if times2 becomes sufficiently long. I need
> to be able to do this with lists of 100,000++ elements. Does anyone
> know of a quicker way of doing this?

As others have said, you should think carefully about whether this is
what you actually want: the result you get will depend on the order of
the incoming items:
[500,1000,1500] -> [500,1500]
[1000,500,1500] -> [1000]

But if it is what you want, I would worry more about the fact that
your algorithm is O(n**2) than about the fact that list operations are
(supposedly) slow. Here's an O(n) way to do what you want:

def trim(input,spacing=1000):
    r = {}
    for n in input:
        i = math.floor(n/float(spacing))
        if i in r:
            continue
        if i-1 in r and n-r[i-1]<spacing:
            continue
        if i+1 in r and r[i+1]-n<spacing:
            continue
        r[i] = n
    return [r[i] for i in r]

If you want to make an array without going through the list generated
on return, you could also write "return np.fromiter(r[i] for i in r)",
provided  your python understands iterator comprehensions.

You could also sort the input list; then you only need to compare each
element to the previous one. This isn't even order-dependent anymore,
but it doesn't produce the result you asked for:

def trim2(input, spacing=1000):
    input = sorted(input)
    r = [input[0]]
    for i in input[1:]:
        if i-r[-1]>=spacing:
            r.append[i]
    return r

If there are many elements to be discarded, this may be faster:

def trim3_gen(input, spacing=1000):
    input = np.sort(input)
    i = input[0]
    while True:
        yield i
        try:
            i = input[np.searchsorted(input,i+spacing)]
        except IndexError:
            break

(note that unlike the others it's a generator, for no really compelling reason.)

Good luck,
Anne


More information about the SciPy-user mailing list