[Numpy-discussion] How to set array values based on a condition?

Francesc Altet faltet@carabos....
Sun Mar 23 10:20:46 CDT 2008


On Sunday 23 March 2008, Francesc Altet wrote:
> On Sunday 23 March 2008, Anne Archibald wrote:
> > On 23/03/2008, Damian Eads <eads@soe.ucsc.edu> wrote:
> > > Hi,
> > >
> > >  I am working on a memory-intensive experiment with very large
> > > arrays so I must be careful when allocating memory. Numpy already
> > > supports a number of in-place operations (+=, *=) making the task
> > > much more manageable. However, it is not obvious to me how I can
> > > set values based on a very simple condition.
> > >
> > >  The expression
> > >
> > >    y[y<0]=-1
> > >
> > >  generates a boolean index mask (y<0) of the same size as the
> > > array y, which is problematic when y is quite large.
> > >
> > >  I was wondering if there was anything like a set_where(A, cmp,
> > > B, setval, [optional elseval]) function where cmp would be a
> > > comparison operator expressed as a string.
> > >
> > >  The code below illustrates what I want to do. Admittedly, it
> > > needs to be cleaned up but it's a proof of concept. Does numpy
> > > provide any functions that support the functionality of the code
> > > below?
> >
> > That's a good question, but I'm pretty sure it doesn't, apart from
> > numpy.clip(). The way I'd try to solve that problem would be with
> > the dreaded for loop. Don't iterate over single elements, but if
> > you have a gargantuan array, working in chunks of ten thousand (or
> > whatever) won't have too much overhead:
> >
> > block = 100000
> > for n in range(0, len(y), block):
> >     yc = y[n:n+block]
> >     yc[yc<0] = -1
> >
> > It's a bit of a pain, but working with arrays that nearly fill RAM
> > *is* a pain, as I'm sure you are all too aware by now.
> >
> > You might look into numexpr, this is the sort of thing it does
> > (though I've never used it and can't say whether it can do this).
>
> Well, Numexpr is designed to minimize the number of temporaries, and
> can do what Damian wants without storing the mask in a temporary.
> However, the output will require new space.  The usage should be
> something like:
>
> In [11]: y = numpy.random.normal(0, 10, 10)
>
> In [12]: numexpr.evaluate('where(y<0, -1, y)')
> Out[12]:
> array([  7.11784295,  -1.        ,  10.92876842,  -1.        ,
>          0.76092629,  -1.        ,  14.07021792,  -1.        ,
>          5.67173405,  31.28631822])

Oops.  I realised that, for this particular case, Numexpr's memory usage
is similar to that of its NumPy counterpart:

y[:] = numpy.where(y<0, -1, y)
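
To make the comparison concrete, here is a small sketch that measures the
temporaries involved in the NumPy expression above. The array size is an
arbitrary assumption chosen for illustration. The boolean mask takes one
byte per element, and numpy.where() allocates a full-size output before it
is copied back into y:

```python
import numpy

# Illustrative size only; the arrays in question would be far larger.
y = numpy.random.normal(0, 10, 1000000)

mask = y < 0                        # boolean temporary, 1 byte per element
result = numpy.where(mask, -1, y)   # full-size temporary with y's dtype

mask_bytes = mask.nbytes
result_bytes = result.nbytes
print(mask_bytes, result_bytes, y.nbytes)

y[:] = result                       # copy back into the original buffer
```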

So I think your best option is to work in chunks, as Anne suggested.
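
A minimal sketch of that chunked approach, assuming a 1-D array. The
function name set_where_negative and its parameters are hypothetical, not
part of NumPy. Each slice is a view on y, so the only temporary is a
boolean mask of at most `block` elements:

```python
import numpy

def set_where_negative(y, value=-1, block=100000):
    """Set negative entries of a 1-D array to `value`, in place, block by block.

    Hypothetical helper for illustration; not a NumPy function.
    """
    for start in range(0, len(y), block):
        chunk = y[start:start + block]   # basic slicing returns a view, not a copy
        chunk[chunk < 0] = value         # mask temporary has at most `block` elements
    return y

y = numpy.random.normal(0, 10, 1000000)
set_where_negative(y)
```

The block size trades Python loop overhead against the size of the
temporary mask, so it is worth tuning for the array at hand.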

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

